Multiset

From Free net encyclopedia

In mathematics, a multiset (sometimes also called a bag) differs from a set in that each member has a multiplicity, which is a natural number indicating (loosely speaking) how many times it is a member, or perhaps how many memberships it has in the multiset. For example, in the multiset { a, a, b, b, b, c }, the multiplicities of the members a, b, and c are respectively 2, 3, and 1.

Contents

Formal definition

Within set theory, a multiset can be formally defined as a pair (A, m) where A is some set and m : AN is a function from A to the set N of (positive) natural numbers. The set A is called the underlying set of elements. For each a in A the multiplicity (that is, number of occurrences) of a is the number m(a).

It is common to write the function m as a set of ordered pairs { (a, m(a)) : aA } — indeed, this is the set-theoretic definition of the function m. For example,

  • the multiset written as { a, b, b } is defined as { (a, 1), (b, 2) },
  • likewise { a, a, b } is defined as { (a, 2), (b, 1) }, and
  • the multiset { a, b } is defined as { (a, 1), (b, 1) }.
  • an indexed family (mathematics), ( ai ), where i is in some index-set, defines the multiset { ai } , (provided no element occurs more than a finite number of times in the family. Even in an infinite multiset the multiplicities must be finite numbers).

If the set A is finite, the size or length of the multiset (A, m) is the sum of all multiplicities for each element of A:

<math>|(A,m)|=\sum_{a\in A}m(a).</math>

This is the size of A counted with multiplicity, while the size of A counted without multiplicity is

<math>|A|=\sum_{a\in A}1.</math>

A submultiset (B, n) of a multiset (A, m) is a subset BA and a function n : BN such that n(a) ≤ m(a).

Examples

One of the most natural and simple examples is the multiset of prime factors of a number n. Here the underlying set of elements is the set of prime divisors of n. For example the number 120 has the prime factorization

<math>120 = 2^3 3^1 5^1</math>

which gives the multiset {2, 2, 2, 3, 5}.

Another is the multiset of solutions of an algebraic equation. Everyone learns in secondary school that a quadratic equation has two solutions, but in some cases they are both the same number. Thus the multiset of solutions of the equation could be { 3, 5 }, or it could be { 4, 4 }. In the latter case it has a solution of multiplicity 2.

Operations

The usual set operations such as union, intersection and Cartesian product can be easily generalized for multisets.

Suppose (A, m) and (B, n) are multisets

  • The union can be defined as (AB, f) where f(x) = max{m(x), n(x)}.
  • The intersection can be defined as (AB, f) where f(x) = min{m(x), n(x)}.
  • The cartesian product can be defined as (A × B, f) where f((x,y)) = m(x)n(y).
  • The join (sometimes called the sum) can be defined as <math>(A,m) \biguplus (B,n) = (A \cup B, f) </math> where f(x) = m(x)+n(x).

Multiset coefficients

The number of submultisets of size k in a set of size n is the multiset coefficient

<math>\left\langle \begin{matrix}n \\ k \end{matrix}\right\rangle = {n + k - 1 \choose n-1}={n+k-1 \choose k},</math>

where the expressions to the right of "=" are binomial coefficients, i.e., the number of such multisets is the same as the number of subsets of size k in a set of size n + k − 1. Unlike the situation with sets, this cardinality will not be 0 when k > n. One simple way to prove this involves representing multisets in the following way. First, consider the notation for multisets that would represent { a, a, a, a, a, a, b, b, c, c, c, d, d, d, d, d, d, d } (6 as, 2 bs, 3 cs, 7 ds) in this form:

<math>\bullet \bullet \bullet \bullet \bullet \bullet \mid \bullet \bullet \mid \bullet \bullet \bullet \mid \bullet \bullet \bullet \bullet \bullet \bullet \bullet </math>

This is a multiset of size 18 made of elements of a set of size 4. The number of characters including both dots and vertical lines used in this notation is 18 + 4 − 1. The number of vertical lines is 4 − 1. The number of multisets of size 18 is then the number of ways to arrange the 4 − 1 vertical lines among the 18 + 4 − 1 characters, and is thus the number of subsets of size 4 − 1 in a set of size 18 + 4 − 1. Equivalently, it is the number of ways to arrange the 18 dots among the 18 + 4 − 1 characters, which is the number of subsets of size 18 of a set of size 18 + 4 − 1. This is

<math>{18+4-1 \choose 4-1}={18+4-1 \choose 18},</math>

so that is the value of the multiset coefficient

<math>\left\langle\begin{matrix} 4 \\ 18 \end{matrix}\right\rangle.</math>

One may define a generalized binomial coefficient

<math>{n \choose k}={n(n-1)(n-2)\cdots(n-k+1) \over k!}</math>

in which n is not required to be a nonnegative integer, but may be negative or a non-integer, or a non-real complex number. (If k = 0, then the value of this coefficient is 1 because it is the product of no numbers.) Then the number of multisets of size k in a set of size n is

<math>\left\langle\begin{matrix} n \\ k \end{matrix}\right\rangle=(-1)^k{-n \choose k}.</math>

This fact led Gian-Carlo Rota to ask "Why are negative sets multisets?". He considered that question worthy of the attention of philosophers of mathematics.

Polynomial notation

The set {x} may be represented by the monomial x. The set of subsets, { {}, {x} } , is represented by the binomial 1 + x.

The set {x,y} may be represented by the monomial xy. The set of subsets, { {}, {x}, {y}, {x,y} } , is represented by the polynomial

(1 + x)(1 + y) = 1 + (x + y) + xy.

The multiset {x,x} may be represented by the monomial xx = x2. The multiset of submultisets, { {}, {x}, {x}, {x,x} } , is represented by the polynomial

(1 + x)2 = 1 + 2x + x2.

The multiset of submultisets of the multiset xn is

<math>(1+x)^n=\sum_{k=0}^n{n \choose k} x^k.</math>

That is why the binomial coefficient count the number of k-combinations of an n-set.

The multiset xKyN−K , containing N elements, K of which are hits, is called a statistical population, and a submultiset is called a statistical sample. The set of samples is

(1 + x)K(1 + y)N−K

which by the binomial theorem equals

<math>\sum_{n=0}^N\sum_{k=0}^K{K \choose k}{N-K \choose n-k}x^ky^{n-k}.</math>

So the number of n-samples with k hits is

<math>{K \choose k}{N-K \choose n-k}</math>

See hypergeometric distribution and inferential statistics for further on the distribution of hits.

The infinite set of finite multisets of elements from the set {x}:

{ {}, {x}, {x,x}, {x,x,x}, ... }

may be represented by the formal power series

S = 1 + x + x2 + x3 + ... = 1 + xS .

The formal solution,

S = (1 − x)−1,

makes sense as a set of multisets, but the intermediate result, 1−x, does not make sense as a set of multisets.

The infinite set of finite multisets of elements from the set xy is

(1 − x)−1(1 − y)−1 = 1 + (x + y) + (x2 + xy + y2) + ...

The special case y=x : The infinite multiset of finite multisets of elements from the multiset x2 is

(1 − x)−2 =  1 + 2x + 3x2 + ...

The general case: The infinite multiset of finite multisets of elements from the multiset xn is

<math>(1-x)^{-n}=\sum_{k=0}^\infty{-n \choose k} (-x)^k</math> .

This explains why "multisets are negative sets". The negative binomial coefficients count the number of k-multisets of elements from an n-set.

Cumulant generating function

A natural number, n, can be represented by the monomial xn . So a finite multiset of natural numbers, say {2, 2, 2, 3, 5}, can be represented by a polynomial f(x), say 3x2+x3+x5 . It is convenient to consider the function g(s) = log(f(es)), say g(s) = log(3e2s+e3s+e5s) . The number of elements in the multiset is eg(0) . The mean value of the elements in the multiset is the derivative μ = g'(0) . The variance, i.e. the square of the standard deviation, of the elements in the multiset is σ2 = g''(0). The numbers ( μ, σ2, ... ) = ( g'(0), g''(0), ... ) are called cumulants, and so g is called the cumulant-generating function.

The set of nonnegative integers is infinite, and the mean value and standard deviation are undefined. Nevertheless the set has a cumulant-generating function, −log(1−es) , for s < 0.

A multiset of real numbers can also be represented by a cumulant-generating function, even if it cannot be represented by a polynomial. See partition function (statistical mechanics) for the case where the numbers in question are the energy levels of a physical system.

The cumulant-generating function of the multiset of sums of elements of two multisets of numbers is the sum of the two cumulant-generating functions.

<math>g_{a+b}(s) = \log \sum_{i,j} e^{s(a_i+b_j)} = \log \sum_{i,j} e^{sa_i}e^{sb_j} = \log \sum_i e^{sa_i}\sum_je^{sb_j} = \log \sum_i e^{sa_i}+ \log\sum_je^{sb_j} = g_{a}(s)+g_{b}(s) . </math>

That is the reason why the cumulant generating function is a convenient representation for multisets of numbers, and for probability distributions.

Free commutative monoids

There is a connection with the free object concept: the free commutative monoid on a set X can be taken to be the set of finite multisets with elements drawn from X, with the obvious addition operation.de:Multimenge es:Multiconjunto nl:Multiset pl:Multizbiór sl:Večkratna množica uk:Мультимножина