Multiset
From Free net encyclopedia
In mathematics, a multiset (sometimes also called a bag) differs from a set in that each member has a multiplicity, which is a natural number indicating (loosely speaking) how many times it is a member, or perhaps how many memberships it has in the multiset. For example, in the multiset { a, a, b, b, b, c }, the multiplicities of the members a, b, and c are respectively 2, 3, and 1.
Contents |
Formal definition
Within set theory, a multiset can be formally defined as a pair (A, m) where A is some set and m : A → N is a function from A to the set N of (positive) natural numbers. The set A is called the underlying set of elements. For each a in A the multiplicity (that is, number of occurrences) of a is the number m(a).
It is common to write the function m as a set of ordered pairs { (a, m(a)) : a ∈ A } — indeed, this is the set-theoretic definition of the function m. For example,
- the multiset written as { a, b, b } is defined as { (a, 1), (b, 2) },
- likewise { a, a, b } is defined as { (a, 2), (b, 1) }, and
- the multiset { a, b } is defined as { (a, 1), (b, 1) }.
- an indexed family (mathematics), ( ai ), where i is in some index-set, defines the multiset { ai } , (provided no element occurs more than a finite number of times in the family. Even in an infinite multiset the multiplicities must be finite numbers).
If the set A is finite, the size or length of the multiset (A, m) is the sum of all multiplicities for each element of A:
- <math>|(A,m)|=\sum_{a\in A}m(a).</math>
This is the size of A counted with multiplicity, while the size of A counted without multiplicity is
- <math>|A|=\sum_{a\in A}1.</math>
A submultiset (B, n) of a multiset (A, m) is a subset B ⊆ A and a function n : B → N such that n(a) ≤ m(a).
Examples
One of the most natural and simple examples is the multiset of prime factors of a number n. Here the underlying set of elements is the set of prime divisors of n. For example the number 120 has the prime factorization
- <math>120 = 2^3 3^1 5^1</math>
which gives the multiset {2, 2, 2, 3, 5}.
Another is the multiset of solutions of an algebraic equation. Everyone learns in secondary school that a quadratic equation has two solutions, but in some cases they are both the same number. Thus the multiset of solutions of the equation could be { 3, 5 }, or it could be { 4, 4 }. In the latter case it has a solution of multiplicity 2.
Operations
The usual set operations such as union, intersection and Cartesian product can be easily generalized for multisets.
Suppose (A, m) and (B, n) are multisets
- The union can be defined as (A ∪ B, f) where f(x) = max{m(x), n(x)}.
- The intersection can be defined as (A ∩ B, f) where f(x) = min{m(x), n(x)}.
- The cartesian product can be defined as (A × B, f) where f((x,y)) = m(x)n(y).
- The join (sometimes called the sum) can be defined as <math>(A,m) \biguplus (B,n) = (A \cup B, f) </math> where f(x) = m(x)+n(x).
Multiset coefficients
The number of submultisets of size k in a set of size n is the multiset coefficient
- <math>\left\langle \begin{matrix}n \\ k \end{matrix}\right\rangle = {n + k - 1 \choose n-1}={n+k-1 \choose k},</math>
where the expressions to the right of "=" are binomial coefficients, i.e., the number of such multisets is the same as the number of subsets of size k in a set of size n + k − 1. Unlike the situation with sets, this cardinality will not be 0 when k > n. One simple way to prove this involves representing multisets in the following way. First, consider the notation for multisets that would represent { a, a, a, a, a, a, b, b, c, c, c, d, d, d, d, d, d, d } (6 as, 2 bs, 3 cs, 7 ds) in this form:
- <math>\bullet \bullet \bullet \bullet \bullet \bullet \mid \bullet \bullet \mid \bullet \bullet \bullet \mid \bullet \bullet \bullet \bullet \bullet \bullet \bullet </math>
This is a multiset of size 18 made of elements of a set of size 4. The number of characters including both dots and vertical lines used in this notation is 18 + 4 − 1. The number of vertical lines is 4 − 1. The number of multisets of size 18 is then the number of ways to arrange the 4 − 1 vertical lines among the 18 + 4 − 1 characters, and is thus the number of subsets of size 4 − 1 in a set of size 18 + 4 − 1. Equivalently, it is the number of ways to arrange the 18 dots among the 18 + 4 − 1 characters, which is the number of subsets of size 18 of a set of size 18 + 4 − 1. This is
- <math>{18+4-1 \choose 4-1}={18+4-1 \choose 18},</math>
so that is the value of the multiset coefficient
- <math>\left\langle\begin{matrix} 4 \\ 18 \end{matrix}\right\rangle.</math>
One may define a generalized binomial coefficient
- <math>{n \choose k}={n(n-1)(n-2)\cdots(n-k+1) \over k!}</math>
in which n is not required to be a nonnegative integer, but may be negative or a non-integer, or a non-real complex number. (If k = 0, then the value of this coefficient is 1 because it is the product of no numbers.) Then the number of multisets of size k in a set of size n is
- <math>\left\langle\begin{matrix} n \\ k \end{matrix}\right\rangle=(-1)^k{-n \choose k}.</math>
This fact led Gian-Carlo Rota to ask "Why are negative sets multisets?". He considered that question worthy of the attention of philosophers of mathematics.
Polynomial notation
The set {x} may be represented by the monomial x. The set of subsets, { {}, {x} } , is represented by the binomial 1 + x.
The set {x,y} may be represented by the monomial xy. The set of subsets, { {}, {x}, {y}, {x,y} } , is represented by the polynomial
- (1 + x)(1 + y) = 1 + (x + y) + xy.
The multiset {x,x} may be represented by the monomial xx = x2. The multiset of submultisets, { {}, {x}, {x}, {x,x} } , is represented by the polynomial
- (1 + x)2 = 1 + 2x + x2.
The multiset of submultisets of the multiset xn is
- <math>(1+x)^n=\sum_{k=0}^n{n \choose k} x^k.</math>
That is why the binomial coefficient count the number of k-combinations of an n-set.
The multiset xKyN−K , containing N elements, K of which are hits, is called a statistical population, and a submultiset is called a statistical sample. The set of samples is
- (1 + x)K(1 + y)N−K
which by the binomial theorem equals
- <math>\sum_{n=0}^N\sum_{k=0}^K{K \choose k}{N-K \choose n-k}x^ky^{n-k}.</math>
So the number of n-samples with k hits is
- <math>{K \choose k}{N-K \choose n-k}</math>
See hypergeometric distribution and inferential statistics for further on the distribution of hits.
The infinite set of finite multisets of elements from the set {x}:
- { {}, {x}, {x,x}, {x,x,x}, ... }
may be represented by the formal power series
- S = 1 + x + x2 + x3 + ... = 1 + xS .
The formal solution,
- S = (1 − x)−1,
makes sense as a set of multisets, but the intermediate result, 1−x, does not make sense as a set of multisets.
The infinite set of finite multisets of elements from the set xy is
- (1 − x)−1(1 − y)−1 = 1 + (x + y) + (x2 + xy + y2) + ...
The special case y=x : The infinite multiset of finite multisets of elements from the multiset x2 is
- (1 − x)−2 = 1 + 2x + 3x2 + ...
The general case: The infinite multiset of finite multisets of elements from the multiset xn is
- <math>(1-x)^{-n}=\sum_{k=0}^\infty{-n \choose k} (-x)^k</math> .
This explains why "multisets are negative sets". The negative binomial coefficients count the number of k-multisets of elements from an n-set.
Cumulant generating function
A natural number, n, can be represented by the monomial xn . So a finite multiset of natural numbers, say {2, 2, 2, 3, 5}, can be represented by a polynomial f(x), say 3x2+x3+x5 . It is convenient to consider the function g(s) = log(f(es)), say g(s) = log(3e2s+e3s+e5s) . The number of elements in the multiset is eg(0) . The mean value of the elements in the multiset is the derivative μ = g'(0) . The variance, i.e. the square of the standard deviation, of the elements in the multiset is σ2 = g''(0). The numbers ( μ, σ2, ... ) = ( g'(0), g''(0), ... ) are called cumulants, and so g is called the cumulant-generating function.
The set of nonnegative integers is infinite, and the mean value and standard deviation are undefined. Nevertheless the set has a cumulant-generating function, −log(1−es) , for s < 0.
A multiset of real numbers can also be represented by a cumulant-generating function, even if it cannot be represented by a polynomial. See partition function (statistical mechanics) for the case where the numbers in question are the energy levels of a physical system.
The cumulant-generating function of the multiset of sums of elements of two multisets of numbers is the sum of the two cumulant-generating functions.
- <math>g_{a+b}(s) = \log \sum_{i,j} e^{s(a_i+b_j)} = \log \sum_{i,j} e^{sa_i}e^{sb_j} = \log \sum_i e^{sa_i}\sum_je^{sb_j} = \log \sum_i e^{sa_i}+ \log\sum_je^{sb_j} = g_{a}(s)+g_{b}(s) . </math>
That is the reason why the cumulant generating function is a convenient representation for multisets of numbers, and for probability distributions.
Free commutative monoids
There is a connection with the free object concept: the free commutative monoid on a set X can be taken to be the set of finite multisets with elements drawn from X, with the obvious addition operation.de:Multimenge es:Multiconjunto nl:Multiset pl:Multizbiór sl:Večkratna množica uk:Мультимножина