
Whenever I go through the big pile of socks that just went through the laundry, and have to find the matching pairs, I usually do this like I am a simple automaton:

I randomly pick a sock, and see if it matches any of the single socks I picked out earlier and that haven't found a match yet. If there is a match, I will fold the two socks together and put them in the 'done' pile, otherwise I will add the single sock to the 'no match yet' pile of single socks, and pick out another random sock.

So, as I was doing this last night, I started thinking about it, and figured that the following would be true: the 'no match yet' pile can be expected to slowly grow, up to some point somewhere in the 'middle' of the process, after which the pile will gradually shrink and eventually go back down to $0$. In fact, my intuition is that the expected number of loose socks, as a function of the number of socks picked so far, is symmetric, with the maximum attained when I have picked half of the socks.

So, my questions are:

With $n$ pairs of socks, what is the expected number of loose socks that are in my 'no match yet' pile after having picked $k$ socks?

Is it true that this function is symmetric, and that the maximum is attained at $k=n$? (If so, I figure there must be a conceptual way of looking at the problem that makes this immediately clear, without using any formulas ... what is that way? Is it just that I can think of reversing the process?)

Of course, this is all assuming there are $n$ pairs of socks total, and that there are no single socks in the original pile, and while this is something that never seems to apply to the pile of socks coming through my actual laundry, let's assume for the sake of mathematical simplicity that there really just are $n$ pairs of socks.

Bram28
  • Zero for me, I use sock locks – Barmar Jul 06 '17 at 22:35
  • If the number of socks is large, I arrange the unpaired socks in buckets (based on color and/or pattern type) as I go. Also, don't miss the sibling question in SO: https://stackoverflow.com/questions/14415881/how-to-pair-socks-from-a-pile-efficiently – Klas Lindbäck Jul 07 '17 at 11:20
  • Do you have more than one pair of the same kind of socks? :) In that case, there will be a higher probability that you find a match each time you pick a new sock, and the expected number of socks in the 'no match yet' pile will be lower than if there is only one pair of each kind of socks. – HelloGoodbye Jul 07 '17 at 15:33
  • @HelloGoodbye Yeah, good question! .. I was just thinking about that yesterday and figured I should add something to my post. My intention was to keep it simple and have only unique pairs, but it might be an interesting follow-up question! – Bram28 Jul 07 '17 at 15:43
  • In the more generic case, the function is still symmetric. See my answer, and instead reinterpret $f$ as "the expected number of subsets of identical socks for which an odd number of the subset end up in each pile." – HelloGoodbye Jul 07 '17 at 16:30
  • It is a well known fact that socks are not subject to the conventional rules of mathematics :) – Chris Johns Jul 07 '17 at 17:51
  • @ChrisJohns Well, I'll put the formula to the test next time I do laundry, but yes, somehow I always run out of room for my 'no match yet' pile, and in the end, it is never back to 0 :) – Bram28 Jul 07 '17 at 17:54
  • Great question. I think it is time for some experiments to see how things go. Excel would be a fine platform, if you know the rudiments of Visual Basic. In the case where k=n, every subsequent draw would have to match something. Most likely you would never reach that point. In fact, the probability of getting to k=n would be something like n!2^n/(2n)! The probabilities for n = 1, 2, 3 are 1, 1/3, 1/15. So with more than 3 pairs, it is unlikely you will ever reach k=n. – richard1941 Jul 11 '17 at 23:41
  • @richard1941 In my experience I have indeed never gotten to where the number of unmatched socks equals the number of pairs of socks, and yet somehow the number of unmatched socks always seems to be larger than what it mathematically should be! :) – Bram28 Jul 11 '17 at 23:54

3 Answers


The expected number can be computed via linearity of expectation. Let $E[n,k]$ denote the answer and let $\{X_i\}_{i=1}^n$ denote the indicator variables for the pairs: $X_i=1$ if exactly one member of the $i^{th}$ pair has been chosen in your $k$ trials, and $X_i=0$ otherwise. A given sock of the pair is among the $k$ chosen with probability $\frac k{2n}$, its partner is then not among the other $k-1$ chosen socks with probability $1-\frac {k-1}{2n-1}$, and either of the two socks can be the chosen one, so $$E[X_i]=2\times \frac k{2n}\times \left(1-\frac {k-1}{2n-1}\right)$$ from which it follows that $$E[n,k]=E\left[\sum X_i\right] =\sum E[X_i]= k\times \left(1-\frac {k-1}{2n-1}\right)$$

Sanity check: $k=1\implies E[n,1]=1$ as it should. Also $k=2n\implies E[n,2n]=0$ as it should.

Remark: it is easily seen that this function is maximized at $k=n$, confirming your intuition. Also, the expression can be written as $$E[n,k]=\frac {k(2n-k)}{2n-1}$$ which is symmetric under the exchange $k \leftrightarrow 2n-k$, also in line with your expectations.

Remark: more strongly, it is clear that at any time the number of unmatched socks among those already picked is the same as the number among those not yet picked (indeed, it is exactly the same pairs of socks that are split between the two piles). That provides a clear justification for the symmetry.
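As a quick check of the closed form, here is a minimal Python sketch (not part of the original answer). It assumes that the loose-sock count after $k$ picks depends only on which $k$ socks were picked, so averaging over all $k$-subsets of the $2n$ socks gives the expectation. The function names `closed_form` and `brute_force` and the pair labelling `s // 2` are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

def closed_form(n, k):
    """Closed form k(2n-k)/(2n-1) for the expected number of loose socks."""
    return Fraction(k * (2 * n - k), 2 * n - 1)

def brute_force(n, k):
    """Average the loose-sock count over every k-subset of the 2n socks."""
    socks = range(2 * n)              # assumption: sock s belongs to pair s // 2
    total = 0
    count = 0
    for picked in combinations(socks, k):
        pairs_seen = [s // 2 for s in picked]
        # a pair contributes a loose sock when exactly one of its socks was picked
        total += sum(1 for p in set(pairs_seen) if pairs_seen.count(p) == 1)
        count += 1
    return Fraction(total, count)

n = 4
for k in range(2 * n + 1):
    # exact rational comparison, so no floating-point tolerance is needed
    assert brute_force(n, k) == closed_form(n, k)
    print(k, closed_form(n, k))
```

For small $n$ the enumeration is cheap (at most $\binom{8}{4}=70$ subsets for $n=4$), and the printed values rise to a maximum at $k=n$ and fall back symmetrically.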

lulu
  • But here's the real test: if n is the number of pairs, and k is how many socks you've picked, the expectation for the number of unmatched socks as k->2n must be non-zero in order to fit experimental data. Any model which presumes that all socks are matched by the time k=2n is clearly not an accurate model of reality! – Cort Ammon Jul 07 '17 at 19:36
  • @CortAmmon true, with the standing counterexample for color blind sorters or other individuals who enjoy a non-standard notion of "match". – lulu Jul 07 '17 at 19:41

We can verify the accepted answer using the methodology from this MSE link, where we see that the problem is very similar to a coupon collector without replacement, with two instances of each of $n$ types of coupons. Suppose we have $j$ instances. Start by asking about the probability of getting the following distribution of coupons:

$$\prod_{q=1}^n C_q^{\alpha_q}$$

where $\alpha_q$ says we have that many instances of type $q$ and is at most $j.$ We get from first principles the probability

$$\frac{(nj-\sum_{q=1}^n \alpha_q)!}{(nj)!} \prod_{q=1}^n \frac{j!}{(j-\alpha_q)!}.$$

Now when we multiply a probability by the total number of events we get the favorable events. Therefore the EGF for a given coupon type is

$$\sum_{k=0}^j \frac{j!}{(j-k)!} \frac{z^k}{k!} = \sum_{k=0}^j {j\choose k} z^k = (1+z)^j.$$

With $j=2$ and $n$ types of coupons we get

$$m! [z^m] (1+z)^{2n}$$

and asking for the total count after $m$ coupons have been drawn yields

$$m! \times {2n\choose m}.$$

Placing a marker on the singletons we find

$$m! [z^m] \left.\frac{\partial}{\partial u} (1+2uz+z^2)^n\right|_{u=1} \\ = m! [z^m] \; \left. n \times (1+2uz+z^2)^{n-1} \times 2z \right|_{u=1} \\ = m! [z^m ] 2nz (1+z)^{2n-2} \\ = m! \times 2n {2n-2\choose m-1}.$$

Divide to get the expectation

$$ {2n\choose m}^{-1} 2n {2n-2\choose m-1} = 2n \frac{m! \times (2n-m)!}{(2n)!} \frac{(2n-2)!}{(m-1)! \times (2n-m-1)!} \\ = 2n \times m \times (2n-m) \frac{1}{(2n)(2n-1)} \\ = \frac{m\times (2n-m)}{2n-1}.$$
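The marking step can be checked numerically with a small sketch (not part of the original answer). It expands $(1+2uz+z^2)^n$ as a polynomial in $z$ and $u$, then forms the ratio of the $u$-weighted coefficient of $z^m$ to the plain coefficient of $z^m$; the factor $m!$ cancels in that ratio. The helper names `power_of_pair_factor` and `expected_singletons` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb

def power_of_pair_factor(n):
    """Coefficients of (1 + 2*u*z + z**2)**n as {(z_exp, u_exp): coeff}."""
    poly = {(0, 0): 1}
    factor = {(0, 0): 1, (1, 1): 2, (2, 0): 1}   # 1 + 2uz + z^2
    for _ in range(n):
        product = defaultdict(int)
        for (zi, ui), a in poly.items():
            for (zj, uj), b in factor.items():
                product[(zi + zj, ui + uj)] += a * b
        poly = dict(product)
    return poly

def expected_singletons(n, m):
    """Expected number of singletons after m draws, read off the coefficients."""
    poly = power_of_pair_factor(n)
    # setting u = 1 gives [z^m](1+z)^(2n) = C(2n, m)
    total = sum(c for (z, _), c in poly.items() if z == m)
    # differentiating in u at u = 1 weights each term by its u-exponent
    marked = sum(u * c for (z, u), c in poly.items() if z == m)
    assert total == comb(2 * n, m)
    return Fraction(marked, total)

n = 3
for m in range(2 * n + 1):
    assert expected_singletons(n, m) == Fraction(m * (2 * n - m), 2 * n - 1)
```

The loop confirms, for $n=3$, that the ratio agrees with $\frac{m(2n-m)}{2n-1}$ for every $m$ from $0$ to $2n$.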

Marko Riedel

If you let your function that gives the expected number of socks in your 'no match yet' pile take $k$ and $2n-k$ as arguments, i.e.

$$f\,=\,f(k,\,2n-k),$$

it will be symmetric. The 'no match yet' pile is just the subset of the $k$ picked socks whose matching sock is among the $2n-k$ socks that have not been picked yet.

Therefore, we can make the following reinterpretation of $f$:

Out of $n$ pairs of socks, $f(a,\,b)$, where $a+b=2n$, is the expected number of pairs of socks for which the two socks in the pair end up in different piles when all $2n$ socks are randomly divided into two piles of sizes $a$ and $b$, respectively.

Since the piles don't have any order, the order of the arguments to $f$, which are just the sizes of the piles, doesn't matter. Hence, $f$ is symmetric.
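The two-pile reinterpretation can be illustrated with a short sketch (an illustration only, with the hypothetical helper `split_pairs` and the assumed labelling that socks `2p` and `2p + 1` form pair `p`). For every way of splitting the socks into two piles, the number of split pairs is the same whichever pile is treated as the 'picked' one.

```python
from itertools import combinations

def split_pairs(pile, n):
    """Number of pairs with exactly one sock in `pile`."""
    pile = set(pile)
    # assumption: socks 2p and 2p + 1 form pair p
    return sum(1 for p in range(n) if (2 * p in pile) != (2 * p + 1 in pile))

n = 3
all_socks = set(range(2 * n))
for a in range(2 * n + 1):
    for pile_a in combinations(sorted(all_socks), a):
        pile_b = all_socks - set(pile_a)
        # the same pairs are split, whichever pile we look at
        assert split_pairs(pile_a, n) == split_pairs(pile_b, n)
```

Since the count agrees for every individual split, the averages over piles of size $a$ and of size $b$ agree as well, which is the symmetry $f(a,\,b)=f(b,\,a)$.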

HelloGoodbye