I think that first we need to review what the axiom of choice is, and what can happen in its absence.
The axiom of choice asserts that if we have a collection $X$ of nonempty sets, then we can choose from each one. What does "choose" mean? It means that there is a function $f$ whose domain is $X$ such that $f(a)\in a$ for every $a\in X$.
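To have the statement in one place, here is one common way to write it in symbols. The requirement $\varnothing\notin X$ is just "every member of $X$ is nonempty", and taking $\bigcup X$ as the codomain is my choice of presentation, not something forced by the definition:

$$\forall X\,\Bigl(\varnothing\notin X\;\rightarrow\;\exists f\colon X\to\bigcup X\ \ \forall a\in X\ \bigl(f(a)\in a\bigr)\Bigr).$$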
Why is this a useful axiom? If we have a finite collection of nonempty sets and in the course of a proof we want to choose from every set in that collection, we can write:
There exists $a_1\in A_1$ and there exists $a_2\in A_2$ and so on ... such that $a_1$ has this property and $a_2$ has that property ...
This is effectively choosing an element from the sets, and inductively we have that we can choose from every finite collection of nonempty sets. However induction only covers the finite cases. What happens if the collection is infinite? We cannot write an infinitely long formula, so instead the axiom of choice comes in to save the day and we can say that there exists a choice function and use $f(A_i)$ as $a_i$.
Note that this is not limited to countably infinite collections; they can be larger and wilder than anything you can imagine.
Life without choice is life in a model in which all we can say is that some collections of nonempty sets do not have choice functions. There are still collections which we can choose from.
How can we ensure that we can choose from something? One way is if we can effectively write down a property that is satisfied by exactly one element of each set.
For example, consider $X=\{A\subseteq\mathbb N\mid A\neq\varnothing\}$. This is a family of nonempty sets. Do we need the axiom of choice to choose exactly one element from every $A\in X$? No. We have the order $<$ on the natural numbers, which has the property that every nonempty set has a $<$-minimal element.
We can therefore use this $<$ as a helper to select from every element of $X$. And indeed the function $f(A)=\min_< A$ is well-defined (every $A$ is nonempty thus has a minimal element, and the order is linear so this element is unique) and we have defined it without the magical machine which is the axiom of choice.
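As a small illustration of "a rule instead of a choice", here is a sketch in Python. It is only a finite stand-in: Python sets are finite, whereas $X$ above contains infinite sets as well, and the names `choose_min` and `family` are mine, purely for illustration.

```python
def choose_min(family):
    """Build a choice function (as a dict) on a family of nonempty
    sets of natural numbers, using the rule f(A) = min(A)."""
    choice = {}
    for a in family:
        if not a:
            # The rule only makes sense for nonempty sets.
            raise ValueError("every set in the family must be nonempty")
        choice[a] = min(a)  # the least element is unique, so no choosing is involved
    return choice


# A hypothetical finite family of nonempty subsets of N.
family = [frozenset({3, 1, 4}), frozenset({10}), frozenset({7, 2})]
f = choose_min(family)
```

The point is that `choose_min` never makes an arbitrary decision: the same uniform rule is applied to every set, which is exactly what is unavailable for Russell's socks.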
Russell's Strange Drawer is the analogy Russell used to explain why the axiom of choice is needed in some cases. What are those cases? For example, those in which you do not have the sort of structure which allows you to choose from the sets.
Russell used the fact that shoes have predefined properties of being "left shoe" and "right shoe", while socks (often) lack that property, and you only know which is the left sock after you have put it on your left foot.
This is translated into the idea that if you have finitely many pairs of socks then by induction you can always choose one from each pair (try that for yourself, put the socks in pairs and since you have a finite number of pairs of socks you can go and pick one from each pair). With shoes you can also do that "manually" and choose one shoe from each pair, however one can simply claim:
I choose the left shoe from each pair! There is exactly one left shoe in every pair, therefore this is a unique choice!
Note that this did not require that there are any limitations on how many shoes you have. Even if you had Imelda Marcos's collection of 2700 (read: infinitely many) pairs of shoes you can still say that you choose the left one from each pair.
With the socks, however, you cannot make such a choice. As we remarked before about choosing from a pair of socks: socks are indistinguishable in the sense that there is no left sock or right sock. Those are a posteriori properties.
Russell's analogy was given to say that there are models without the axiom of choice in which you have a collection of infinitely many pairs, but you cannot choose exactly one element from each pair.
Russell gave the analogy in 1907. Several years later Fraenkel defined what is now known as Fraenkel's second model, in which such a collection of pairs was formalized into a proper mathematical object.
Fraenkel's model is a model which is constructed to have a collection of the form $\{P_i\mid i\in\mathbb N\}$ such that $P_i\cap P_j=\varnothing$ for $i\neq j$ and $|P_i|=2$ for all $i$, and there is no function $f$ with domain $\mathbb N$ such that $f(i)\in P_i$ for all $i$.
By inspecting every pair individually it is possible for us to distinguish between its two elements, and so by inspecting finitely many pairs we can still choose one from every pair. However there is no uniform way to distinguish between the elements of each pair at once. There is no unifying property like "right shoe", or "minimal element" which we can use here.
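Written side by side, the contrast is as follows. The notation $f_n$ for a finite choice function is mine, and both statements are meant to be read inside the model:

$$\forall n\in\mathbb N\ \ \exists f_n\colon\{0,\ldots,n\}\to\bigcup_{i\le n}P_i\ \ \forall i\le n\ \bigl(f_n(i)\in P_i\bigr),$$

$$\neg\,\exists f\colon\mathbb N\to\bigcup_{i\in\mathbb N}P_i\ \ \forall i\in\mathbb N\ \bigl(f(i)\in P_i\bigr).$$

The first holds by finite induction; the second is exactly the failure of choice for this family of pairs.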
There is a lot to say on the definition and construction of this model, on the fact that we can perhaps distinguish between the elements in a larger universe, but not in the smaller one (external vs. internal definition). However this is going beyond the scope of a post on math.SE.
As for the question in the comments about only being able to distinguish $n$ pairs at a time: this is simply due to the fact that if $f$ chooses from $P_0,\ldots,P_n$ then its domain is only $\{0,\ldots,n\}$ (or $\{P_0,\ldots,P_n\}$), and since $n+1$ is not in the domain of $f$, it cannot choose from $P_{n+1}$.
Alas, it is not the sets which distinguish between the elements of each pair. It is us. To distinguish means to assert that they are different and to choose one and not the other. We can do that since the set is of size $2$, so we can fix a bijection with $\{1,2\}$ and say which is the first and which is the second element. However, due to the lack of further structure on the pairs (and on the collection of all "socks", $\bigcup_n P_n$) we can only tell to which pair a sock belongs, and not which sock is which (in that pair).
It is truly baffling and mind-boggling, which is why the implications of the axiom of choice (and possible worlds in its absence) cannot really be explained intuitively, except when the intuition is mathematical in nature to begin with.
Further reading:
- How do I choose an element from a non-empty set?
- Axiom of Choice Examples
- Finite choice without AC
- axiom of choice: cardinality of general disjoint union
- What is the set-theoretic definition of a function? (To complement Arturo's comment on the main question)