I can't really understand how the recursive formula for derangements is derived, i.e. I can't see where its pattern comes from.
From Wikipedia:
Suppose that there are $n$ persons who are numbered $1, 2, \ldots, n$. Let there be $n$ hats also numbered $1, 2, \ldots, n$. We have to find the number of ways in which no one gets the hat having the same number as his/her own number. Let us assume that the first person takes hat $i$. There are $n - 1$ ways for the first person to make such a choice. There are now two possibilities, depending on whether or not person $i$ takes hat $1$ in return:
1. Person $i$ does not take hat $1$. This case is equivalent to solving the problem with $n - 1$ persons and $n - 1$ hats: each of the remaining $n - 1$ people has precisely $1$ forbidden choice from among the remaining $n - 1$ hats ($i$'s forbidden choice is hat $1$).
2. Person $i$ takes hat $1$. Now the problem reduces to $n - 2$ persons and $n - 2$ hats.

From this, the following relation is derived:
$!n = (n - 1) (!(n-1) + !(n-2)). $
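To make the relation concrete, here is a minimal Python sketch of how it could be evaluated. It assumes the standard base cases $!0 = 1$ and $!1 = 0$, which the quoted passage does not state. The function names `subfactorial` and `brute_force` are made up for illustration. The second function simply lists every permutation and keeps those in which nobody gets their own hat, so the two counts can be compared for small $n$.

```python
from itertools import permutations


def subfactorial(n: int) -> int:
    """Count derangements with !n = (n - 1) * (!(n - 1) + !(n - 2))."""
    if n == 0:
        return 1  # assumed base case: the empty arrangement
    if n == 1:
        return 0  # assumed base case: one person must take their own hat
    prev2, prev1 = 1, 0  # !0 and !1
    for k in range(2, n + 1):
        # (k - 1) choices for person 1's hat, then the two cases from the quote
        prev2, prev1 = prev1, (k - 1) * (prev1 + prev2)
    return prev1


def brute_force(n: int) -> int:
    """Count permutations of 0..n-1 in which no position i holds the value i."""
    return sum(
        all(hat != person for person, hat in enumerate(p))
        for p in permutations(range(n))
    )


if __name__ == "__main__":
    for n in range(7):
        print(n, subfactorial(n), brute_force(n))
```

For $n = 4$, for instance, the recurrence gives $!4 = 3 \cdot (!3 + !2) = 3 \cdot (2 + 1) = 9$, which agrees with the brute-force count.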
In this particular example I do not understand why we need to distinguish whether or not person $i$ takes hat $1$. When I think about derangements I see that the first person has $n-1$ choices, and so does everyone else. But here is another conceptual problem: if we keep counting like this, how do we know that the last person, who has only one choice left, won't end up with the hat bearing their own number? Also, how would one calculate derangements with the formula above? I am a visual learner, so if you can provide some kind of mental image to help me understand the concept, I would appreciate it even more.
(I've checked whether this question is a duplicate, but I didn't find other questions asking for an intuitive argument for derangements.)