Reading the Wikipedia article on the topic, when you have a defective matrix that lacks a full rank eigenspace, you can form 'generalised eigenvectors' by looking at the equation $(A - \lambda I)v = v_\lambda$, where $v_\lambda$ denotes a true eigenvector and $v$ a generalised one (corresponding to the same eigenvalue).
Opening up the expression, we're looking for a vector that maps to a multiple of itself plus the true eigenvector, ie. $Av = \lambda v + v_\lambda$, which makes some sense when considering an action like a shear transformation, but it's a little hazy.
My questions are: why is this a good definition for these generalised vectors, and why exactly does it guarantee linear independence? If we have, say, n different eigenvalues with certain algebraic multiplicities, what guarantees that their respective generalised vectors do not overlap? Is it something to do with the way the nullspace maps these things?