There is a very visually appealing picture that goes along with this, but not being able to draw it on here I shall just describe it to you.
For each point $x\in X$ we know we can find some open set $U\subseteq X$ containing $x$ and a homeomorphism $\phi:U'\to U$, where $U'$ is an open subset of $\mathbb{R}^m$. Think visually that $U'$ sits above $X$, casting a shadow down onto $U$. Let's further cement this intuition by writing $\phi$ in the form
$$\begin{array}{ccc}U' & & \\ & {}_\phi\searrow & \\ & & U\subseteq X\end{array}$$
As you mentioned, there is a definite intuition as to what this means: $\phi$ gives us a sort of coordinate system for $U$, allowing us to think about $U$ geometrically as looking like $U'$.
The problem with the above is that this "$U$", or more precisely, what it represents, is not unique. We may equally well find some other open set $V\subseteq X$ containing $x$ and a homeomorphism $\psi:V'\to V$, with $V'\subseteq\mathbb{R}^m$ open, giving a very similar picture
$$\begin{array}{ccc} & & V'\\ & \swarrow^\psi & \\ X\supseteq V & & \end{array}$$
Once again, since $V$ is just the shadow of $V'$, we can think of $X$ locally around $x$ as just an ordinary open subset of $\mathbb{R}^m$.
We still have not figured out exactly why we need this strange condition you mention. Well, the idea is simple. Imagine that we are doing math near $x$, and to make our lives simpler we would like to define coordinates locally around $x$ so that we can just pretend we are doing math in $\mathbb{R}^m$. We realize then that we are at a crossroads--which coordinate definition do we pick? Namely, we have a diagram of the form
$$\begin{array}{ccc}\phi^{-1}(U\cap V) & & \psi^{-1}(U\cap V)\\ {}_\phi\searrow & & \swarrow^\psi\\ & U\cap V & \end{array}$$
which represents the region where the shadows cast by $U'$ and $V'$ intersect, together with the parts of $U'$ and $V'$ that cast it. So, which do we pick? When doing mathematics we would like not to care which we pick--after all, it's all just an arbitrary coordinatization. But what exactly should "it doesn't matter" mean? Of course, it doesn't mean that the two are literally the same. What it should mean is that any information obtained by choosing one coordinatization is both true and translatable to the case where we choose the other.
Translation means that we have a map $\phi^{-1}(U\cap V)\to \psi^{-1}(U\cap V)$ relating the two coordinatizations. The diagram tells us precisely how to build this map: go down to $U\cap V$ via $\phi$ and then back up to $\psi^{-1}(U\cap V)$ via $\psi^{-1}$. Thus we get our map $\psi^{-1}\circ\phi:\phi^{-1}(U\cap V)\to\psi^{-1}(U\cap V)$, which we can roughly think of as the dictionary between the two coordinatizations of $X$ locally around $x$.
But we don't want these to be any old maps. Indeed, if we are doing things like calculus, then to transfer ideas through this dictionary it shouldn't just be a dictionary with set-words in it, but a dictionary with calculus-words in it. In other words, every calculus statement we can make in $\phi^{-1}(U\cap V)$ should translate via this dictionary to one in $\psi^{-1}(U\cap V)$. Of course, a mere set-map isn't going to give us this; it could completely mess up notions of differentiability, etc. Thus we want maps that won't change any words involving calculus: we want maps that respect it. Of course, these are $C^k$ maps (where the $k$ is to your taste in calculus; mine is $k=\infty$). Since the same requirement applies with the roles of $\phi$ and $\psi$ swapped, the inverse dictionary $\phi^{-1}\circ\psi$ is $C^k$ as well.
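To see such a dictionary concretely, here is a small illustration; the space and the two parametrizations are my own choices for the sake of example, written in the same direction as the diagrams (from $\mathbb{R}$ down to $X$). Take $X=S^1\subseteq\mathbb{R}^2$ with north pole $N=(0,1)$ and south pole $S=(0,-1)$, let $U'=V'=\mathbb{R}$, $U=S^1\setminus\{N\}$, $V=S^1\setminus\{S\}$, and use the inverse stereographic projections
$$\phi(s)=\left(\frac{2s}{s^2+1},\ \frac{s^2-1}{s^2+1}\right),\qquad \psi(t)=\left(\frac{2t}{t^2+1},\ \frac{1-t^2}{t^2+1}\right).$$
Since $\phi(0)=S$ and $\psi(0)=N$, we get $\phi^{-1}(U\cap V)=\psi^{-1}(U\cap V)=\mathbb{R}\setminus\{0\}$. Substituting $t=1/s$ into $\psi$ gives
$$\psi\!\left(\tfrac{1}{s}\right)=\left(\frac{2/s}{1/s^2+1},\ \frac{1-1/s^2}{1/s^2+1}\right)=\left(\frac{2s}{s^2+1},\ \frac{s^2-1}{s^2+1}\right)=\phi(s),$$
so the dictionary is $(\psi^{-1}\circ\phi)(s)=1/s$, which is $C^\infty$ on $\mathbb{R}\setminus\{0\}$. By contrast, on $X=\mathbb{R}$ the parametrizations $\phi(s)=s$ and $\psi(t)=t^3$ are both homeomorphisms, yet $(\psi^{-1}\circ\phi)(s)=s^{1/3}$ fails to be differentiable at $s=0$. That is exactly the kind of dictionary the condition is meant to rule out.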