What is a differential?
Let us begin with a function of one variable, say $f(x) = x^{2}$. The derivative, usually written $f'(x)$ or $\mathrm{d}f/\mathrm{d}x$, is the function that describes the gradient of $f$. To clarify, the gradient of $f$ at the point $x$ is the slope of the tangent line to the graph of $f$ at the point $\big(x, f(x)\big)$. For example, the derivative of $f$ is
$$
f'(x) = \frac{\mathrm{d}f}{\mathrm{d}x} = 2x.
$$
This tells us that at $x = 3$, the tangent line to the graph of $f$ at the point $(3, 9)$ has gradient $6$, and so on. Notice that adding a constant to $f$ doesn't change the gradient of the tangent lines; it only moves the graph up or down, so it makes sense that such a constant does not appear in the derivative.
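As a rough numerical check of the two claims above, the following sketch estimates the gradient at $x = 3$ with a central difference. The helper `slope`, the step size `h`, and the shift of `10` in `f_shifted` are illustrative choices, not part of the original discussion.

```python
def f(x):
    return x ** 2


def f_shifted(x):
    # The same parabola moved up by an arbitrary constant (assumed: 10).
    return x ** 2 + 10


def slope(func, x, h=1e-6):
    """Central-difference estimate of the gradient of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


slope_at_3 = slope(f, 3.0)
shifted_slope_at_3 = slope(f_shifted, 3.0)

# Both estimates should be very close to 6, the value of 2x at x = 3.
print(slope_at_3, shifted_slope_at_3)
```

Both printed values should agree with $f'(3) = 6$ up to rounding error, which illustrates that the added constant has no effect on the gradient.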
The notation $\mathrm{d}f/\mathrm{d}x$ itself suggests the interpretation as a gradient: $\mathrm{d}f$ denotes a small increment in $f$ and $\mathrm{d}x$ denotes a small increment in $x$, so that $\mathrm{d}f/\mathrm{d}x$ can be understood as the ratio of the change in $f$ to a small change in $x$. The larger its magnitude, the more steeply $f$ changes with respect to $x$, and so the steeper the graph. In this case, $\mathrm{d}f/\mathrm{d}x = 2x$, so that $\mathrm{d}f = 2x \,\mathrm{d}x$. This differential tells us, to first order, how much $f$ changes when $x$ changes by a small amount $\mathrm{d}x$.
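To see in what sense $\mathrm{d}f = 2x\,\mathrm{d}x$ describes the change in $f$, one can compare it with the actual change $f(x + \mathrm{d}x) - f(x)$ for a few small increments. This is only a sketch: the point $x = 3$, the increments, and the helper name `increment_error` are arbitrary choices made for illustration.

```python
def f(x):
    return x ** 2


def increment_error(x, dx):
    """Actual change in f minus the change predicted by df = 2x dx."""
    actual = f(x + dx) - f(x)
    predicted = 2 * x * dx
    return actual - predicted


x = 3.0
for dx in (0.1, 0.01, 0.001):
    # For f(x) = x^2 the gap is exactly dx^2, which shrinks much
    # faster than dx itself.
    print(dx, increment_error(x, dx))
```

For this particular $f$, the gap between the two is $(\mathrm{d}x)^{2}$, so the smaller the increment, the better the differential approximates the true change.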
Why are they important?
This interpretation becomes particularly useful in optimisation problems. Say that you have the (arbitrary) profit function $P = 5 + 5x - x^{2}$, where $P$ is the profit made from some variable $x$; for instance, $x$ could represent the number of units of product produced. An equation like this is natural in this setting: plotting the graph shows that the profit is reduced if $x$ is too small, and also if $x$ is too large. In words: if too little product is produced, there is not enough supply to meet demand and profit is not maximised; similarly, if too much product is produced, there is not enough demand for the product to make it profitable. There is therefore some value of $x$ that corresponds to making just enough product to meet demand without oversupplying, and by looking at the graph it is fairly easy to get a rough estimate for this value.
Differentials become useful for determining this value of $x$ without having to graph the function. In this example, the optimal value corresponds to the peak of the graph. The tangent line at the peak is horizontal, so its gradient is zero, and we find the $x$ value by solving
$$
\frac{\mathrm{d}P}{\mathrm{d}x} = 0.
$$
We see that
$$
\frac{\mathrm{d}P}{\mathrm{d}x} = 5 - 2x = 0
$$
gives the value $x = 2.5$. Since the graph of $P$ is a downward-opening parabola, this stationary point is a maximum: when $x = 2.5$, the profit takes its largest value, $P = 11.25$.
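The calculation can be checked with a few lines of code. This is a minimal sketch using the profit function from the text; the neighbouring points `2.4` and `2.6` are arbitrary values chosen only to confirm that the profit is lower on either side.

```python
def P(x):
    return 5 + 5 * x - x ** 2


def dP_dx(x):
    # Derivative of P computed by hand: dP/dx = 5 - 2x.
    return 5 - 2 * x


x_star = 5 / 2  # solution of 5 - 2x = 0

print(dP_dx(x_star))     # gradient at the candidate point: 0.0
print(P(x_star))         # profit at the candidate point: 11.25
print(P(2.4), P(2.6))    # nearby values of x give a smaller profit
```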
Differentials are not another way of expressing the function
Having just the derivative of a function is usually not as helpful as having the function itself (since if you have a function, it's generally easy to compute its derivative), and this is why the theory of solutions to differential equations is important. In physics, there are many situations in which only a differential equation is known, and the objective is to find the function that satisfies it. One cannot in general write a function in terms of its derivative: consider $f(x) = x^{2}$ again, whose derivative is $f'(x) = 2x$ and whose differential is
$$
\mathrm{d}f = 2x \mathrm{d}x.
$$
This shows that the original function and its derivative are different, so that $f$ and $\mathrm{d}f$ are (generally) different expressions and $\mathrm{d}f$ is not just "another way of writing $f$".
Several variables
These intuitions naturally extend to functions of several variables. If $f(x, y) = x^{2} + xy + y^{2}$, then
$$
\mathrm{d}f = (2x + y)\mathrm{d}x + (x + 2y)\mathrm{d}y.
$$
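As $f$ is a function of two variables, there are two directions in which one can increment, and the above expression describes how the surface defined by $f$ changes as one increases $x$ and/or $y$: the coefficient of $\mathrm{d}x$ is the rate of change in the $x$ direction, and the coefficient of $\mathrm{d}y$ is the rate of change in the $y$ direction. This is helpful because it is sometimes not obvious how the shape of $f$ varies with respect to each coordinate when one only has the original function.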
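The two-variable differential can be checked numerically in the same spirit as the one-variable case. In this sketch the base point $(1, 2)$ and the increments are arbitrary illustrative values, and `df` is simply the expression above written as a function.

```python
def f(x, y):
    return x ** 2 + x * y + y ** 2


def df(x, y, dx, dy):
    """Change predicted by df = (2x + y) dx + (x + 2y) dy."""
    return (2 * x + y) * dx + (x + 2 * y) * dy


x, y = 1.0, 2.0          # assumed base point
dx, dy = 0.01, -0.02     # assumed small increments

actual = f(x + dx, y + dy) - f(x, y)
predicted = df(x, y, dx, dy)

# The gap is dx^2 + dx*dy + dy^2, which is second order in the increments.
print(actual, predicted, actual - predicted)
```

Setting `dy = 0` or `dx = 0` isolates the change in a single direction, which is how the differential separates the contribution of each coordinate.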