There are plenty of ways to define the proximity operator, but loosely speaking:
The proximity operator is a tool from modern optimization that lets you minimize a function algorithmically.
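More precisely, for a proper, closed, convex function $f$, the standard definition is $$\textrm{prox}_f(x)=\arg\min_{y}\,\Big(f(y)+\tfrac{1}{2}\|y-x\|^2\Big),$$ and convexity of $f$ guarantees that this minimizer exists and is unique.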
One nice thing about proximity operators is that they remain perfectly well-defined for convex functions that are nonsmooth. This means that prox-based algorithms (e.g. proximal-point, Douglas-Rachford, forward-backward) can solve problems which gradient-based algorithms (e.g. gradient descent) cannot.
So, for your problem, the gradient of the objective is not defined everywhere, due to the nonsmooth $\ell_1$ penalty; this is where a prox operator comes to the rescue.
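Happily, the prox of the $\ell_1$ penalty itself is known in closed form: it is the componentwise soft-thresholding operator, $$\big(\textrm{prox}_{\lambda\|\cdot\|_1}(x)\big)_i=\operatorname{sign}(x_i)\,\max\{|x_i|-\lambda,\,0\}.$$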
However, another hurdle remains: no one knows how to compute the prox operator of the full objective function in general.
This is where the forward-backward algorithm comes in -- instead of requiring the gradient of the entire objective (which may not exist) or the proximity operator of the entire objective (which is rarely computable), we can minimize this sum of functions using only the gradient of the squared term and the prox of the $\ell_1$ penalty. This is an example of "splitting" -- minimizing an objective function using only operators (e.g. gradient or prox) that concern the individual summands of the objective.
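To make this concrete, here is a minimal NumPy sketch of forward-backward splitting, assuming your problem takes the usual LASSO form $\min_x \tfrac12\|Ax-b\|^2+\lambda\|x\|_1$ (the matrix $A$, vector $b$, and weight $\lambda$ below are placeholders standing in for your data):

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(A, b, lam, n_iter=500):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by forward-backward splitting."""
    x = np.zeros(A.shape[1])
    # The gradient of the squared term is L-Lipschitz with L = ||A||_2^2,
    # so a fixed step of 1/L guarantees convergence.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                          # forward (gradient) step
        x = soft_threshold(x - step * grad, step * lam)   # backward (prox) step
    return x

# Tiny demo on synthetic data: recover a sparse vector.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true
x_hat = forward_backward(A, b, lam=0.1)
```

In this form the method is often called ISTA; note that the only prox ever evaluated is the cheap soft-thresholding step, which is exactly the splitting described above.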
The proximity operator also captures the projection operator as a special case: if $C$ is a nonempty closed convex set, and $\iota_{C}$ is its $0$-$\infty$ indicator function, then the proximity operator of $\iota_{C}$ turns out to be the projection operator onto $C$, i.e. $$\textrm{prox}_{\iota_C}=\textrm{proj}_C.$$
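To see why, plug $\iota_C$ into the definition above: the indicator is $0$ on $C$ and $+\infty$ off it, so the minimization collapses onto $C$, $$\textrm{prox}_{\iota_C}(x)=\arg\min_{y}\Big(\iota_C(y)+\tfrac12\|y-x\|^2\Big)=\arg\min_{y\in C}\tfrac12\|y-x\|^2=\textrm{proj}_C(x).$$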