Intuitively, a kernelization algorithm is an algorithm which, in polynomial time, preprocesses a given instance and outputs an instance whose size is bounded in the parameter. The goal of kernelization is (at least) two-fold: we get provable performance guarantees, i.e., upper bounds on the size of the output instance, which has applications both in the design of algorithms and as a complexity measure.

More formally, a kernelization algorithm (often referred to as a kernel) is an algorithm for a parameterized problem which, on an input $(G,k)$, outputs an equivalent
instance $(G',k')$ with $\max\{|G'|,k'\} \leq f(k)$ for some function $f$.
Furthermore, the algorithm needs to run in polynomial time.
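To make this concrete, here is a small sketch of the classic kernel for Vertex Cover based on Buss's rules: a vertex of degree more than $k$ must be in every vertex cover of size at most $k$, isolated vertices can be discarded, and if more than $k^2$ edges remain the answer is no. This example is my own illustration rather than anything taken from the sources below; the adjacency-set representation and the name buss_kernel are simply choices made for the sketch.

```python
def buss_kernel(graph, k):
    """Sketch of Buss's kernelization for Vertex Cover.

    `graph` maps each vertex to the set of its neighbours.
    Returns an equivalent instance (graph', k') with at most k'^2 edges,
    or None as a stand-in for a trivial no-instance.
    """
    g = {v: set(nbrs) for v, nbrs in graph.items()}

    changed = True
    while changed and k >= 0:
        changed = False
        for v in list(g):
            if len(g[v]) > k:
                # Rule 1: a vertex of degree > k is in every cover of size <= k.
                for u in g[v]:
                    g[u].discard(v)
                del g[v]
                k -= 1
                changed = True
                break
            if not g[v]:
                # Rule 2: isolated vertices cover no edge; drop them.
                del g[v]
                changed = True
                break

    if k < 0:
        return None  # more than k vertices were forced into the cover

    edges = sum(len(nbrs) for nbrs in g.values()) // 2
    if edges > k * k:
        # With maximum degree <= k, any k vertices cover at most k^2 edges.
        return None

    # What remains has at most k^2 edges and at most k^2 + k vertices.
    return g, k
```

Returning None here is just shorthand for outputting a trivial no-instance, which is what the formal definition asks for; the surviving instance has size $O(k^2)$, so this is in fact a polynomial kernel in the sense discussed below.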
The following result shows that the power of kernels is, so to speak, equivalent to the power of fixed parameter tractability.
Theorem (Folklore). A problem is fixed parameter tractable if and only if it admits a kernel and is decidable.
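In case a proof sketch is helpful (this is the standard folklore argument, so take the details as indicative): if the problem admits a kernel and is decidable, we can kernelize in polynomial time and then decide the resulting instance of size at most $f(k)$ by brute force, in time depending only on $k$. Conversely, if the problem is solvable in time $f(k)\cdot n^c$, run that algorithm for $n^{c+1}$ steps; either it terminates and we output a trivial equivalent instance, or $f(k) > n$ and the input itself is already a kernel of size at most $f(k)$ (assuming, as we may, that $f(k)\geq k$).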
Although the notion of a kernel thus coincides with fixed parameter tractability, there is a stronger version of kernelization where we demand that the function $f$ above be a polynomial.
If you want to see the original definitions, I advise you to pick up Downey and Fellows' book on parameterized complexity, or start from Niedermeier's Habilitation thesis mentioned above. There is also a Wikipedia article on Kernelization.