Certainly one could iterate a sum from $i = 0$ to $n_1$ and from $j=0$ to $n_2.$ The question is, what is the purpose of the sum?
The indexing scheme in the formula is apparently intended to extract two rectangular regions out of two much larger images.
The rectangular regions are parallel to the coordinate axes,
but each rectangle has a width, a height, and a position that need to be specified.
For reasons that may be explained in the parts of the book that have been omitted from the question, the author wanted to specify the position of each rectangle by the coordinates of a pixel in the exact center of the rectangle.
Perhaps this is due to considerations of symmetry, or perhaps it is because we are really supposed to be interested in what is happening at a specific pixel in each image, and the other pixels are considered only as "neighbors" of the interesting pixel.
Whatever the motivation is, we get a rectangle by selecting a certain pixel to be in the center of the rectangle, and then extending the rectangle $n_1$ pixels to the left and to the right and $n_2$ pixels upward and downward.
The coordinates of the center of the rectangle inside the $f$ image are
$(x,y),$ so the rectangle runs left and right from $x - n_1$ to $x + n_1$
and vertically from $y - n_2$ to $y + n_2.$
The coordinates of the center of the rectangle inside the $g$ image are
not necessarily the same as those of the $f$ rectangle.
Instead of $(x,y),$ they are $(x-d_1,y-d_2),$
so the rectangle runs left and right from $(x - d_1) - n_1$ to
$(x - d_1) + n_1$ and vertically from $(y - d_2) - n_2$ to $(y - d_2) + n_2.$
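The two index ranges above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the book's code: the helper names `window_bounds` and `window_pixels` and the sample values of $x, y, d_1, d_2, n_1, n_2$ are made up for the example.

```python
def window_bounds(cx, cy, n1, n2):
    """Inclusive bounds (u_min, u_max, v_min, v_max) of the rectangle
    centered at pixel (cx, cy), extending n1 pixels horizontally and
    n2 pixels vertically on each side. Hypothetical helper."""
    return (cx - n1, cx + n1, cy - n2, cy + n2)


def window_pixels(cx, cy, n1, n2):
    """List every integer pixel coordinate inside the rectangle."""
    u_min, u_max, v_min, v_max = window_bounds(cx, cy, n1, n2)
    return [(u, v)
            for v in range(v_min, v_max + 1)
            for u in range(u_min, u_max + 1)]


# Sample values, chosen arbitrarily for illustration.
x, y = 10, 20      # center of the rectangle in the f image
d1, d2 = 3, 1      # offset of the g rectangle's center
n1, n2 = 2, 1      # half-width and half-height

f_bounds = window_bounds(x, y, n1, n2)            # centered at (x, y)
g_bounds = window_bounds(x - d1, y - d2, n1, n2)  # centered at (x - d1, y - d2)
```

With these sample values the $f$ rectangle spans columns 8 to 12 and rows 19 to 21, while the $g$ rectangle spans columns 5 to 9 and rows 18 to 20: the same shape, shifted by $(d_1, d_2)$.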
If you take all the integers from $x - n_1$ to $x + n_1,$
there are $2n_1 + 1$ of them,
so the $f$ rectangle is $2n_1 + 1$ pixels wide.
For similar reasons, it is $2n_2 + 1$ pixels high,
and the $g$ rectangle is the same size.
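To spell out the counting step: the number of integers in an inclusive range is the difference of the endpoints plus one. For a center coordinate $a$ and a half-extent $n$ (generic symbols used here only for illustration),

$$(a + n) - (a - n) + 1 = 2n + 1.$$

For instance, with $x = 10$ and $n_1 = 2$ the columns are $8, 9, 10, 11, 12,$ which is $5 = 2 \cdot 2 + 1$ of them. The count does not depend on the center, which is why shifting the center by $d_1$ or $d_2$ does not change the size of the rectangle.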