Item 2 has already been answered satisfactorily, so this answer focuses on item 1: the s-box.
The s-box is not inherently a 16x16 array; that is only one way of viewing it. The s-box is an 8-bit non-linear transformation of the input, and it only looks like a 16x16 array when its values are laid out as a table with those dimensions. In that table, the row and column indices are the high and low 4-bit halves (nibbles) of the input byte, and each cell holds the corresponding output, so the table is a one-to-one listing of all 256 possible 8-bit inputs and their outputs. It can just as easily be viewed as a flat 1x256 array indexed directly by the input byte.
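To make the "same data, two shapes" point concrete, here is a minimal Python sketch. It builds the 256 s-box values (multiplicative inverse in GF(2^8) followed by the affine transformation, with the inverse found by brute force purely for brevity) and then lays the same list out as a 16x16 grid. The names `gf_mul`, `build_sbox`, `SBOX` and `GRID` are illustrative, not from any library.

```python
# Multiply two bytes in GF(2^8), reducing by the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """Return the s-box as a flat 1x256 list: SBOX[x] is the output for input byte x."""
    sbox = []
    for x in range(256):
        # Multiplicative inverse by brute-force search; 0 has no inverse and maps to 0.
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        # Affine transformation over GF(2), then XOR with the constant 0x63.
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()

# The same 256 values viewed as a 16x16 table: row = high nibble, column = low nibble.
GRID = [SBOX[row * 16:(row + 1) * 16] for row in range(16)]

x = 0x53
print(hex(SBOX[x]), hex(GRID[x >> 4][x & 0xF]))  # prints "0xed 0xed"
```

Both lookups return the same byte because `GRID[x >> 4][x & 0xF]` is just `SBOX[16 * (x >> 4) + (x & 0xF)]`, which is `SBOX[x]`; the 16x16 form adds no information, it only splits the index into two nibbles.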
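Not all AES implementations use a table lookup for the s-box. Some compute the transformation directly in hardware for security purposes (for example, to avoid the timing side channels that table lookups indexed by secret data can introduce), and since each of the 16 state bytes is substituted independently, the computation can be pipelined 16 wide for performance. Memory-constrained 8-bit platforms may also perform the entire calculation (slowly), because storing a 256-byte table in memory can be too expensive.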
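As a rough illustration of the table-free approach, the sketch below computes a single s-box output on demand: the inverse is obtained as x^254 (since x^255 = 1 for every non-zero x in GF(2^8)) by square-and-multiply, followed by the affine transformation. The function names are hypothetical, and this Python code is not constant-time, so it only shows the arithmetic, not a side-channel-resistant implementation.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sub_byte(x: int) -> int:
    """Compute one s-box output on the fly, without any 256-byte table."""
    # x^254 equals x^-1 for x != 0, and 0^254 == 0 matches the AES convention for 0.
    inv, base, e = 1, x, 254
    while e:
        if e & 1:
            inv = gf_mul(inv, base)
        base = gf_mul(base, base)
        e >>= 1
    # Affine transformation: XOR of the byte with its left rotations by 1..4 bits, plus 0x63.
    out = 0x63
    for n in range(5):
        out ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
    return out


print(hex(sub_byte(0x53)))  # prints "0xed"
```

This trades the 256 bytes of table storage for a few dozen field multiplications per byte, which is the memory/speed trade-off described above.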
See "How are the AES S-Boxes calculated?" for details of the transformation.
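In the SubBytes step, the 4x4 array of input bytes (the state) is transformed one byte at a time to give the 4x4 output array: each byte is replaced independently by its s-box value. The s-box is also used to transform single bytes in the key schedule (the SubWord step).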
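Both uses can be sketched in a few lines of Python. `sub_bytes` applies the s-box to each byte of a 4x4 state (the state's row/column ordering does not matter here, because every byte is handled independently), and `sub_word` applies it to each byte of a 32-bit key-schedule word. The helper names are illustrative; the snippet computes the s-box on the fly so it stands alone. The usage line checks against the FIPS-197 AES-128 key expansion example, where RotWord(09cf4f3c) = cf4f3c09 and SubWord of that is 8a84eb01.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sub_byte(x: int) -> int:
    """s-box output for one byte: inverse (as x^254), then the affine transformation."""
    inv, base, e = 1, x, 254
    while e:
        if e & 1:
            inv = gf_mul(inv, base)
        base = gf_mul(base, base)
        e >>= 1
    out = 0x63
    for n in range(5):
        out ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
    return out


def sub_bytes(state: list) -> list:
    """SubBytes: substitute each byte of a 4x4 state independently."""
    return [[sub_byte(b) for b in row] for row in state]


def rot_word(w: int) -> int:
    """RotWord: cyclically rotate a 32-bit word left by one byte."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def sub_word(w: int) -> int:
    """SubWord: apply the s-box to each of the four bytes of a 32-bit word."""
    return int.from_bytes(bytes(sub_byte(b) for b in w.to_bytes(4, "big")), "big")


print(hex(sub_word(rot_word(0x09CF4F3C))))  # prints "0x8a84eb01"
```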