Take $a=2$ and $p=4$. We can make $2^4=16$ necklaces, of which we throw away the $2$ single-color ones. The remaining $14$ cannot be grouped into sets of $4$, however. Although some designs are represented by four necklaces each:
$$BWWW=WBWW=WWBW=WWWB \\
BBWW=WBBW=WWBB=BWWB \\
BBBW=WBBB=BWBB=BBWB,$$
there are only two necklaces representing the fourth design, $$BWBW=WBWB.$$
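The grouping can be checked by brute force. The Python sketch below is only illustrative. It encodes each necklace as a string of `B`/`W` characters and uses the lexicographically smallest rotation as a label for its design. The names `rotations` and `designs` are hypothetical helpers introduced here, not part of the original argument.

```python
from itertools import product

def rotations(s):
    """All cyclic rotations of the string s (duplicates included)."""
    return [s[i:] + s[:i] for i in range(len(s))]

def designs(p, colors="BW"):
    """Group the non-monochromatic strings of length p by design,
    i.e. by the set of strings reachable from each other by rotation."""
    classes = {}
    for beads in product(colors, repeat=p):
        s = "".join(beads)
        if len(set(s)) == 1:
            continue  # throw away the single-color necklaces
        key = min(rotations(s))  # smallest rotation labels the design
        classes.setdefault(key, set()).add(s)
    return classes

for key, members in sorted(designs(4).items()):
    print(key, sorted(members))
# BBBW ['BBBW', 'BBWB', 'BWBB', 'WBBB']
# BBWW ['BBWW', 'BWWB', 'WBBW', 'WWBB']
# BWBW ['BWBW', 'WBWB']
# BWWW ['BWWW', 'WBWW', 'WWBW', 'WWWB']
```

The four groups have sizes $4+4+4+2=14=2^4-2$.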
Cases such as $BWBW=WBWB$, where a design is represented by fewer than $p$ necklaces, can only occur for composite $p$.
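One way to test this claim for small $p$ is to check, for each length, whether every non-monochromatic string has $p$ distinct rotations. This is a sanity check over a finite range, not a proof. The name `all_full_size` is a hypothetical helper, and only two colors are tried.

```python
from itertools import product

def all_full_size(p, colors="BW"):
    """True if every non-monochromatic string of length p has p distinct
    rotations, i.e. every design is represented by exactly p necklaces."""
    for beads in product(colors, repeat=p):
        s = "".join(beads)
        distinct = {s[i:] + s[:i] for i in range(p)}
        if len(set(s)) > 1 and len(distinct) < p:
            return False  # found a design with fewer than p necklaces
    return True

print([p for p in range(2, 13) if all_full_size(p)])  # [2, 3, 5, 7, 11]
```

Only the primes survive. For composite $p$, a repeating block fails the test: for example, $BW$ repeated when $p$ is even, or $BWW$ repeated when $p=9$.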
After looking at such examples, let's think back to what's going on in the proof. We want to show that the number of necklaces ($a^p-a$) is a multiple of $p$. If we can divide a set of objects evenly into groups of size $p$, then we have proved that the size of the set is a multiple of $p$. In cases where such a division does not occur, we have not proved anything.
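A small tally can illustrate the logic. The sketch below is in Python. Colors are encoded as the integers $0,\dots,a-1$, and `orbit_sizes` is a hypothetical helper. The code counts the rotation groups of each size and, separately, checks whether $p$ divides $a^p-a$:

- For $a=2$, $p=3$, every group has size $p$.
- For $a=2$, $p=4$, the grouping fails, and $14$ is indeed not a multiple of $4$.
- For $a=4$, $p=4$, the grouping fails again (there are six groups of size $2$), even though $4^4-4=252=4\cdot 63$ is a multiple of $4$.

A failed grouping therefore leaves the question open rather than settling it.

```python
from collections import Counter
from itertools import product

def orbit_sizes(a, p):
    """Count the rotation groups of each size formed by the
    non-monochromatic strings of length p over a colors (0..a-1)."""
    seen = set()
    sizes = Counter()
    for s in product(range(a), repeat=p):
        if len(set(s)) == 1 or s in seen:
            continue  # skip single-color strings and groups already counted
        group = {s[i:] + s[:i] for i in range(p)}  # all rotations of s
        seen |= group
        sizes[len(group)] += 1
    return sizes

for a, p in [(2, 3), (2, 4), (4, 4)]:
    sizes = orbit_sizes(a, p)
    uniform = set(sizes) == {p}  # did every group have exactly p members?
    divisible = (a**p - a) % p == 0
    print(f"a={a}, p={p}: group sizes {dict(sizes)}, "
          f"uniform groups: {uniform}, p divides a^p - a: {divisible}")
```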
In the earlier examples, we divided the $2^3-2$ and $2^5-2$ necklaces evenly into groups of size $3$ and $5$, respectively. However, the same method of grouping did not produce uniform groups of size $4$ or $6$ when we started with $2^4-2$ or $2^6-2$ necklaces.
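Writing each count as a sum of group sizes makes the contrast explicit. For $p=6$, the smaller groups come from strings built from a repeating block. $BWBWBW$ and $WBWBWB$ form one group of size $2$. The blocks $BBW$ and $BWW$, each repeated twice, give two groups of size $3$.

$$
\begin{aligned}
2^3-2 &= 6 = 2\cdot 3, \\
2^5-2 &= 30 = 6\cdot 5, \\
2^4-2 &= 14 = 3\cdot 4 + 1\cdot 2, \\
2^6-2 &= 62 = 9\cdot 6 + 2\cdot 3 + 1\cdot 2.
\end{aligned}
$$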
The division into groups worked out nicely for $3$ and $5$ precisely because $3$ and $5$ are prime. The reason it failed for $4$ and $6$ can be seen by looking at the cases where it failed: each failure exploits the fact that $4$ and $6$ have nontrivial factors. If you label the beads of $BWBW$ and write out every cyclic permutation, you get
$$B_1W_1B_2W_2,\quad W_2B_1W_1B_2,\quad B_2W_2B_1W_1,\quad W_1B_2W_2B_1,$$
but once the labels are dropped, these coincide in pairs (the first with the third, the second with the fourth), so there are really only $2$ distinct necklaces. That only happens because $4=2\times 2$: the last two beads can repeat the first two.
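The labeled-bead argument can be replayed mechanically. This Python sketch uses hypothetical helper names, `labeled_rotations` and `unlabel`, and assumes each bead label is a color letter followed by an index. Once the indices are dropped, the four rotations of $B_1W_1B_2W_2$ collapse to two strings. By contrast, five beads in the prime-length pattern $BWBWB$ keep all five rotations distinct.

```python
def labeled_rotations(beads):
    """Cyclic rotations of a list of labeled beads, each shifted one step
    further to the right, in the same order as the list in the text."""
    n = len(beads)
    return [beads[n - i:] + beads[:n - i] for i in range(n)]

def unlabel(beads):
    """Drop the subscripts, keeping only each bead's color letter."""
    return "".join(b[0] for b in beads)

rots = labeled_rotations(["B1", "W1", "B2", "W2"])
print(["".join(r) for r in rots])  # ['B1W1B2W2', 'W2B1W1B2', 'B2W2B1W1', 'W1B2W2B1']
colors = [unlabel(r) for r in rots]
print(colors)                      # ['BWBW', 'WBWB', 'BWBW', 'WBWB']
print(len(set(colors)))            # 2

# For comparison, a prime length: five labeled beads forming BWBWB.
five = [unlabel(r) for r in labeled_rotations(["B1", "W1", "B2", "W2", "B3"])]
print(len(set(five)))              # 5
```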