Your method is on the right track. Yes, having the prime factors of $N-1$, if $N$ is prime, does allow one to compute the order of $a$ much more quickly than BSGS. Specifically it takes $O(\log^3 N)$ time (whereas BSGS takes $O(\sqrt{N})$):
Assume that $N-1=\prod_{i} p_i^{b_i}$ is the factorization of $N-1$ into distinct prime powers. Then note that $\operatorname{ord}_N(a)|N-1$, so we need to figure out which $p_i$ (and to what power) divide $N-1$. Luckily for us:
$a^{\frac{N-1}{p_i}}\not\equiv 1\mod N\Leftrightarrow p_i^{b_i}|\operatorname{ord}_N(a)$.
Note that it is important for this biconditionality for us to use the exponent $\frac{N-1}{p_i}$, not just $p_i$. If $a^{\frac{N-1}{p_i}}\equiv 1\mod N$, then (inclusively) until $n=b_i$ we can successively calculate:
$a^{\frac{N-1}{p_i^n}}\not\equiv 1\mod N, \text{ and } a^{\frac{N-1}{p_i^{n-1}}}\equiv 1\mod N \Leftrightarrow p_i^{b_i-n+1}|\operatorname{ord}_N(a)$.
Since there are at most $O(\log N)$ (number of prime factors of $N-1$) checks, and each check is a modular (and binary) exponentiation in $\mathbb{Z}/N\mathbb{Z}$ ($O(\log^2 N)$ time), the total complexity is $O(\log^3 N)$.
Warning: Actually factoring $N-1$ however is time complexity $O(e^{(1+o(1))\sqrt{\ln(N)\ln\ln(N)}})$, which is still better than $O(\sqrt{N})$, but much worse than $O(\log ^3 N)$ making $O(e^{(1+o(1))\sqrt{\ln(N)\ln\ln(N)}})$ be the total runtime. However, if you already know the factorization of $N-1$, then it runs in $O(\log^3 N)$ time.
Edit: to directly address your other question, we are taking advantage of $N$ being prime precisely via something you already mentioned: we know the group order is $N-1$, which serves as the starting point for finding the order of $a$, as seen above. If we already know the factorization of $N-1$, we don't have to waste time factoring $N$ to find the group order.
Edit 2 for fun: Shor's quantum period-finding algorithm can find the order of any element (regardless of known factorization of $N-1$) in $O(\log^3 N)$ steps on a quantum computer