I am not aware of any method that makes good use of a black box or API computing $f(a,e,n)=a^e\bmod n$ for $n$ of up to $2112$ bits in order to efficiently compute $a^e\bmod n$ for $n$ above that bound (say $4096$ bits), unless that bigger $n$ has a known factorization into factors of at most $2112$ bits each (in which case the usual CRT technique applies and helps significantly).
That issue arises when one wants to compute the RSA public-key function for a $4096$-bit key on top of software (or an API to hardware) limited to $2048$ bits and then some.
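For the case where the factorization is known, here is a minimal sketch of the CRT technique in Python. The names `limited_modexp` and `modexp_crt` are hypothetical; `limited_modexp` merely stands in for the black box and is emulated with Python's built-in `pow`. The sketch assumes $n=pq$ with $p$ and $q$ coprime and each of at most $2112$ bits, and uses Garner's recombination.

```python
MAX_BITS = 2112  # assumed size limit of the black box


def limited_modexp(a, e, n):
    """Stand-in for the black box: rejects moduli wider than MAX_BITS."""
    if n.bit_length() > MAX_BITS:
        raise ValueError("modulus too large for the black box")
    return pow(a, e, n)


def modexp_crt(a, e, p, q):
    """Compute a^e mod (p*q) from two black-box calls; p and q must be coprime."""
    xp = limited_modexp(a % p, e, p)  # result modulo p
    xq = limited_modexp(a % q, e, q)  # result modulo q
    # Garner recombination: x = xq + q * ((xp - xq) * q^-1 mod p)
    q_inv = pow(q, -1, p)  # needs Python 3.8+; can be precomputed once per key
    h = ((xp - xq) * q_inv) % p
    return xq + q * h
```

Note that the recombination step still needs a few multi-precision operations at the width of $p$ and $q$ outside the black box, but no exponentiation at the full width of $n$.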
Especially if $e$ is small (like $65537$, $17$, $3$, or $2$), it is sometimes possible to write a fast-enough software-only implementation in assembly language (which typically beats C by a decimal order of magnitude, and interpreted bytecode by much more). For the purpose of signature verification, this is unquestionably safe, since only public values are handled.
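To illustrate why a small $e$ keeps the software-only route cheap, here is a sketch of plain left-to-right square-and-multiply in Python (the function name `modexp_small_e` is hypothetical). For $e=65537=2^{16}+1$ it performs 16 modular squarings and one modular multiplication, with no black box involved. Python's big integers stand in for the multi-precision arithmetic that a real implementation would have to provide itself.

```python
def modexp_small_e(a, e, n):
    """Left-to-right square-and-multiply for a small public exponent e >= 1."""
    base = a % n
    result = base
    # Walk the bits of e after the leading 1 bit, most significant first.
    for bit in bin(e)[3:]:
        result = (result * result) % n      # one squaring per remaining bit
        if bit == "1":
            result = (result * base) % n    # one multiplication per set bit
    return result
```

The cost is dominated by the modular multiplications at the full width of $n$, which is the part that would be written in assembly language.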
But even if $e$ is small, if the context is a JavaCard smart card with no way to bypass the JavaCard Virtual Machine, I'm afraid there is no practical solution, unless execution time is not an issue.
(0x010001) in an unsigned representation, but it is normally padded to 24 bits to fit into an N number of bytes. – Maarten Bodewes Oct 02 '13 at 15:23