Is there a fast algorithm for mapping $n$-bit numbers $s$ (for fixed $n$) into a cyclic subgroup of an elliptic curve (over a finite field) in which the Discrete Logarithm Problem is hard?
By fast, I mean it takes about as long as a point addition (and is much faster than a point scalar multiplication in the elliptic curve group, or an exponentiation operation in the finite field).
The mapping: $s \rightarrow s \cdot G$, for some generator $G$, would be too slow, and also would defeat the security of construction since the discrete logarithm with respect to $G$ would be known.
It is not necessary for the running time to be independent of the input. For instance, for elliptic curves defined by
$y^2 = x^3 + ax + b$
over the finite field $F_p$, there is the standard method of setting $x = s \bmod p$, and then computing the square root of $x^3 + ax + b$. If $p = 3 \bmod 4$, then we can compute the square root using exponentiation in the finite field. If the square root does not exist, we increment $x$ and try again. In this way we can obtain a point in the elliptic curve group, and if our cyclic subgroup has cofactor 1, then we are done. This method would be valid except that computing the square root via exponentiation takes significantly more time than a point addition.
Is there some method/choice of elliptic curve that allows a much more efficient mapping to be defined?
The speed I'm looking to beat (in order for this construction to be faster than an alternative that does not use elliptic curves) is around 100000 operations (hashes to curve points) per second on a standard single CPU core (e.g. 3.2GHz Nehalem) with a 256-bit elliptic curve. Testing with libsecp256k1, which seems to include an optimized square root implementation, I can compute only about 77000 square roots per second.