You probably want to look into the field of succinct rank/select data structures. Assuming that the set is static or semi-static, there is a good collection of data structures in Okanohara and Sadakane's paper, Practical Entropy-Compressed Rank/Select Dictionary.
We will suppose you want to represent a subset $S \subset \{0\ldots n-1\}$, where $\left| S \right| = m$. There are ${n \choose m}$ such subsets, so we need at least $\log_2 {n \choose m}$ bits to represent any subset. What you want is a data structure which uses about that much space, and which supports a membership test in close to constant time.
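For concreteness, here's a quick back-of-the-envelope check (the parameter values are my own example) of how far a naive sorted array of 64-bit integers is from that lower bound:

```python
# Illustration: information-theoretic lower bound log2(C(n, m)) for
# representing an m-element subset of {0..n-1}, versus the naive cost
# of storing the set as a sorted array of 64-bit integers.
from math import comb, log2

n, m = 1_000_000, 1_000
lower_bound = log2(comb(n, m))   # bits needed by ANY representation
naive_bits = 64 * m              # sorted array of 64-bit words
print(f"lower bound: {lower_bound:.0f} bits, naive: {naive_bits} bits")
```

For these parameters the naive array uses several times more space than the lower bound, which is the gap the structures below try to close.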
First, a quick note on measuring time vs space requirements. In the case of time, we customarily measure in big-oh (e.g. $O(1)$) because what we're really interested in, the time measured on a clock, depends on the language, compiler, hardware, etc.
In the case of space, however, we can measure exactly, because we customarily measure space in bits, not in physical units (e.g. cubic metres of RAM), and the number of bits a structure occupies does not depend on the language, compiler, or hardware.
Suppose information theory tells us that we need at least $f(n)$ bits to store some data structure. Then:
- If we have a data structure which uses $f(n) + O(1)$ bits, we call it implicit.
- If we have a data structure which uses $f(n) + O(f(n)) = f(n) (1 + O(1))$ bits, we call it compact. Intuitively, the $O(1)$ means "constant relative overhead".
- If we have a data structure which uses $f(n) + o(f(n)) = f(n) (1 + o(1))$ bits, we call it succinct. Intuitively this means that as the data structure gets bigger, the relative overhead eventually becomes negligible.
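To make the three classes concrete, here is a toy computation with made-up overhead functions (taking $f(n) = n$ and, say, a 64-bit constant overhead for the implicit case); only in the succinct column does the *relative* overhead shrink as $n$ grows:

```python
# Toy comparison of the three classes, with invented overheads:
#   implicit: f(n) + 64, compact: 3*f(n), succinct: f(n) + f(n)/log2(n).
from math import log2

rows = []
for n in (10**3, 10**6, 10**9):
    implicit = n + 64                 # additive constant overhead
    compact  = 3 * n                  # constant-factor overhead
    succinct = n + n / log2(n)        # relative overhead -> 0
    rows.append((n, implicit / n, compact / n, succinct / n))
    print(rows[-1])
```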
Additionally, we will work in the word-RAM model and assume that the machine word size is $\Theta(\log n)$; that is, an "integer" in the range $\{0 \ldots n-1\}$ fits in a constant number of machine words.
If the subset is "dense", that is if $m \approx \frac{n}{2}$, then $\log_2 {n \choose m} \approx n$, so in that case you can't do better than a bit vector.
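In the dense case a plain bit vector already gives an $O(1)$ membership test in essentially optimal space. A minimal sketch (the class name and byte layout are mine, just for illustration):

```python
# Minimal bit-vector sketch for the dense case: n bits of storage,
# O(1) membership test.
class BitVector:
    def __init__(self, n, elements):
        self.bits = bytearray((n + 7) // 8)   # n bits, rounded up to bytes
        for x in elements:
            self.bits[x >> 3] |= 1 << (x & 7)

    def __contains__(self, x):
        return bool(self.bits[x >> 3] & (1 << (x & 7)))

s = BitVector(100, {3, 14, 15, 92})
print(14 in s, 16 in s)   # True False
```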
Things get more interesting when the set is sparse or almost full. If $m > \frac{n}{2}$, you can always store the complement of the subset instead. So we can just consider the case of a sparse subset where $m < \frac{n}{2}$.
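The complement trick is a one-line wrapper around whatever sparse representation you use; in this sketch `frozenset` stands in for that structure:

```python
# Sketch of the complement trick: if the set is almost full (m > n/2),
# store the (sparse) complement and negate the membership answer.
# `frozenset` is a placeholder for a real sparse representation.
def make_membership(n, elements):
    if len(elements) > n // 2:
        comp = frozenset(range(n)) - frozenset(elements)
        return lambda x: x not in comp    # membership via the complement
    s = frozenset(elements)
    return lambda x: x in s

member = make_membership(10, {0, 1, 2, 3, 4, 5, 6, 7, 9})  # stores only {8}
print(member(9), member(8))   # True False
```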
The esp variant from that paper uses $n H_0(S) + o(n)$ bits of storage, and supports both rank and select queries in $O(1)$ time assuming the word-RAM model. Since a membership test can be trivially implemented with two rank queries, this supports membership tests in $O(1)$ time, and the space requirement is essentially a zero-order entropy compressed representation of the subset plus some overhead. This is pretty good, but if $m \ll n$, the $o(n)$ overhead term dominates the space requirement.
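The reduction from membership to rank is simple: $i \in S$ iff $\mathrm{rank}(i+1) - \mathrm{rank}(i) = 1$, where $\mathrm{rank}(i)$ counts the set bits in positions $[0, i)$. A toy sketch, using a plain prefix-sum array for rank (which gives $O(1)$ queries but none of esp's compression):

```python
# Membership from two rank queries on the characteristic bit vector of S.
# rank[i] = number of ones in bits[:i], precomputed as a prefix sum.
from itertools import accumulate

bits = [0, 1, 1, 0, 1, 0, 0, 1]        # characteristic vector of S
rank = [0] + list(accumulate(bits))    # rank[i] = ones in bits[:i]

def member(i):
    return rank[i + 1] - rank[i] == 1

print([i for i in range(8) if member(i)])   # [1, 2, 4, 7]
```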
Because of this, the sdarray variant is more practical for sparse sets. The space requirement for sdarray is $m \log_2 \frac{n}{m} + 2m + o(m)$ bits, although if you only need membership tests, this can be reduced to $m \log_2 \frac{n}{m} + m + o(m)$ bits. Note that if $m \ll n$, then $\log_2 {n \choose m} \approx m \log_2 \frac{n}{m}$, so this data structure is succinct. A membership query takes $O(\log \frac{n}{m}) + O(\log^4 m / \log n)$ time in the worst case, but is typically constant time in practice.
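Roughly, sdarray is an Elias–Fano-style encoding: each element $x$, in sorted order, is split into high bits $x \gg w$ and low bits $x \bmod 2^w$ with $w \approx \log_2 \frac{n}{m}$; the lows take $mw$ bits and the highs, stored in unary, take at most $2m$ bits. A hedged toy sketch of the idea (Python lists instead of packed bit arrays, and a binary search standing in where the real structure uses a select query):

```python
# Toy Elias-Fano-style split behind sdarray. Not space-efficient as
# written -- it only illustrates the high/low decomposition.
from bisect import bisect_left
from math import floor, log2

def build(n, elements):
    m = len(elements)
    w = max(0, floor(log2(n / m)))            # low-bit width ~ log2(n/m)
    xs = sorted(elements)
    lows = [x & ((1 << w) - 1) for x in xs]   # m * w bits in the real thing
    highs = [x >> w for x in xs]              # non-decreasing; unary-coded
    return w, lows, highs

def member(x, enc):
    w, lows, highs = enc
    hi, lo = x >> w, x & ((1 << w) - 1)
    # Scan the short run of elements sharing this high part; the real
    # structure locates the run with a select query instead of bisect.
    i = bisect_left(highs, hi)
    while i < len(highs) and highs[i] == hi:
        if lows[i] == lo:
            return True
        i += 1
    return False

enc = build(1000, [3, 14, 159, 265, 358])
print(member(159, enc), member(160, enc))   # True False
```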