Password hashing, or more generally key stretching, takes as input a low-entropy key and a public salt, and processes them through a public, purposely slow, pseudo-random function, yielding a hash/derived key that is ultimately stored for comparison or used as a key. The goal is to make it hard for adversaries to enumerate the likely/possible values of the input, apply the public function, and test whether the output behaves as the one actually used.
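As a minimal sketch of that flow, assuming PBKDF2-HMAC-SHA-256 from Python's `hashlib` as the slow function: the names `hash_password` and `verify_password`, the 16-byte salt, and the iteration count are illustrative choices of mine, not recommendations.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative cost parameter, not a recommendation


def hash_password(password: bytes, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key); both get stored, and the salt is public."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each password
    derived = hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS, dklen=32)
    return salt, derived


def verify_password(password: bytes, salt: bytes, expected: bytes) -> bool:
    """Recompute the slow function and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)
```

An attacker who obtains the stored salt and derived key can run exactly the same `verify_password` logic on each candidate password, which is why the cost of one evaluation matters.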
The ideal "numbers/data to prove how secure a password hashing is" would be: how much money is an adversary expected to spend to compute the outputs for some large number of inputs.
A first approach is the computational cost of the function for the legitimate user, measured e.g. in milliseconds of CPU time. In practice that's controlled by an iteration count. Increasing the iteration count increases the computational cost of the function, for both legitimate users and attackers. When we compare two instances differing only by iteration count, the higher the iteration count, the better the protection. And past a certain threshold, the protection grows proportionally to the iteration count, because the cost for attackers is dominated by this iterated operation. The iteration count is thus set as high as possible in a given context; e.g. without causing a significant delay for end users, or significant investment/electricity/VM metering cost for a server operator. It's then possible to measure the resulting computational cost for the legitimate user, in e.g. milliseconds of CPU time.
This "computational cost for the legitimate user" metric is useful, if only because said cost limits how high legitimate users can set the iteration count. However, that gives no useful idea about the cost for an attacker, thus no useful idea about the protection offered; and thus it does not allow meaningful comparison between different password hashing functions.
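To make "as high as possible in a given context" concrete, here is a rough sketch of how one might calibrate the iteration count against a time budget. The function name `calibrate_iterations`, the probe size and the 0.1 s default target are hypothetical; it measures wall-clock time as a stand-in for CPU time, and it relies on PBKDF2's cost being essentially linear in the iteration count.

```python
import hashlib
import time


def calibrate_iterations(target_seconds: float = 0.1, probe: int = 20_000) -> int:
    """Estimate an iteration count costing about target_seconds on this machine."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"probe password", b"probe salt", probe)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    # Scale the probe count linearly to the target duration.
    return max(1, int(probe * target_seconds / elapsed))


print(calibrate_iterations())  # machine-dependent
```

The result only says what the legitimate user can afford on this particular machine and implementation.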
To illustrate how dramatic the difference can be between the cost for the legitimate user and the cost for an attacker, I'll take a hashing function in common use: PBKDF2-HMAC-SHA-256. It has a cost parameter $c$, which controls how many times it iterates $U_{j+1}:=\operatorname{HMAC-SHA-256}(\mathrm{Password},U_j)$. One such iteration requires two SHA-256 hashes. $c$ is typically $10^3$ to $10^7$. On the PC I'm using right now, $10^5$ iterations use an energy of about¹ $5\,J$. But common bitcoin mining hardware is advertised at $3.8\cdot10^{-6}J$ (soon $2.1\cdot10^{-6}J$) for the same number of SHA-256 hashes. Thus a hypothetical adversary using a state-of-the-art ASIC would hash one or two million passwords for the same energy cost as one for a legitimate user. For \$100 of electricity at 10¢ per kW⋅h, and PBKDF2-HMAC-SHA-256 at $c=10^5$, they would test $10^{15}$ passwords. That is enough for:
- every combination of 8 characters among 75: letters upper and lowercase, digits, and 13 special characters.
- finding, on average, 100 passwords generated per the XKCD password strategy, which is much better than what most passwords are.
Even if state-level adversaries are likely far from that efficiency (because they invest in repurposable hardware like FPGAs), it's safe to say they can crack most passwords that people can remember when these are hashed with PBKDF2-anything and salt, with $c=10^5$.
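The arithmetic behind those figures can be rechecked with a few lines. The energy and price numbers are the ones quoted above; the 44 bits for an XKCD-style password (four words from a 2048-word list) is my assumption about that scheme.

```python
# Figures quoted in the text above.
user_energy_j = 5.0        # one PBKDF2-HMAC-SHA-256 evaluation at c = 1e5 on the PC
asic_energy_j = 3.8e-6     # advertised ASIC energy for the same number of SHA-256 hashes
budget_usd = 100.0
price_usd_per_kwh = 0.10

JOULES_PER_KWH = 3.6e6

# How many passwords the ASIC hashes for the energy of one legitimate evaluation.
advantage = user_energy_j / asic_energy_j            # about 1.3 million

# How many passwords $100 of electricity buys the attacker.
budget_j = budget_usd / price_usd_per_kwh * JOULES_PER_KWH
passwords_tested = budget_j / asic_energy_j          # about 9.5e14

keyspace_8_of_75 = 75 ** 8                           # about 1.0e15
xkcd_space = 2 ** 44                                 # assumption: 44 bits per password
# On average half the space must be searched to find one password.
xkcd_passwords_found = passwords_tested / (xkcd_space / 2)

print(f"advantage        {advantage:.2e}")
print(f"passwords tested {passwords_tested:.2e}")
print(f"75^8             {keyspace_8_of_75:.2e}")
print(f"XKCD passwords   {xkcd_passwords_found:.0f}")
```

With the announced $2.1\cdot10^{-6}J$ figure instead, the advantage and the number of passwords tested both grow by a factor of about 1.8.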
Arguably, the most important metric thus is: the relative efficiency of the legitimate user compared to what the state of the art can achieve. I propose to take the base-2 log of that quantity, in absolute value. We want that as close to 0 as possible. We've seen that for PBKDF2-HMAC-SHA-256 as run on my PC, we are at about 20. Importantly, that depends a lot on how well optimized the implementation used by legitimate users is.
The best general technique to lower that quantity is: make sure computing the function requires a lot of RAM, and many accesses to it, in a way that can't be optimized away. That is, a memory-hard iterated hashing function. That's the strategy pioneered by scrypt, and used by its modern successor Argon2.
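As a small sketch of that metric (the function name `efficiency_gap_bits` is mine, and the costs can be in any common unit, here joules per evaluation):

```python
import math


def efficiency_gap_bits(user_cost: float, attacker_cost: float) -> float:
    """Absolute base-2 log of the ratio of per-evaluation costs."""
    return abs(math.log2(user_cost / attacker_cost))


# 5 J on the PC versus 3.8e-6 J on the advertised ASIC.
print(round(efficiency_gap_bits(5.0, 3.8e-6), 1))  # 20.3
```

Each unit of this quantity is a factor of two in favor of the attacker, so it can be read as bits of password entropy lost relative to an ideal where attacker and user are equally efficient.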
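For comparison with the PBKDF2 timing, here is a minimal sketch of calling a memory-hard function, assuming a Python build whose OpenSSL provides scrypt (exposed as `hashlib.scrypt`). The parameters `n`, `r`, `p` are illustrative, not a recommendation; scrypt's main buffer takes about $128\cdot n\cdot r$ bytes.

```python
import hashlib
import os

salt = os.urandom(16)
n, r, p = 2**14, 8, 1          # illustrative cost parameters

memory_bytes = 128 * n * r     # main scrypt buffer: 16 MiB with these values

key = hashlib.scrypt(
    b"tst",
    salt=salt,
    n=n, r=r, p=p,
    maxmem=64 * 1024 * 1024,   # allow some headroom above the 16 MiB buffer
    dklen=16,
)
print(memory_bytes // 2**20, "MiB,", len(key), "byte key")
```

The point of the memory requirement is that an attacker's dedicated hardware must also provide that RAM, and the accesses to it, for each password tested in parallel.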
(to be continued, hopefully).
¹ I'm timing PBKDF2-HMAC-SHA-256 as bundled in Python 3.10.5 with

    import timeit
    print(timeit.timeit('hashlib.pbkdf2_hmac("sha256", b"tst", b"abc", 100000, dklen=16)', setup='import hashlib', number=100))

which yields 5.1 s for 100 evaluations, and I take the power consumption to be 100 W; that's about 5 J per evaluation.