"As far as I understand there aren't many rigorous results on performance of these algorithms, similar to many classic machine learning approaches."
You are correct: unlike Grover's algorithm, for which we can prove that a search costing $\mathcal{O}(N)$ queries on a classical computer can be done with only $\mathcal{O}(\sqrt{N})$ queries on a quantum computer, "variational quantum algorithms" tend to be heuristics, and there is far less we can say in terms of theorems about their exact cost.
Now consider, as an example, the "VQE" algorithm for estimating the energy of a quantum mechanical system (for example a molecule). This is essentially the Ritz method from at least 112 years ago: vary the parameters of the wavefunction in pursuit of lower and lower values of the function:
$$
\tag{1}
\frac{\langle \psi | H| \psi \rangle }{\langle \psi|\psi \rangle},
$$
since Eq. 1 is provably an upper bound on the true ground-state energy of the system. This is already how people have been estimating energies of quantum mechanical systems on classical computers for many decades (for example, to answer whether glucose has more energy than fructose). VQE provably improves the efficiency of the overall procedure from a computational complexity perspective: the cost of calculating Eq. 1 on a classical computer grows with the dimension of $H$, which is exponential in the size of the system, whereas on a quantum computer the cost grows only polynomially with the system size.
Basically, VQE can exponentially speed up the most expensive part of the 112+ year-old Ritz method, which for many problems is still the state-of-the-art approach on classical computers.
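To make the Ritz idea concrete, here is a minimal classical sketch; the 2×2 "Hamiltonian" and all names in it are made up purely for illustration, not taken from any library:

```python
import numpy as np
from scipy.optimize import minimize

# Minimal classical Ritz method: minimize the Rayleigh quotient of Eq. 1
# over a parameterized, normalized trial wavefunction.
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])  # made-up 2x2 Hamiltonian

def rayleigh_quotient(theta):
    # Trial wavefunction |psi(theta)> = (cos(theta), sin(theta)) is already
    # normalized, so <psi|psi> = 1 and Eq. 1 reduces to <psi|H|psi>.
    psi = np.array([np.cos(theta[0]), np.sin(theta[0])])
    return psi @ H @ psi

result = minimize(rayleigh_quotient, x0=[0.1])
# The Ritz value upper-bounds the true ground-state energy, per Eq. 1:
print(result.fun, ">=", np.linalg.eigvalsh(H)[0])
```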
Unfortunately, in the way VQE is usually described, the evaluation of Eq. 1 is done on a quantum computer (with provable exponential speed-up over classical computers), but then the energy and the parameters of the wavefunction are fed into a classical optimizer, which tries to produce improved parameters that will lead to a lower (and therefore better) estimate of the energy. Because the Hamiltonians are usually very sparse, the cost of calculating Eq. 1 on a classical computer does not actually scale exponentially unless the wavefunction has an exponentially scaling number of parameters. But an exponential number of parameters would make the "classical optimization" component of VQE the bottleneck, rendering the whole procedure essentially worthless on a quantum computer: an exponentially large set of parameters would have to be fed into the classical optimizer at every iteration, and if a classical computer can store that many parameters, it might as well also calculate Eq. 1, i.e. just do the whole VQE procedure classically, as we have been doing for 112+ years!

Also, if you can calculate Eq. 1 with exponential speed-up on a quantum computer, you should be able to calculate the energy in a more direct way, such as with quantum phase estimation or other methods in the Hamiltonian simulation category.
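As a rough sketch of the hybrid loop just described (this is not any particular library's API; the quantum subroutine is faked with dense linear algebra on a single-qubit toy Hamiltonian whose coefficients are made up):

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = 0.5 * Z + 0.3 * X  # toy qubit Hamiltonian, coefficients invented

def energy_from_quantum_device(theta):
    # Stand-in for the quantum subroutine: in a real VQE this would prepare
    # |psi(theta)> with a parameterized circuit and estimate <psi|H|psi> by
    # repeated measurement. Here |psi(theta)> = Ry(theta)|0>, simulated exactly.
    psi = np.array([np.cos(theta[0] / 2), np.sin(theta[0] / 2)])
    return psi @ H @ psi

# The classical optimizer only ever sees (parameters, energy) pairs; this is
# the part of the VQE loop that stays on the classical computer.
result = minimize(energy_from_quantum_device, x0=[0.0], method="COBYLA")
print("VQE energy estimate:", result.fun)
```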
So why, then, has there been such a buzz around the term "VQE" since 2014?
Let's now talk about the coupled cluster method. This is a polynomially scaling classical algorithm for estimating quantum mechanical energies, and it is used extremely widely in quantum chemistry, where people apply it to real-world problems, predicting the behavior of chemicals on a computer before doing dangerous experiments in a lab (unlike factoring numbers or cracking a Deutsch-Jozsa blackbox). It has widely been called the "gold standard" of quantum chemistry. Coupled cluster is not in general variational, and this is sometimes a concern for quantum chemists. Variational coupled cluster (vCC) and unitary coupled cluster (uCC) do exist as algorithms for classical computers, but they are not considered practical, and the early VQE papers (for example in 2017) promoted the fact that quantum computers executing VQE could do uCC, meaning an improved version of the "gold standard" of quantum chemistry. But unfortunately uCC is just an improved version of a polynomially scaling energy estimation: its advantage arises because there are only polynomially many parameters in the wavefunction model, which is also a disadvantage, because it lacks the exponentially many parameters required to get the "true" energy. In other words, it gives only an "estimate" of the energy, which may or may not be slightly better than what people are already obtaining on classical computers.
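To illustrate what a uCC-type ansatz looks like in the simplest possible setting, here is a toy sketch: the 2×2 Hamiltonian is made up, and the "cluster operator" is a single excitation on a two-dimensional space, so the exponential collapses to a plain rotation. Real uCC uses many excitation operators on an exponentially large space, but still only polynomially many parameters:

```python
import numpy as np
from scipy.linalg import expm

# uCC ansatz: |psi(theta)> = exp(T(theta) - T(theta)^dagger)|reference>,
# with a cluster operator T carrying polynomially many parameters (here: one).
H = np.array([[-1.0, 0.2],
              [0.2, 0.5]])  # made-up Hamiltonian in a {|reference>, |excited>} basis
reference = np.array([1.0, 0.0])

def ucc_energy(theta):
    T = theta * np.array([[0.0, 0.0],
                          [1.0, 0.0]])  # single-excitation cluster operator
    U = expm(T - T.T)                   # unitary, since T - T^dagger is anti-Hermitian
    psi = U @ reference
    return psi @ H @ psi

# The variational minimum over the single parameter is, as in Eq. 1, an upper
# bound on the true ground-state energy.
thetas = np.linspace(-np.pi, np.pi, 201)
print(min(ucc_energy(t) for t in thetas), ">=", np.linalg.eigvalsh(H)[0])
```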
While the discussion above might be a bit depressing, the popularity of VQE with uCC circa 2017 also led to new methods like the hardware-efficient ansatz in 2017 and qubit coupled cluster in 2018. The former created a huge buzz in the pop-sci media, not because VQE is promising, but because one of the world's biggest hardware companies (IBM) demonstrated an attempt at solving a real-world problem on real quantum hardware (a big deal back in 2017!) and published it in Nature; the latter 2018 paper was actually very interesting science in itself.