I've posted this question on the physics site too.
It is said that

> ...the largest Lyapunov exponent, which measures the average exponential rate of divergence or convergence of nearby network states.
Lyapunov exponents (LEs) measure how fast nearby trajectories or flows diverge from each other.
Q1: Why is it the largest LE, rather than, say, the mean of all the LEs, that measures the average divergence rate?
My thought is that the LEs are somehow eigenvalues of a matrix involved in solving the ODEs $$\tau\frac{dh_i}{dt} = -h_i + \sum_{j=1}^N J_{ij} \phi(h_j),$$ so the solutions would possibly look like a linear combination of terms $e^{\lambda_i t}$. Since $e^{a t}\gg e^{b t}$ as $t\to\infty$ when $a > b$, the term with the largest LE will eventually be much larger than all the other terms, regardless of the coefficients, and therefore dominate.
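For a purely linear system $\dot x = Ax$ this dominance is easy to check numerically. A minimal sketch (the small symmetric matrix here is a stand-in for a generic linear system, not the network's $J$): the log-norm growth rate of an arbitrary perturbation converges to the largest eigenvalue of $A$, whatever the initial direction.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2.0          # symmetric, so the eigenvalues are real

# A perturbation evolves as delta(t) = expm(A t) @ delta(0); hence
# log||delta(t)|| / t approaches the largest eigenvalue of A as t grows,
# regardless of the initial direction delta(0).
delta0 = rng.standard_normal(5)
t = 100.0
rate = np.log(np.linalg.norm(expm(A * t) @ delta0)) / t

lam_max = np.linalg.eigvalsh(A).max()
print(rate, lam_max)   # nearly identical
```

The gap between the two numbers is $O(\log|c_1|/t)$, where $c_1$ is the component of the perturbation along the top eigenvector, so it vanishes as $t$ grows.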
I have not solved the ODEs so I am not sure whether my thought is correct.
How can I rigorously analyze the equation and answer my question?
This seems to have verified my guess. But the details of the calculation are still unclear to me. The equation is a non-linear ODE, and its solutions are possibly more complex than those of a linear ODE, for which the monotonicity of $e^{\lambda_i t}$ straightforwardly gives the result.
So perhaps the question can be restated as: how do solutions of non-linear ODEs differ from those of linear ODEs, and can we still use the linear-algebra machinery of eigenvalues and eigenvectors to solve the former?
One idea is to linearize the ODEs near a fixed point, using the Jacobian (whose eigenvalues are, roughly speaking, local versions of the LEs).
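To make the linearization concrete: assuming $\phi=\tanh$ (so $\phi'(0)=1$), the Jacobian at the trivial fixed point $h=0$ is simply $(-I + J)/\tau$, and by the circular law the spectrum of $J$ fills a disk of radius $g$, so the fixed point loses stability at $g=1$. A minimal numerical check (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, g, tau = 500, 1.5, 1.0
J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))

# Jacobian of tau dh/dt = -h + J tanh(h) at h = 0, using tanh'(0) = 1
D = (-np.eye(N) + J) / tau
max_re = np.linalg.eigvals(D).real.max()
print(max_re)   # near (g - 1)/tau = 0.5: the origin is unstable for g > 1
```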
Even if we can do so, the conclusion seems valid only near the fixed point, while for chaos we need to consider $t\to\infty$; since the dynamics are unstable there, the trajectory almost certainly leaves the neighborhood of the fixed point.
Q2: Why do the other LEs matter for characterizing chaos?
Q3: We know $g$ is proportional to the standard deviation of every entry of $J$ ($J_{ij} \sim \mathcal{N}(0, g^2/N)$). Why is it also positively related to the variance of $h_i$? In other words, why does stronger coupling result in stronger neuronal signals, from a math perspective?
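The relation can at least be checked empirically. A sketch (not from the paper: Euler integration with $\phi=\tanh$ and illustrative parameters, measuring the variance of $h$ over neurons and time after a burn-in period):

```python
import numpy as np

def stationary_variance(g, N=300, tau=1.0, dt=0.05, seed=3):
    rng = np.random.default_rng(seed)
    J = rng.normal(0.0, g / np.sqrt(N), size=(N, N))
    h = rng.standard_normal(N)
    for _ in range(4000):                        # burn-in
        h = h + (dt / tau) * (-h + J @ np.tanh(h))
    samples = []
    for _ in range(4000):                        # stationary samples
        h = h + (dt / tau) * (-h + J @ np.tanh(h))
        samples.append(h.copy())
    return float(np.var(samples))                # variance over neurons and time

v_weak, v_strong = stationary_variance(1.2), stationary_variance(2.0)
print(v_weak, v_strong)   # the larger g gives the larger variance
```

This only demonstrates the monotonic trend; the quantitative relation between $g$ and $\langle h_i^2\rangle$ is what the mean-field (self-consistency) calculation in the paper provides.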
(This question is posted here too, with additional references.)
Here is the original paper: