"The error" in my reasoning was that I first used Clenshaw-Curtis to calculate length so that I can approximate the length function $s(t)$ with the Chebyshev series. Instead, Clenshaw-Curtis is the integral of the Chebyshev approximation itself.
So the answer would be to approximate $||\mathbf{C}'(t)||_2$ with Chebyshev series $\frac{1}{2}\sum_{k=0}c_kT_k(2t-1)$. Then, the length function can be easily derived:
\begin{align}
s(t) &= \int_0^t ||\mathbf{C}'(t)||_2dt \\
&= \frac{1}{2}\sum_{k=0}c_k \int T_k(2t-1)dt \\
&= \frac{1}{2}\sum_{k=0}\frac{c_k}{2}\left(\frac{T_{k+1}(2t-1)}{k+1} - \frac{T_{k-1}(2t-1)}{k-1}\right) + C\\
&= \frac{1}{2}\sum_{k=1}\frac{c_{k-1} - c_{k+1}}{2k}T_k(2t-1) + C
\end{align}
I.e., from the coefficients $\{c_0,\ c_1,\ \dots\}$ we can derive new coefficient for the Chebyshev approximation of the length function:
\begin{align}
c^{new}_k &= \frac{c_{k-1} - c_{k+1}}{2k} \\
s(t) &= \frac{1}{2}\sum_{k=0}c^{new}_kT_k(2t-1)dt
\end{align}
Since we have "lost" coefficient $c^{new}_0$, we can set the $c^{new}_0 = -2s(t)|_{c^{temp}_0=0}$
In Wiki, it is said that the integral fo Chebyshev polynomial $T_k(x)$ holds for $k \geq 2$. Since I'm not interested in the approximation of $||\mathbf{C}'(t)||$ itself, I do not divide $c_0$ and $c_N$ by two after calculating them with DCT, thusby "extending" it to $k \geq 1$ (usually $T_1$ and $T_0$ are special cases). Chebyshev nodes at which I am calculating correspond to the extrema of Chebyshev polynomials, i.e.:
\begin{align}
t_n &= \frac{\cos\left(\frac{n\pi}{N}\right)+1}{2} \\
\{c_0,\ c_1,\ \dots, \ c_N\} &= \text{DCT}\left(\frac{||\mathbf{C}'(t_0)||_2}{N},\ \frac{||\mathbf{C}'(t_1)||_2}{N},\ \dots,\ \frac{||\mathbf{C}'(t_0)||_N}{N}\right)
\end{align}
EDIT:
Comparing this to other applicable approaches (as I stated in a comment, we cannot use piecewise quadratic or cubic representation because the geometric continuity would be lost).
Legendre-Gauss and other Gaussian approaches give the same precision, but the problem lies in deducing the correct sample size for calculating them. Also, it is costly to recalculate $s(t)$ every time for some new parameter $t$.
Adaptive quadrature schemes give good results (with Clenshaw-Curtis being the best we tried), but one has to again recalculate them for different parameters $t$.
As for piecewise linear approximation. With the equally spaced $t$s (e.g., $0:0.01:1$), it is inefficient compared with the Gaussian quadrature approach, and one would need to evaluate at substantially more points. On the other hand, with subdividing strategy, we can adaptively subdivide a curve into linear segments. We use this for the graphical representation and have read a lot about the linearity criterion (a beast for another topic). The inefficiency comes from iteratively dividing the curve into sub-curves (even though we use matrix operation) to achieve the required precision of piecewise flatness.
The empirical tests in our case show that the Chebyshev approximation achieves the same precision as the Gaussian quadrature. On the first iteration, it is slower than Gauss-Legendre (because of DCT), but if one needs to make multiple length estimates, it is extremely more efficient.