This answer relies on the Fourier transform, which I denote with a hat. If you don't know the Fourier transform, you can simply jump to the second part of the answer.
Look at the problem using the Fourier transform. Using some basic properties of the Fourier transform, is easy to see that
$$ \widehat {f_\epsilon}(\xi) = \widehat f(\xi)\widehat \phi_\epsilon(\xi)= \widehat f(\xi) \widehat \phi (\epsilon \xi). $$
It follows that
$$ \widehat{(f-f_\epsilon)}(\xi)=(1-\widehat \phi(\epsilon\xi))\widehat f(\xi). \quad (1) $$
Recall that if $g\in L^1$, then $\widehat g$ is continuous and goes to zero at infinity. As you can see, the function $\widehat{(f-f_\epsilon)}$ goes to zero uniformly as $\epsilon\to 0$ (thanks to the fact that $\widehat f$ goes to zero at infinity); moreover, since $\widehat \phi$ is continuous and goes to zero at infinity, it is clear that the speed of uniform convergence of $\widehat{(f-f_\epsilon)}$ to zero is governed by the decay of the function $\widehat f(\xi)$ as $\xi\to\pm\infty$. This is important because it holds the inequality
$$ ||\widehat{(f-f_\epsilon)}||_{L^\infty}\leq ||f-f_\epsilon||_{L^1}, $$
so the speed of convergence of $f_\epsilon$ to $f$ in $L^1$ is at least as bad as the rate of uniform convergence of $\widehat{(f-f_\epsilon)}$ to zero, which we have showed is linked to the decay at infinity of the Fourier transform of $f$ itself.
The problem is that we know that $\widehat f$ goes to zero at infinity, but we do not know much more in general. For instance, I think one can show that exist functions in $L^1$ such that the Fourier transform decays like $\xi^{-\delta}$ with $\delta>0$ arbitrarily small (though I don't know a reference for this, sorry). So in general the rate of convergence of $f_\epsilon$ to $f$ in $L^1$ can be very bad.
I think from the above computations it is clear how the convergence of the mollification to the original function is related to the Fourier transform of the function itself, even for functions that are not in $L^1$.
In your case, though, the Fourier transform of $\chi_{[0,1]}$ is a sinc function, which decays as $\xi^{-1}$, so in principle one could hope for a convergence rate of $f_\epsilon$ of order $\epsilon$ (see formula (1)).
This is in fact true, but to show this one has to use some other properties of the function itself (note for instance that it belongs to any $L^p$). Note that if you ask for the rate of convergence in different norms (e.g. in $L^p$), the rate of convergence could depend on the kind of convergence itself in principle.
To show the convergence rate, let’s assume for the sake of simplicity that we are working in the easier case of the Heaviside function, $f(x)=\chi_{[0,+\infty)}$ (then one can argue by linearity, as $\chi_{[0,1)}(x)=f(x)-f(x-1)$). Let’s compute the mollification $f_\epsilon$:
$$ f_\epsilon(x)=\int f(y) \epsilon^{-1}\phi(\epsilon^{-1}(x-y)) dy=\int_{0}^\infty \epsilon^{-1}\phi(\epsilon^{-1}(x-y))dy = $$
$$ =\int_{0}^\infty \phi(\epsilon^{-1}x-y)dy = f_1(\epsilon^{-1}x). $$
This means that (let me do the calculation for any $L^p$, $p<\infty$) (note that $f(x)=f(\lambda x)$ for any $\lambda>0$)
$$ \int |f(x)-f_\epsilon(x)|^pdx= \int |f(\epsilon^{-1}x)-f_1(\epsilon^{-1}x)|^pdx= \int \epsilon|f(x)-f_1(x)|^pdx , $$
that is
$$ || f-f_\epsilon ||_{L^p}\leq C\epsilon^{1/p}. $$
As you can see, for $p=1$ we obtain what we hoped for, while for different values of $p$ we have a different rate (note that for $p=\infty$ we do not expect to have convergence at all).