Let $A$ and $B$ be any two operators on the Hilbert space $\mathscr H$, hermitian or not. We assume $A, B \in L(\mathscr H)$, the Banach algebra of bounded linear maps from $\mathscr H$ to itself. Consider the linear operator ordinary differential equation
$\dfrac{dX}{d \lambda} = [B, X] \tag{1}$
with initial condition
$X(0) = A. \tag{2}$
We observe that
$X(\lambda) = e^{\lambda B}Ae^{-\lambda B} \tag{3}$
is a solution to (1), (2), and in fact the unique one, since the right-hand side of (1) is a bounded linear function of $X$; for from (3) it follows that
$\dfrac{dX}{d \lambda} = \dfrac{de^{\lambda B}}{d \lambda}Ae^{-\lambda B} + e^{\lambda B}\dfrac{dA}{d \lambda}e^{-\lambda B} + e^{\lambda B}A\dfrac{de^{-\lambda B}}{d \lambda} =$
$Be^{\lambda B}Ae^{-\lambda B} - e^{\lambda B}Ae^{-\lambda B}B = [B, e^{\lambda B}Ae^{-\lambda B}], \tag{4}$
where we have used the fact that $dA / d \lambda = 0$ and the Leibniz product rule for derivatives in (4), and furthermore it is evident from (3) that $X(0) = A$.
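A quick numerical sanity check of (1)-(4) can be sketched as follows. This is only an illustration under stated assumptions: $\mathscr H = \Bbb C^2$, the matrices `A` and `B` are arbitrary test data, `expm` is a hand-rolled truncated Taylor series rather than a library routine, and the derivative is approximated by a central difference with step `h`.

```python
# Illustrative check that X(lam) = exp(lam B) A exp(-lam B) satisfies dX/dlam = [B, X].
# All helper names and the test matrices are assumptions made for this sketch.

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(P, Q, s=1.0):
    """Return P + s*Q entrywise."""
    return [[p + s * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def scale(P, s):
    return [[s * p for p in row] for row in P]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(M, terms=40):
    """Truncated Taylor series for exp(M); adequate for small matrices of modest norm."""
    result = identity(len(M))
    term = identity(len(M))
    for k in range(1, terms):
        term = scale(matmul(term, M), 1.0 / k)  # term now equals M^k / k!
        result = add(result, term)
    return result

def commutator(P, Q):
    return add(matmul(P, Q), matmul(Q, P), -1.0)

def X(lam, A, B):
    """Formula (3): exp(lam B) A exp(-lam B)."""
    return matmul(matmul(expm(scale(B, lam)), A), expm(scale(B, -lam)))

def max_abs(P):
    return max(abs(p) for row in P for p in row)

A = [[1.0, 2.0], [0.5, -1.0]]
B = [[0.3, -0.7], [1.1, 0.2j]]  # deliberately not hermitian

lam, h = 0.8, 1e-5
# Central-difference approximation to dX/dlam at lam.
lhs = scale(add(X(lam + h, A, B), X(lam - h, A, B), -1.0), 1.0 / (2 * h))
rhs = commutator(B, X(lam, A, B))

residual = max_abs(add(lhs, rhs, -1.0))             # how well (1) holds
initial_error = max_abs(add(X(0.0, A, B), A, -1.0))  # how well (2) holds
print(residual, initial_error)
```

The residual should be small, limited only by the finite-difference step and rounding, while the initial condition is met up to rounding.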
We next recall that for any $B \in L(\mathscr H)$ the adjoint linear operator $\text{ad}_B: L(\mathscr H) \to L(\mathscr H)$ may be defined via
$\text{ad}_B(A) = [B, A] \tag{5}$
for all $A \in L(\mathscr H)$. Denoting by $\Vert T \Vert _L$ the standard operator norm on $L(\mathscr H)$, we see that
$\Vert \text{ad}_B(A) \Vert_L = \Vert [B, A] \Vert_L = \Vert BA - AB \Vert_L \le \Vert BA \Vert_L + \Vert AB \Vert_L$
$\le \Vert B \Vert_L \Vert A \Vert_L + \Vert A \Vert_L \Vert B \Vert_L = 2 \Vert B \Vert_L \Vert A \Vert_L, \tag{6}$
which shows that
$\Vert \text{ad}_B \Vert_L \le 2 \Vert B \Vert_L, \tag{7}$
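i.e. that $\text{ad}_B \in L(L(\mathscr H))$ is itself a bounded linear operator on the Banach space $L(\mathscr H)$, of norm at most $2\Vert B \Vert_L$; in (7), $\Vert \text{ad}_B \Vert_L$ denotes the operator norm induced on $L(L(\mathscr H))$. Furthermore, we have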
$\text{ad}_B^2(A) = \text{ad}_B (\text{ad}_B(A)) = \text{ad}_B([B, A]) = [B, [B, A]], \tag{8}$
$\text{ad}_B^3(A) = \text{ad}_B (\text{ad}_B^2(A)) = \text{ad}_B([B, [B, A]]) = [B, [B, [B, A]]], \tag{9}$
and so on:
$\text{ad}_B^n(A) = [B, [B, [B, \ldots [B, A] \ldots ]]], \tag{10}$
where the operator $\text{ad}_B = [B, \cdot]$ occurs a total of $n$ times on the right-hand side of (10). We see that in fact (1) may be written in terms of $\text{ad}_B$ as
$\dfrac{dX}{d \lambda} = \text{ad}_B(X). \tag{11}$
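As a concrete illustration, not needed for the argument, one may take $\mathscr H = \Bbb C^2$ and
$B = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$
so that $\Vert B \Vert_L = \Vert A \Vert_L = 1$. Then
$\text{ad}_B(A) = BA - AB = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} - \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix} = 2A,$
and hence, iterating,
$\text{ad}_B^n(A) = 2^n A.$
In particular $\Vert \text{ad}_B(A) \Vert_L = 2 = 2 \Vert B \Vert_L \Vert A \Vert_L$, so for this $B$ the bound (7) holds with equality, and the constant $2$ cannot be improved in general.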
Now set
$Y(\lambda) = A + \lambda [B, A] + \dfrac{\lambda^2}{2!}[B, [B, A]]$
$+ \ldots + \dfrac{\lambda^n}{n!}\underbrace{[B, [B, [B, \ldots [B}_{n \; \text{times}}, A]]]] \ldots ] + \ldots; \tag{12}$
from the above we see that $Y(\lambda)$ may be written
$Y(\lambda) = A + \lambda \text{ad}_B(A) + \dfrac{\lambda^2}{2!} \text{ad}_B^2(A) + \ldots + \dfrac{\lambda^n}{n!} \text{ad}_B^n(A) + \ldots$
$= \sum_{n=0}^\infty \dfrac{\lambda^n}{n!}\text{ad}_B^n(A) = e^{\lambda \text{ad}_B}(A); \tag{13}$
since by (7) $\text{ad}_B$ is a bounded operator on $L(\mathscr H)$, the $n$-th term of the series is bounded in norm by $(2 \vert \lambda \vert \Vert B \Vert_L)^n \Vert A \Vert_L / n!$, so all the series occurring above converge absolutely, and uniformly on compacta, for all $\lambda \in \Bbb R$, in fact for all $\lambda \in \Bbb C$. We may thus differentiate term by term, exactly as in the case of ordinary calculus, and the derivative $Y'(\lambda)$ is given by
$\dfrac{dY}{d\lambda} = \text{ad}_B(e^{\lambda \text{ad}_B}(A)) = [B, e^{\lambda \text{ad}_B}(A)], \tag{14}$
and furthermore
$Y(0) = A, \tag{15}$
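which follows trivially from (12) and/or (13). Comparing (1), (2), (11), (14) and (15), we see that $X(\lambda)$ and $Y(\lambda)$ satisfy the same ODE with identical initial conditions. Since the right-hand side $\text{ad}_B(X)$ is a bounded linear, hence globally Lipschitz, function of $X$, the Picard–Lindelöf uniqueness theorem for ODEs in the Banach space $L(\mathscr H)$ applies, and the two solutions must be identical for all $\lambda$: $X(\lambda) = Y(\lambda)$. Using (3) and (12), (13) we thus see that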
$e^{\lambda B}Ae^{-\lambda B} = e^{\lambda \text{ad}_B}(A)$
$= A + \lambda [B, A] + \ldots + \dfrac{\lambda^n}{n!}\underbrace{[B, [B, [B, \ldots [B}_{n \; \text{times}}, A]]]] \ldots ] + \ldots; \tag{16}$
if we now set $B = iG$ we obtain
$e^{i\lambda G}Ae^{-i\lambda G} = e^{i\lambda \text{ad}_G}(A)$
$= A + i\lambda [G, A] + \ldots + \dfrac{(i\lambda)^n}{n!}\underbrace{[G, [G, [G, \ldots [G}_{n \; \text{times}}, A]]]] \ldots ] + \ldots, \tag{17}$
where we have used the fact that $\text{ad}_{iG} = i\text{ad}_G$, a consequence of the linearity of the bracket $[G, A]$ in each of its variables $A, G$. Equation (17) is the desired result. QED.
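For readers who like to see (16) and (17) confirmed numerically, here is a minimal sketch. It rests on several assumptions: $\mathscr H = \Bbb C^2$, the matrices `A`, `B`, `G`, `Bz`, `N` are arbitrary test data, `expm` is a hand-rolled truncated Taylor series, and `ad_series` is a hypothetical helper that sums the nested-commutator series up to a fixed number of terms. The last check uses the special case $[B, A] = 2A$, where the series sums to $e^{2\lambda}A$.

```python
import math

# Illustrative check of (16) and (17); helper names and test data are assumptions.

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(P, Q, s=1.0):
    """Return P + s*Q entrywise."""
    return [[p + s * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def scale(P, s):
    return [[s * p for p in row] for row in P]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(M, terms=40):
    """Truncated Taylor series for exp(M); adequate for small matrices of modest norm."""
    result = identity(len(M))
    term = identity(len(M))
    for k in range(1, terms):
        term = scale(matmul(term, M), 1.0 / k)  # term now equals M^k / k!
        result = add(result, term)
    return result

def commutator(P, Q):
    return add(matmul(P, Q), matmul(Q, P), -1.0)

def conjugate(B, A, lam):
    """Left-hand side of (16): exp(lam B) A exp(-lam B)."""
    return matmul(matmul(expm(scale(B, lam)), A), expm(scale(B, -lam)))

def ad_series(B, A, lam, terms=60):
    """Partial sum of sum_n lam^n/n! ad_B^n(A), built from nested commutators."""
    total = A
    nested = A
    for n in range(1, terms):
        nested = scale(commutator(B, nested), lam / n)  # lam^n/n! ad_B^n(A)
        total = add(total, nested)
    return total

def max_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

lam = 0.8
A = [[1.0, 2.0], [0.5, -1.0]]
B = [[0.3, -0.7], [1.1, 0.2j]]                       # not hermitian
G = [[1.0, 0.5 - 0.5j], [0.5 + 0.5j, -1.0]]          # hermitian

# (16): conjugation by exp(lam B) versus the commutator series.
err16 = max_diff(conjugate(B, A, lam), ad_series(B, A, lam))

# (17): the same identity with lam replaced by i*lam and B by G.
err17 = max_diff(conjugate(G, A, 1j * lam), ad_series(G, A, 1j * lam))

# Special case [Bz, N] = 2N, so the series should sum to exp(2 lam) N.
Bz = [[1.0, 0.0], [0.0, -1.0]]
N = [[0.0, 1.0], [0.0, 0.0]]
err_eig = max_diff(ad_series(Bz, N, lam), scale(N, math.exp(2 * lam)))

print(err16, err17, err_eig)
```

All three discrepancies should be at the level of rounding error; of course this only illustrates the identity in finite dimensions and proves nothing.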
Note: The technique used here, based on uniqueness of ODEs, is similar in spirit to that used in my answers to several other questions; in particular see this one and this one.
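Another Note: A couple of interesting formulas related to the above: $[B, e^{\lambda B}Ae^{-\lambda B}] = e^{\lambda B}[B, A]e^{-\lambda B}$, which holds because $B$ commutes with $e^{\pm \lambda B}$, and $A = e^{-\lambda B} \left( e^{\lambda \text{ad}_B}(A) \right) e^{\lambda B}$, which is (16) conjugated by $e^{-\lambda B}$.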
Hope this helps. Cheerio,
and as always,
Fiat Lux!!!