A reference is Hirsch's Differential Topology, Exercise 2 on p. 187:
Exercise: For any $d\in\mathbb{Z}_{\geq1}$ and any $r\in\mathbb{Z}_{\geq1}\cup\{\infty,\omega\}$, the orthogonal group $O(d,\mathbb{R})$ is a deformation retract of $\text{Diff}^r_{\text{weak}}(\mathbb{R}^d)$, where the latter is endowed with the weak ( = compact-open = topology of uniform convergence on compact subsets) topology.
Also note that the Wikipedia section in question assumes that elements of $\text{Diff}(\mathbb{R}^d)=\text{Diff}^1_{\text{weak}}(\mathbb{R}^d)$ are at least $C^1$, so a priori the first derivatives are continuous. See the question "Hirsch's Differential Topology vs Rudin Functional analysis definition of weak and strong topology" for a discussion of this topology.
I will give outlines of proofs of a few statements in question. For the sake of brevity I will focus on the $r=1$ case, so that $f_n\to f$ in $\text{Diff}^1_{\text{weak}}(\mathbb{R}^d)$ iff
\begin{align*}
\forall \text{ compact }K\subseteq \mathbb{R}^d:
&\lim_{n\to \infty} \sup_{x\in K}|f_n(x)-f(x)|=0\,\,\, (\ast)\,\,\,\text{ and } \\
&\lim_{n\to\infty}\sup_{x\in K}\Vert f_n'(x)-f'(x)\Vert=0\,\,\, (\ast\ast),
\end{align*}
where the first norm $|\bullet|$ is the Euclidean norm on $\mathbb{R}^d$ and the second norm $\Vert\bullet\Vert$ is the associated operator norm on $d\times d$ matrices (by finite dimensionality the choice of norms is not important; I chose these for the sake of definiteness), and for $g:\mathbb{R}^d\to\mathbb{R}^d$ a differentiable function, $g':\mathbb{R}^d\to \mathbb{R}^{d\times d}$ is the derivative of $g$, which takes values in $\text{GL}(d,\mathbb{R})$ when $g$ is a diffeomorphism. ($\text{Diff}^r_{\text{weak}}$ is always second countable and completely metrizable, so it suffices to test continuity along sequences.)
(Below $|\bullet|_{K,C^0}$ stands for the supremum of the first norm over $K$ and $|\bullet|_{K,C^1}$ stands for the sum of the suprema of both norms over $K$.)
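To make the notation concrete, here is one way to write out these seminorms (a sketch; the symbol $g$ for a generic $C^1$ map $\mathbb{R}^d\to\mathbb{R}^d$ is introduced here only for illustration):
$$|g|_{K,C^0}=\sup_{x\in K}|g(x)|,\qquad |g|_{K,C^1}=\sup_{x\in K}|g(x)|+\sup_{x\in K}\Vert g'(x)\Vert.$$
With this notation, $(\ast)$ and $(\ast\ast)$ together say exactly that $|f_n-f|_{K,C^1}\to0$ for every compact $K\subseteq\mathbb{R}^d$.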
Let us list some claims and facts.
Claim 1: $\text{Diff}^1_{\text{weak}}(\mathbb{R}^d)$ has at least two connected components.
Claim 2: $\text{Diff}^1_{\text{weak}}(\mathbb{R}^d)$ has exactly two connected components. One component (= identity component) consists of all orientation preserving diffeomorphisms and the other component consists of all orientation reversing diffeomorphisms.
Claim 3: The closed subgroup $\text{Diff}^1_{\text{weak}}(\mathbb{R}^d,0)$ of $\text{Diff}^1_{\text{weak}}(\mathbb{R}^d)$ consisting of $C^1$ diffeomorphisms of $\mathbb{R}^d$ preserving $0$ is a deformation retract.
Claim 4: The closed subgroup $\text{GL}(d,\mathbb{R})$ of linear automorphisms of $\mathbb{R}^d$ is a deformation retract of $\text{Diff}^1_{\text{weak}}(\mathbb{R}^d,0)$.
Fact 1: "Being a deformation retract of" is a transitive relation and it implies homotopy equivalence.
See "If $A\subset B\subset X$ and $A$ and $B$ are deformation retracts of $X$, then $A$ is a deformation retract of $B$" or https://planetmath.org/DeformationRetractIsTransitive.
Fact 2: Homotopy equivalent spaces have the same number (cardinality) of connected components.
See "$X,Y$ are homotopy equivalent, so the number of connected component in $X$ and $Y$ is equal", "Proof of another Hatcher exercise: homotopy equivalence induces bijection", and "$X \simeq Y \Rightarrow \pi_0(X) \cong \pi_0(Y)$".
Claim 5: The closed subgroup $\text{GL}(d,\mathbb{R})$ of $\text{Diff}^1_{\text{weak}}(\mathbb{R}^d)$ is a deformation retract.
Fact 3: $\text{GL}(d,\mathbb{R})$ has exactly two connected components. One component (= identity component) consists of all orientation preserving linear automorphisms (i.e. automorphisms of positive determinant) and the other component consists of all orientation reversing linear automorphisms (i.e. automorphisms of negative determinant).
See "How many connected components does $\mathrm{GL}_n(\mathbb R)$ have?". (Observe that there it is also mentioned that $O(d,\mathbb{R})$ is a deformation retract of $\text{GL}(d,\mathbb{R})$, which together with Fact 1 also gives a conclusive answer to the exercise from Hirsch's book, at least for $r=1$.)
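For completeness, one standard way to see that $O(d,\mathbb{R})$ is a deformation retract of $\text{GL}(d,\mathbb{R})$ is via the polar decomposition. This is only a sketch and not necessarily the argument used in the linked question; the names $Q$, $P$ and $H$ are introduced here for illustration. Write $A=QP$ with $P=(A^{T}A)^{1/2}$ symmetric positive definite and $Q=AP^{-1}\in O(d,\mathbb{R})$, and put
$$H:[0,1]\times\text{GL}(d,\mathbb{R})\to\text{GL}(d,\mathbb{R}),\quad (t,A)\mapsto Q\big((1-t)P+tI\big).$$
Then $H_0(A)=A$, $H_1(A)=Q\in O(d,\mathbb{R})$, and $H_t(A)=A$ whenever $A\in O(d,\mathbb{R})$ (since then $P=I$). The matrix $(1-t)P+tI$ is positive definite, hence invertible, and continuity of $H$ follows from continuity of the positive square root on positive definite matrices.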
Claim 2 implies Claim 1 (although below I will describe a shorter, more direct proof of Claim 1). Claims 3 & 4 together with Fact 1 imply Claim 5, which in turn together with Facts 1, 2 & 3 implies Claim 2.
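The logical structure can be summarized as follows (a sketch; the maps $\mathfrak{T}$ and $\mathfrak{L}$ are the ones defined in the proofs of Claims 3 and 4 below). There is a chain of inclusions
$$\text{GL}(d,\mathbb{R})\subseteq\text{Diff}^1_{\text{weak}}(\mathbb{R}^d,0)\subseteq\text{Diff}^1_{\text{weak}}(\mathbb{R}^d),$$
each of which is the inclusion of a deformation retract, and the composite retraction is
$$\mathfrak{L}_0\circ\mathfrak{T}_1:\ f\mapsto \big[x\mapsto f(x)-f(0)\big]\mapsto f'(0).$$
So every $f$ is joined by a path to the linear map $f'(0)$, and by Fact 3 the component of $f'(0)$ is determined by the sign of $\det f'(0)$.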
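Outline of Proof of Claim 1: Let $\rho:\mathbb{R}^d\to\mathbb{R}^d$, $(x_1,x_2,\dots,x_d)\mapsto(-x_1,x_2,\dots,x_d)$ be the reflection in the first coordinate. (The map $-\text{id}_{\mathbb{R}^d}$ would not do for even $d$, since $\det(-\text{id}_{\mathbb{R}^d})=(-1)^d$.) We claim that $\text{id}_{\mathbb{R}^d}$ and $\rho$ belong to different connected components of $\text{Diff}^1_{\text{weak}}(\mathbb{R}^d)$. Suppose otherwise, in the sense that there is a continuous $\gamma_\bullet:[0,1]\to \text{Diff}^1_{\text{weak}}(\mathbb{R}^d)$ such that $\gamma_0=\text{id}_{\mathbb{R}^d}$ and $\gamma_1=\rho$. Define
$$\mathfrak{D}: \text{Diff}^1_{\text{weak}}(\mathbb{R}^d)\to \text{GL}(d,\mathbb{R}), f\mapsto f'(0).$$
It's straightforward that $\mathfrak{D}$ is continuous. Then we have that
$$\Gamma_\bullet:[0,1]\to \mathbb{R}\setminus\{0\}, t\mapsto \det(\mathfrak{D}(\gamma_t))=\det(\gamma_t'(0))$$
is a continuous path from $\Gamma_0=1$ to $\Gamma_1=-1$ that never vanishes, which contradicts the intermediate value theorem. (See "Proof that determinant is continuous using $\epsilon-\delta $ definition" for continuity of the determinant.) The same map also handles connected components without reference to paths: the preimages of $(0,\infty)$ and $(-\infty,0)$ under $\det\circ\,\mathfrak{D}$ are disjoint open sets that cover $\text{Diff}^1_{\text{weak}}(\mathbb{R}^d)$ and contain $\text{id}_{\mathbb{R}^d}$ and $\rho$, respectively.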
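The continuity of $\mathfrak{D}$ used above can be checked directly from $(\ast\ast)$. A short verification, taking the compact set $K=\{0\}$: if $f_n\to f$ in $\text{Diff}^1_{\text{weak}}(\mathbb{R}^d)$, then
$$\Vert\mathfrak{D}(f_n)-\mathfrak{D}(f)\Vert=\Vert f_n'(0)-f'(0)\Vert\leq\sup_{x\in K}\Vert f_n'(x)-f'(x)\Vert\to0\quad\text{as }n\to\infty.$$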
Outline of Proof of Claim 3: Put
$$\mathfrak{T}:[0,1]\times \text{Diff}^1_{\text{weak}}(\mathbb{R}^d)\to \text{Diff}^1_{\text{weak}}(\mathbb{R}^d), (t,f)\mapsto [\mathfrak{T}_t(f):x\mapsto f(x)-tf(0)].$$
Then $\mathfrak{T}_0(f)=f$, $\mathfrak{T}_1(f)(0)=0$ and if $f(0)=0$, then $\mathfrak{T}_t(f)=f$, so that once we show that $\mathfrak{T}$ is continuous we are done. For this let $t_n\to t$ and $f_n\to f$. For $K\subseteq \mathbb{R}^d$ a compact subset (which we may enlarge so that $0\in K$) and $x\in K$ we have
\begin{align*}
|\mathfrak{T}(t_n,f_n)(x) - \mathfrak{T}(t,f)(x)|
&\leq |f_n(x)-f(x)| + |t_n-t||f_n(0)| +t |f_n(0)-f(0)|\\
&\leq (1+t)|f_n-f|_{K,C^0}+ |f_n(0)||t_n-t|.
\end{align*}
As $f_n(0)\to f(0)$, the sequence $(f_n(0))_n$ is bounded, so $(\ast)$ above is satisfied. $(\ast\ast)$ is also immediate as
$$\mathfrak{T}(t_n,f_n)'(x)=f_n'(x)\quad\text{and}\quad\mathfrak{T}(t,f)'(x)=f'(x),$$
so that $\mathfrak{T}$ is continuous, and is a deformation retraction.
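One point left implicit above is that $\mathfrak{T}_t(f)$ is again a $C^1$ diffeomorphism. A quick check (the translation notation $\tau_v:x\mapsto x+v$ is introduced here only for illustration):
$$\mathfrak{T}_t(f)=\tau_{-tf(0)}\circ f,\qquad \mathfrak{T}_t(f)^{-1}=f^{-1}\circ\tau_{tf(0)}:\ y\mapsto f^{-1}\big(y+tf(0)\big).$$
Both are compositions of $C^1$ diffeomorphisms. For example, with $d=1$ and $f(x)=x+1$ one gets $\mathfrak{T}_t(f)(x)=x+1-t$, which moves from $f$ at $t=0$ to $\text{id}_{\mathbb{R}}$ at $t=1$.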
Outline of Proof of Claim 4: Put
$$\mathfrak{L}:[0,1]\times \text{Diff}^1_{\text{weak}}(\mathbb{R}^d,0)\to \text{Diff}^1_{\text{weak}}(\mathbb{R}^d,0), (t,f)\mapsto \left[\mathfrak{L}_t(f):x\mapsto \begin{cases} \dfrac{f(tx)}{t}&\text{, if }t\neq0\\ f'(0)x&\text{, if }t=0\end{cases}\right].$$
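We have that $\mathfrak{L}_1(f)=f$, $\mathfrak{L}_0(f)=f'(0)$, and if $A\in\text{GL}(d,\mathbb{R})$, then $\mathfrak{L}_t(A)=A$. Thus again it suffices to show that $\mathfrak{L}$ is continuous. Let $t_n\to t$, $f_n\to f$, and let $K\subseteq \mathbb{R}^d$ be a compact subset, which we may enlarge to a closed ball centered at $0$, so that $sx\in K$ for all $s\in[0,1]$ and $x\in K$. Let $x\in K$. If $t\neq0$, then after discarding finitely many terms we may assume $t_n\neq0$ for all $n$. Writing the difference $\mathfrak{L}(t_n,f_n)(x)-\mathfrak{L}(t,f)(x)$ as $\dfrac{f_n(t_nx)-f(t_nx)}{t_n}+\dfrac{f(t_nx)-f(tx)}{t_n}+f(tx)\left(\dfrac{1}{t_n}-\dfrac{1}{t}\right)$ and applying the mean value inequality to the middle term, we get
\begin{align*}
|\mathfrak{L}(t_n,f_n)(x)-\mathfrak{L}(t,f)(x)|
&\leq \dfrac{|f_n-f|_{K,C^0}}{t_n} + \dfrac{|f|_{K,C^1}|t_n-t|}{t_n}\left(|x|+\dfrac{1}{t}\right)\\
&\implies |\mathfrak{L}(t_n,f_n)-\mathfrak{L}(t,f)|_{K,C^0}\leq \dfrac{|f_n-f|_{K,C^0}}{t_n} + \dfrac{|f|_{K,C^1}|t_n-t|}{t_n}\left(|\text{id}_{\mathbb{R}^d}|_{K,C^0}+\dfrac{1}{t}\right);
\end{align*}
since $t_n\to t\neq0$, the sequence $\dfrac{1}{t_n}$ is bounded, and $\sup_{x\in K}|x|$ is finite since $K$ is compact, so we have $(\ast)$. Also we have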
$$\mathfrak{L}(t_n,f_n)'(x)=f_n'(t_nx),$$
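so that the limit $(\ast\ast)$ also follows, using the uniform convergence $f_n'\to f'$ on compact sets together with the uniform continuity of $f'$ on compact sets. The case when $t=0$ is similar, and consequently $\mathfrak{L}$ is a deformation retraction.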
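For the case $t=0$, one possible way to organize the estimate is through an integral formula (a sketch, assuming as we may that $K$ is a closed ball centered at $0$, so that $sx\in K$ for $s\in[0,1]$ and $x\in K$). Since $f(0)=0$, the fundamental theorem of calculus gives, for every $t\in[0,1]$,
$$\mathfrak{L}_t(f)(x)=\int_0^1 f'(stx)\,x\,ds,$$
which also covers $t=0$. Hence, if $t_n\to0$ and $f_n\to f$, then for $x\in K$
\begin{align*}
|\mathfrak{L}(t_n,f_n)(x)-\mathfrak{L}(0,f)(x)|
&\leq |x|\sup_{s\in[0,1]}\Vert f_n'(st_nx)-f'(0)\Vert\\
&\leq |x|\Big(\sup_{y\in K}\Vert f_n'(y)-f'(y)\Vert+\sup_{y\in K}\Vert f'(t_ny)-f'(0)\Vert\Big).
\end{align*}
The first supremum tends to $0$ by $(\ast\ast)$ and the second by uniform continuity of $f'$ on $K$, which gives $(\ast)$; the bracket alone bounds $\Vert\mathfrak{L}(t_n,f_n)'(x)-\mathfrak{L}(0,f)'(x)\Vert=\Vert f_n'(t_nx)-f'(0)\Vert$, which gives $(\ast\ast)$. As a concrete illustration, for $d=1$ and $f(x)=x+x^3$ one has $\mathfrak{L}_t(f)(x)=x+t^2x^3$, which tends to $f'(0)x=x$ as $t\to0$.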