Added 01/11/23: As noam.szyfer commented, there is currently a gap in the first proof below (at the statement marked with ($\blacksquare$)). The second proof does imply that the $\Leftarrow$ direction in ($\blacksquare$) holds; however, since this second proof rests on a characterization of weak mixing that is not listed as admissible in the OP, my answer does not completely answer the question.
Below are two different proofs; the latter is based on a characterization of weak mixing that you are not listing as available, but it's still a fairly common characterization (namely, in terms of the eigenfunctions of the Koopman operator).
For reference, a version of this is in Einsiedler & Ward's Ergodic Theory, p.53, Exr.2.7.11. In the literature (both in ergodic theory and in topological dynamics), statements of this type go by the name of "Furstenberg intersection lemmas".
First let me introduce some notation and rephrase the exercise at hand. For $X=(X,\mathcal{B}(X))$ a measurable space, denote by $\mathfrak{E}(X)$ ("the ergodic theory of $X$") the collection of all pairs $(\mu,f)$, where $\mu$ is a probability measure on $X$ and $f$ is a measurable self-map of $X$ preserving the measure $\mu$. Let us denote by $\mathcal{B}(X;\mu)^\ast$ the collection of all measurable subsets of positive $\mu$-measure. I will use the abbreviation $A=_\mu B$ for two sets $A,B$ to mean that they coincide up to a $\mu$-negligible set; in particular $A\neq_\mu \emptyset$ means $A$ has positive $\mu$-measure.
Let $(\mu,f)\in\mathfrak{E}(X)$ and put for any two measurable subsets $A,B$:
$$\text{Hit}^{(\mu,f)}(B\leftarrow A)=\{n\in\mathbb{Z}_{\geq1}\,|\, A\cap f^{-n}(B)\neq_\mu \emptyset\}.$$
Thus $\text{Hit}^{(\mu,f)}(B\leftarrow A)$ is the set of hitting times from $A$ to $B$ (see e.g. https://mathoverflow.net/a/431178/66883 for the topological analog and further details).
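As a concrete sanity check of the definition, here is a small Python sketch that computes truncated hitting-time sets for a system on a finite set. It assumes (hypothetically, for illustration only) that $X=\{0,\dots,N-1\}$, that $\mu$ is given as a list of point masses, and that $f$ is given as a list with `f[x]` the image of `x`. Since only finitely many $n$ can be examined, `hit_times` returns $\text{Hit}^{(\mu,f)}(B\leftarrow A)\cap\{1,\dots,n_{\max}\}$.

```python
from fractions import Fraction


def preimage(f, S, n=1):
    """Return f^{-n}(S) for a map f on {0, ..., N-1} given as a list (f[x] = image of x)."""
    for _ in range(n):
        S = {x for x in range(len(f)) if f[x] in S}
    return S


def measure(mu, S):
    """mu(S) for a measure given by point masses mu[x]."""
    return sum((mu[x] for x in S), Fraction(0))


def hit_times(mu, f, B, A, n_max):
    """Hit^{(mu,f)}(B <- A) truncated to 1 <= n <= n_max: all n with mu(A ∩ f^{-n}(B)) > 0."""
    return {n for n in range(1, n_max + 1) if measure(mu, A & preimage(f, B, n)) > 0}


# Rotation x -> x + 1 on Z/6 with uniform measure: starting in {0}, one is in {2} after 2, 8, ... steps.
mu = [Fraction(1, 6)] * 6
rot = [(x + 1) % 6 for x in range(6)]
print(sorted(hit_times(mu, rot, {2}, {0}, 12)))  # [2, 8]

# Two swapped pairs {0,1} and {2,3}: each pair is invariant, so {0} never hits {2}.
mu4 = [Fraction(1, 4)] * 4
swap = [1, 0, 3, 2]
print(hit_times(mu4, swap, {2}, {0}, 10))  # set()
```

The second example is a non-ergodic system in which some hitting-time set between sets of positive measure is empty.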
The characterization of ergodicity you mention can thus be stated like so:
$$(\mu,f) \text{ is ergodic } \iff \forall A,B\in\mathcal{B}(X;\mu)^\ast: \text{Hit}^{(\mu,f)}(B\leftarrow A)\neq\emptyset. \quad\quad (\bigstar)$$
Similarly, using the fifth characterization of weak mixing in your list, we can say that $(\mu,f)$ is weakly mixing iff
$$\forall A_1,A_2,B_1,B_2\in\mathcal{B}(X;\mu)^\ast: \text{Hit}^{(\mu\otimes \mu,f\times f)}(B_1\times B_2\leftarrow A_1\times A_2)\neq\emptyset. \quad (\blacksquare)$$
Here $(\mu\otimes \mu,f\times f)\in\mathfrak{E}(X\times X)$, where $X\times X$ is endowed with the product $\sigma$-algebra. Note that
$$\text{Hit}^{(\mu\otimes \mu,f\times f)}(B_1\times B_2\leftarrow A_1\times A_2) = \text{Hit}^{(\mu,f)}(B_1\leftarrow A_1)\cap \text{Hit}^{(\mu,f)}(B_2\leftarrow A_2),$$
so that weak mixing is a notion that models simultaneous hitting times. For any (ordered) quadruple $(A,B,C,D)$ of measurable subsets of positive measure, let $\mathcal{P}(A,B,C,D)$ stand for the statement
$$"\text{Hit}^{(\mu,f)}(B\leftarrow A)\cap \text{Hit}^{(\mu,f)}(D\leftarrow C)\neq\emptyset".$$
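The product identity for hitting times displayed above can be checked by brute force on a small finite system. The following Python sketch uses a hypothetical encoding: $X=\{0,\dots,N-1\}$, $\mu$ as a list of point masses, $f$ as a list with `f[x]` the image of `x`, and a point $(x,y)$ of $X\times X$ stored as the index `x*N + y`. It compares both sides for all nonempty $A_1,A_2,B_1,B_2$. This is a finite check with truncated hitting times, not a proof.

```python
import itertools
from fractions import Fraction


def preimage(f, S, n=1):
    """f^{-n}(S) for a map f on {0, ..., len(f)-1} given as a list."""
    for _ in range(n):
        S = {x for x in range(len(f)) if f[x] in S}
    return S


def hit_times(mu, f, B, A, n_max):
    """Hit^{(mu,f)}(B <- A) truncated to 1 <= n <= n_max."""
    return {n for n in range(1, n_max + 1)
            if sum((mu[x] for x in A & preimage(f, B, n)), Fraction(0)) > 0}


def product_system(mu, f):
    """(mu ⊗ mu, f × f) on X × X, with the pair (x, y) encoded as x * N + y."""
    N = len(f)
    mu2 = [mu[x] * mu[y] for x in range(N) for y in range(N)]
    f2 = [f[x] * N + f[y] for x in range(N) for y in range(N)]
    return mu2, f2


def rectangle(S1, S2, N):
    """The rectangle S1 × S2 as a set of encoded pairs."""
    return {x * N + y for x in S1 for y in S2}


def product_identity_holds(mu, f, n_max):
    """Compare both sides of the identity for all nonempty A1, A2, B1, B2 ⊆ X."""
    N = len(f)
    mu2, f2 = product_system(mu, f)
    subsets = [set(c) for r in range(1, N + 1)
               for c in itertools.combinations(range(N), r)]
    for A1, A2, B1, B2 in itertools.product(subsets, repeat=4):
        lhs = hit_times(mu2, f2, rectangle(B1, B2, N), rectangle(A1, A2, N), n_max)
        rhs = hit_times(mu, f, B1, A1, n_max) & hit_times(mu, f, B2, A2, n_max)
        if lhs != rhs:
            return False
    return True


print(product_identity_holds([Fraction(1, 3)] * 3, [1, 2, 0], 6))  # True
```

The identity does not actually use invariance of $\mu$: the product measure of $(A_1\times A_2)\cap(f\times f)^{-n}(B_1\times B_2)$ is $\mu(A_1\cap f^{-n}(B_1))\,\mu(A_2\cap f^{-n}(B_2))$, which is positive iff both factors are.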
Let us think of the problem at hand as reducing the number of "variables" (the subsets $A,B,C,D$ above) one needs for weak mixing. A priori weak mixing requires four variables, and the OP asks whether three variables are enough when they are chosen in a certain configuration. In fact, two variables are enough ($\dagger$):
Claim: Let $X$ be a measurable space, $(\mu,f)\in\mathfrak{E}(X)$. Then $(\mu,f)$ is weakly mixing iff
$$\forall A,B,C,D\in\mathcal{B}(X;\mu)^\ast: \mathcal{Q}(A,B,C,D) \implies \mathcal{P}(A,B,C,D),$$
where $\mathcal{Q}(A,B,C,D)$ is any one of the following constraints:
- $A=C$
- $B=D$
- $B=C$
- $A=D$
- $C=D$
- $A=B$
- $A=B=C$
- $A=C=D$
- $A=B=D$
- $B=C=D$
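Disclaimer: I tried to compress the statement but I couldn't see a way that is less confusing; I would be open to suggestions. For instance, the OP is asking whether the validity of $\mathcal{P}(A,B,C,D)$ under the constraint $A=C$ is enough for weak mixing. A more restrictive constraint gives a weaker hypothesis, and every pairwise constraint is implied by one of the triple constraints. Moreover, $\mathcal{P}(A,B,C,D)\iff\mathcal{P}(C,D,A,B)$, and this swap turns the constraints $A=C=D$ and $B=C=D$ into $A=B=C$ and $A=B=D$, respectively. So it suffices to check that each of the constraints $A=B=C$ and $A=B=D$ is sufficient for weak mixing.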
($\dagger$) One variable is not enough, since $\mathcal{P}(A,B,C,D)$ under the constraint $A=B=C=D$ holds for every $(\mu,f)\in\mathfrak{E}(X)$: it is exactly the content of (one of the versions of) the Poincaré Recurrence Theorem.
In this section, for brevity, let's agree that capital letters stand for measurable subsets of positive measure and lowercase letters stand for hitting times. Further, let's abbreviate e.g. $"q\in \text{Hit}^{(\mu,f)}(P\leftarrow P)\cap \text{Hit}^{(\mu,f)}(Q\leftarrow P)"$ by the diagram $P \xleftarrow{q} P\xrightarrow{q} Q$.
Proof of Claim (via combinatorics): Suppose we are given the validity of $\mathcal{P}(A,B,C,D)$ with the constraint $A=B=C$, so that we are only allowed to choose the fourth set to be different for a simultaneous hitting time (the proof for the constraint $A=B=D$ is similar; I'll omit it). First we'll obtain the relaxed constraint $A=C$ and then we'll lift this new constraint to get weak mixing.
Let's first obtain a simultaneous hitting time for $Q \leftarrow P\rightarrow R$. By hypothesis, for some $n\geq1$ we have $P \xleftarrow{n} P\xrightarrow{n} Q$. Since $f$ preserves $\mu$, the set $f^{-n}(R)$ has positive measure, so again by hypothesis, applied to $P\cap f^{-n}(Q)$ and $f^{-n}(R)$, for some $m\geq1$ we have
$$[P\cap f^{-n}(Q)] \xleftarrow{m} [P\cap f^{-n}(Q) ]\xrightarrow{m} [f^{-n}(R)].$$
Then we have $Q \xleftarrow{n+m} P\xrightarrow{n+m} R$ ($\ast$).
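Spelled out, the step from the displayed diagram to ($\ast$) is the following pair of inclusions. This only unpacks the diagram notation; each left-hand side has positive $\mu$-measure by the choice of $m$:

$$\begin{aligned} [P\cap f^{-n}(Q)]\cap f^{-m}\big(P\cap f^{-n}(Q)\big) &\subseteq P\cap f^{-(n+m)}(Q),\\ [P\cap f^{-n}(Q)]\cap f^{-m}\big(f^{-n}(R)\big) &\subseteq P\cap f^{-(n+m)}(R). \end{aligned}$$

Hence $n+m\in\text{Hit}^{(\mu,f)}(Q\leftarrow P)\cap\text{Hit}^{(\mu,f)}(R\leftarrow P)$.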
Next we obtain the full statement. First, by ($\ast$) we have $M \xleftarrow{k} K \xrightarrow{k} N $ for some $k\geq1$. Again by ($\ast$), applied to $K\cap f^{-k}(M)$, $L$ and $f^{-k}(N)$ (all of positive measure), we have for some $l\geq 1$
$$ L \xleftarrow{l} [K\cap f^{-k}(M)] \xrightarrow{l} f^{-k}(N). $$
Then we have $K\xrightarrow{l} L$ and, since $f$ preserves $\mu$, $M\xrightarrow{l} N$. So $\mathcal{P}(K,L,M,N)$ holds for arbitrary $K,L,M,N$, i.e. $\mathcal{P}(A,B,C,D)$ holds with no constraints; that is, $(\mu,f)$ is weakly mixing.
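The last step can be unpacked in the same way. With $l$ as above, both left-hand sides below have positive $\mu$-measure, and since $f$ preserves $\mu$, $\mu\big(f^{-k}(M\cap f^{-l}(N))\big)=\mu\big(M\cap f^{-l}(N)\big)$:

$$\begin{aligned} [K\cap f^{-k}(M)]\cap f^{-l}(L) &\subseteq K\cap f^{-l}(L),\\ [K\cap f^{-k}(M)]\cap f^{-l}\big(f^{-k}(N)\big) &= K\cap f^{-k}\big(M\cap f^{-l}(N)\big)\subseteq f^{-k}\big(M\cap f^{-l}(N)\big). \end{aligned}$$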
For the second proof, we use the following characterization of weak mixing: $(\mu,f)$ is weakly mixing iff
$$\forall \phi\in L^2(X,\mu;\mathbb{C}), \forall \theta\in[0,1[: \phi\circ f=_\mu e^{2\pi i\theta}\phi \implies \phi=_\mu \mathbb{E}_\mu(\phi).$$
(Here $\mathbb{E}_\mu(\phi)=\int_X \phi(x)\, d\mu(x)\in\mathbb{C}$ is the constant that is the expectation of $\phi$ w/r/t $\mu$.)
Recall also that the ergodicity of $(\mu,f)$ is characterized by the same statement with the constraint $\theta=0$, so that $(\mu,f)$ is ergodic iff
$$\forall \phi\in L^2(X,\mu;\mathbb{C}): \phi\circ f=_\mu \phi \implies \phi=_\mu \mathbb{E}_\mu(\phi).$$
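Thus $(\mu,f)$ is weakly mixing iff the associated Koopman operator $\phi\mapsto \phi\circ f$ has no eigenvalue other than $1$ and the eigenvalue $1$ has (geometric) multiplicity $1$, i.e. the only eigenfunctions are the constants. Similarly, $(\mu,f)$ is ergodic iff the eigenvalue $1$ of the Koopman operator has (geometric) multiplicity $1$. For an ergodic system every eigenvalue is automatically simple, since the quotient of two eigenfunctions with the same eigenvalue is invariant; so simplicity of eigenvalues alone does not characterize weak mixing.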
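A toy computation shows why simplicity of eigenvalues alone cannot detect weak mixing. The sketch below is a hypothetical finite example, not from the references above. It takes the rotation $x\mapsto x+1$ on $\mathbb{Z}/N$ with the uniform measure, which is ergodic. Its Koopman operator has the $N$ characters $\chi_k(x)=e^{2\pi i kx/N}$ as eigenfunctions, with pairwise distinct eigenvalues $e^{2\pi ik/N}$, so every eigenvalue is simple. Yet $\chi_1$ is a nonconstant eigenfunction, so the system is not weakly mixing in the sense of the characterization above.

```python
import cmath

N = 5
rot = [(x + 1) % N for x in range(N)]  # rotation by 1/N on Z/N


def koopman(f, phi):
    """Koopman operator phi -> phi ∘ f, with phi stored as a list of values."""
    return [phi[f[x]] for x in range(len(f))]


def character(k):
    """chi_k(x) = exp(2 pi i k x / N)."""
    return [cmath.exp(2j * cmath.pi * k * x / N) for x in range(N)]


def is_eigenfunction(phi, lam, tol=1e-12):
    """Check phi ∘ rot == lam * phi pointwise, up to floating-point tolerance."""
    return all(abs(a - lam * b) < tol for a, b in zip(koopman(rot, phi), phi))


eigenvalues = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]
print(all(is_eigenfunction(character(k), eigenvalues[k]) for k in range(N)))  # True
```

Since the $N$ characters form a basis of $\mathbb{C}^N$ and their eigenvalues are distinct, each eigenspace is one-dimensional.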
Let us make one general observation and state one lemma regarding rotations (arguably this is the most fun part in this whole answer) we'll use:
Observation: Let $X,Y$ be two measurable spaces, $(\mu,f)\in\mathfrak{E}(X)$, $(\nu,g)\in\mathfrak{E}(Y)$ be two systems. Suppose $\pi: (\mu,f)\to (\nu,g)$ is a factor map, so that $\pi: X\to Y$ is measurable, $\pi_\ast(\mu)=\nu$, and $\pi\circ f=_\mu g\circ \pi$. Then
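$$\forall A,B\in\mathcal{B}(Y;\nu): \text{Hit}^{(\mu,f)}(\pi^{-1}(B)\leftarrow \pi^{-1}(A))\subseteq \text{Hit}^{(\nu,g)}(B\leftarrow A).$$
Indeed, since $f$ preserves $\mu$, induction gives $\pi\circ f^n=_\mu g^n\circ\pi$ for all $n\geq1$. Hence $\pi^{-1}(A)\cap f^{-n}(\pi^{-1}(B))=_\mu\pi^{-1}(A\cap g^{-n}(B))$, whose $\mu$-measure is $\nu(A\cap g^{-n}(B))$; so the inclusion is in fact an equality.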
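The observation can be tested on a small hypothetical example: $X=\mathbb{Z}/6$ and $Y=\mathbb{Z}/3$ with uniform measures, $f(x)=x+1$, $g(y)=y+1$ and $\pi(x)=x\bmod 3$, so that $\pi\circ f=g\circ\pi$ and $\pi_\ast\mu=\nu$. The Python sketch below compares the two hitting-time sets for all nonempty $A,B\subseteq Y$. It is a finite check with truncated hitting times, not a proof.

```python
import itertools
from fractions import Fraction


def preimage(f, S, n=1):
    """f^{-n}(S) for a map given as a list (f[x] = image of x)."""
    for _ in range(n):
        S = {x for x in range(len(f)) if f[x] in S}
    return S


def hit_times(mu, f, B, A, n_max):
    """Hit^{(mu,f)}(B <- A) truncated to 1 <= n <= n_max."""
    return {n for n in range(1, n_max + 1)
            if sum((mu[x] for x in A & preimage(f, B, n)), Fraction(0)) > 0}


mu, f = [Fraction(1, 6)] * 6, [(x + 1) % 6 for x in range(6)]
nu, g = [Fraction(1, 3)] * 3, [(y + 1) % 3 for y in range(3)]
pi = [x % 3 for x in range(6)]  # factor map X -> Y
assert all(pi[f[x]] == g[pi[x]] for x in range(6))  # pi ∘ f = g ∘ pi


def compare_hits(n_max=12):
    """For all nonempty A, B ⊆ Y: (is upstairs ⊆ downstairs for all, is upstairs == downstairs for all)."""
    subsets = [set(c) for r in (1, 2, 3) for c in itertools.combinations(range(3), r)]
    inclusions, equalities = [], []
    for A, B in itertools.product(subsets, repeat=2):
        up = hit_times(mu, f, preimage(pi, B), preimage(pi, A), n_max)
        down = hit_times(nu, g, B, A, n_max)
        inclusions.append(up <= down)
        equalities.append(up == down)
    return all(inclusions), all(equalities)


print(compare_hits())  # (True, True)
```

In this example the two sets even coincide, matching the computation $\pi^{-1}(A)\cap f^{-n}(\pi^{-1}(B))=\pi^{-1}(A\cap g^{-n}(B))$.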
For any $\theta\in [0,1[$, denote by $R_\theta: S^1\to S^1$ rotation by $\theta$: $z\mapsto e^{2\pi i\theta}z$.
Lemma: For any $\theta\in]0,1[$ and for any Borel probability measure $\nu$ on $S^1$ invariant under $R_\theta$, there is an $\epsilon^\ast\in\mathbb{R}_{>0}$ such that for any $\epsilon\in ]0,\epsilon^\ast[$, there are disjoint closed arcs $I_1,I_2,J_1,J_2\subseteq S^1$ such that
- They are ordered as $I_1 < J_1 < I_2 < J_2$ w/r/t the counterclockwise orientation on $S^1$,
- $0 <\text{length}(I_i)<\epsilon < \text{length}(J_i)$ for $i=1,2$,
- $0< \nu(I_i)$ for $i=1,2$.
Let me give the second proof assuming the lemma for now; in the next section I will also outline a proof of the lemma. Let us note, however, that an $R_\theta$-invariant Dirac measure exists iff $R_\theta$ has a fixed point iff $\theta=0$. Thus the hypotheses of the lemma rule out Dirac measures, but not purely atomic measures in general, as we are not requiring $\theta$ to be irrational.
Proof of Claim (via Koopmanism): Suppose for any $A,B\in\mathcal{B}(X;\mu)^\ast$ we have
$$\text{Hit}^{(\mu,f)}(A\leftarrow A)\cap \text{Hit}^{(\mu,f)}(B\leftarrow A)\neq\emptyset.$$
(To reiterate this corresponds to the constraint $A=B=C$; this proof actually works simultaneously for the constraint $A=B=D$; see $\blacktriangle$ below.)
In particular, by ($\bigstar$), $(\mu,f)$ is ergodic. Suppose $(\mu,f)$ is not weakly mixing. Then the associated Koopman operator has a nonconstant eigenfunction: there is a $\phi\in L^2(X,\mu;\mathbb{C})$ that is not constant $\mu$-almost everywhere and a $\theta\in[0,1[$ such that $\phi\circ f=_\mu e^{2\pi i\theta}\phi$. Taking the moduli of both sides we have $|\phi|\circ f =_\mu |\phi|$, so that $|\phi|\in L^2(X,\mu;\mathbb{R})$ is an eigenfunction with eigenvalue $1$. By ergodicity of $(\mu,f)$, $|\phi|$ is $\mu$-almost everywhere equal to a constant, which is nonzero since $\phi$ is not $\mu$-a.e. zero. Consequently we may define (up to a $\mu$-negligible set) $\psi: X\to S^1$, $x\mapsto \frac{\phi(x)}{|\phi(x)|}$. Then $\psi$ is still an $L^2$ eigenfunction of the Koopman operator with eigenvalue $e^{2\pi i\theta}$, i.e. $\psi\circ f=_\mu R_\theta\circ\psi$. Put $\nu=\psi_\ast(\mu)$, the pushforward of $\mu$ via $\psi$. Then $\psi: (\mu,f)\to (\nu,R_\theta)$ is a factor map.
Note that $\psi$ is not constant, so that $\theta\neq0$ by ergodicity. Applying the fun lemma to $(\nu, R_\theta)$, we obtain four closed arcs $I_1, I_2, J_1, J_2\subseteq S^1$. By construction, if an iterate of $I_1$ intersects $I_2$, then this same iterate cannot intersect $I_1$ (and similarly with the indices switched). Indeed, the iterate is an arc of length less than $\epsilon$, whereas on either side $I_1$ and $I_2$ are separated by one of the arcs $J_1,J_2$, each of length greater than $\epsilon$. Now define $A=\psi^{-1}(I_1)$ and $B=\psi^{-1}(I_2)$; these are measurable subsets of $X$ of positive $\mu$-measure, since $\mu(\psi^{-1}(I_i))=\nu(I_i)>0$ by the fun lemma. By the general observation above we have
$$\text{Hit}^{(\mu,f)}(A\leftarrow A)\cap \text{Hit}^{(\mu,f)}(B\leftarrow A)\subseteq \text{Hit}^{(\nu,R_\theta)}(I_1\leftarrow I_1)\cap \text{Hit}^{(\nu,R_\theta)}(I_2\leftarrow I_1)=\emptyset,$$
$$(\text{and }\text{Hit}^{(\mu,f)}(A\leftarrow A)\cap \text{Hit}^{(\mu,f)}(A\leftarrow B)\subseteq \text{Hit}^{(\nu,R_\theta)}(I_1\leftarrow I_1)\cap \text{Hit}^{(\nu,R_\theta)}(I_1\leftarrow I_2)=\emptyset\quad(\blacktriangle))$$
a contradiction.
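The separation property used above ("an iterate of $I_1$ cannot meet both $I_1$ and $I_2$") can be checked numerically in a hypothetical configuration. In the sketch below the circle is normalized to $\mathbb{R}/\mathbb{Z}$, so that $R_\theta$ is $t\mapsto t+\theta \bmod 1$, and a closed arc is a pair (start, length). We take $\epsilon=1/10$, $I_1=[0,\tfrac1{20}]$, $J_1=[\tfrac1{10},\tfrac4{10}]$, $I_2=[\tfrac12,\tfrac{11}{20}]$ and $J_2=[\tfrac6{10},\tfrac9{10}]$. These choices are illustrative assumptions, not taken from the proof.

```python
from fractions import Fraction


def meets(arc1, arc2):
    """Closed arcs (start, length) on R/Z with length < 1 meet iff one start lies in the other arc."""
    (a, l1), (b, l2) = arc1, arc2
    return (b - a) % 1 <= l1 or (a - b) % 1 <= l2


def rotate(arc, t):
    """Image of an arc under the rotation t' -> t' + t (mod 1)."""
    start, length = arc
    return ((start + t) % 1, length)


eps = Fraction(1, 10)
I1 = (Fraction(0), Fraction(1, 20))
J1 = (Fraction(1, 10), Fraction(3, 10))
I2 = (Fraction(1, 2), Fraction(1, 20))
J2 = (Fraction(6, 10), Fraction(3, 10))


def some_iterate_meets_both(theta, n_max):
    """Is there 1 <= n <= n_max such that R_theta^n(I1) meets both I1 and I2?"""
    return any(meets(rotate(I1, n * theta), I1) and meets(rotate(I1, n * theta), I2)
               for n in range(1, n_max + 1))


print(some_iterate_meets_both(Fraction(3, 7), 200))  # False
print(meets(rotate(I1, Fraction(1, 2)), I2))          # True: the half-turn maps I1 onto I2
```

Exact rational arithmetic is used so that boundary cases (closed arcs touching at an endpoint) are decided correctly.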
In this section I'll give the proof of the lemma. Before the proof, here are some comments about the intuition: we want two arcs $I_1$ and $I_2$ whose preimages under $\psi$ have positive $\mu$ measure; taking these arcs short enough will break the simultaneous hitting time hypothesis by the general observation. In order to control this, one needs to make sure that the two small arcs are separated enough; the role of the other two arcs is to act as bumpers.
Proof of lemma: We argue by contradiction. Suppose there is an $\epsilon_0\in\mathbb{R}_{>0}$ with the following property: for any four disjoint closed arcs $I_1,I_2,J_1,J_2$ of positive length that are ordered as $I_1 < J_1 < I_2 < J_2$ w/r/t the counterclockwise orientation on $S^1$ and satisfy $0 <\text{length}(I_i)<\epsilon_0 < \text{length}(J_i)$ for $i=1,2$, we have
$$\nu(I_1)=0\text{ or }\nu(I_2)=0.$$
Let $k$ be the unique integer such that $\frac{1}{2^{k+1}}<\epsilon_0\leq\frac{1}{2^k}$ (lengths are normalized so that $S^1$ has length $1$). Partition the circle into half-open arcs of length precisely $\frac{1}{2^{k+1}}$. Since $\nu$ is a probability measure on $S^1$, at least one of these arcs has positive $\nu$-measure, say $I_1$. Then by hypothesis the arc consisting of all points at least $\frac{1}{2^k}$ away from $I_1$ has zero $\nu$-measure; thus the $\frac{1}{2^k}$-neighborhood $N_1$ of $I_1$ has full measure. By induction (for the next step partition the closure of $N_1$, etc.), in countably many steps this procedure gives that $\nu$ is a Dirac measure, which implies that $\theta=0$, a contradiction.
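The choice of the dyadic level $k$ and of the first partition can be made concrete. The sketch below assumes, as an illustration, that lengths are normalized so that $S^1$ has length $1$ and that $0<\epsilon_0\le 1$. It uses the inequality $\frac{1}{2^{k+1}}<\epsilon_0\le\frac{1}{2^k}$, which makes $k$ exist and be unique for every such $\epsilon_0$.

```python
from fractions import Fraction


def dyadic_level(eps0):
    """The integer k >= 0 with 2^-(k+1) < eps0 <= 2^-k (assumes 0 < eps0 <= 1)."""
    if not 0 < eps0 <= 1:
        raise ValueError("expected 0 < eps0 <= 1")
    k = 0
    while not Fraction(1, 2 ** (k + 1)) < eps0:
        k += 1
    return k


def dyadic_partition(k):
    """Half-open arcs [j/2^(k+1), (j+1)/2^(k+1)) covering R/Z, each of length 2^-(k+1)."""
    m = 2 ** (k + 1)
    return [(Fraction(j, m), Fraction(j + 1, m)) for j in range(m)]


k = dyadic_level(Fraction(3, 10))
print(k, len(dyadic_partition(k)))  # 1 4
```

Each arc of the partition has length $2^{-(k+1)}<\epsilon_0$, as the argument requires.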
Added: ($\blacksquare$) is equivalent to weak mixing because the collection $\mathcal{R}=\mathcal{B}(X)\times \mathcal{B}(X)\subseteq \mathcal{B}(X\times X)$ of measurable rectangles is a semi-algebra generating the whole $\sigma$-algebra. That is, $\mathcal{R}$ contains the empty set, is closed under finite intersections, and the complement of any element in $\mathcal{R}$ can be written as the union of finitely many disjoint elements in $\mathcal{R}$. One can then verify ergodicity/weak mixing/strong mixing only for elements in $\mathcal{R}$; see e.g. Walters' An Introduction to Ergodic Theory, p.41, Thm.1.17, or p.52, Exr.2.7.3 of Einsiedler-Ward's book mentioned above.
Alternatively, the second proof above implicitly proves the $\Leftarrow$ direction of this as well.
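For the product semi-algebra $\mathcal{R}$, the two closure properties listed above can be verified directly. For $A,B,C,D\in\mathcal{B}(X)$ one has

$$(A\times B)\cap(C\times D)=(A\cap C)\times(B\cap D),\qquad (X\times X)\setminus(A\times B)=\big((X\setminus A)\times X\big)\uplus\big(A\times(X\setminus B)\big),$$

where $\uplus$ denotes a disjoint union.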
More Added: As per request, here are some further details and references regarding ($\blacksquare$). Let $X$ be a measurable space, $(\mu,f)\in\mathfrak{E}(X)$. Denote by $\mathcal{M}(X;\mu)$ the measure algebra of $X$ w/r/t $\mu$; by definition this is the quotient $\sigma$-algebra $\mathcal{B}(X)/\mathcal{N}(\mu)$ of $\mathcal{B}(X)$ by the $\sigma$-ideal of $\mu$-negligible sets. Then $d_\mu:(A,B)\mapsto \mu(A\triangle B)=\mu(A\setminus B)+\mu(B\setminus A)$ is a distance function on $\mathcal{M}(X;\mu)$, $(\mathcal{M}(X;\mu),d_\mu)$ is a complete metric space, $\mu:\mathcal{M}(X;\mu)\to [0,1]$ is (globally) $1$-Lipschitz, $f^{-1}:\mathcal{M}(X;\mu)\to \mathcal{M}(X;\mu)$ is distance preserving, and when the product $\mathcal{M}(X;\mu)\times \mathcal{M}(X;\mu)$ is endowed with the $\ell^1$-product distance (i.e. distance between pairs is the sum of the distances of the corresponding components), then $\cup,\cap,\triangle,\setminus:\mathcal{M}(X;\mu)\times \mathcal{M}(X;\mu)\to \mathcal{M}(X;\mu)$ are all also (globally) $1$-Lipschitz. (see e.g. Fremlin's Measure Theory, Ch.32 for all these and much more, available at https://www1.essex.ac.uk/maths/people/fremlin/mt.htm). Consequently, for any $n\in\mathbb{Z}_{\geq0}$, $\mathcal{F}^{(\mu,f)}_n:(A,B)\mapsto \mu(A\cap f^{-n}(B))$ is (globally) $1$-Lipschitz.
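The $1$-Lipschitz estimate $|\mathcal{F}^{(\mu,f)}_n(A,B)-\mathcal{F}^{(\mu,f)}_n(A',B')|\le d_\mu(A,A')+d_\mu(B,B')$ can be spot-checked on a finite toy system. The Python sketch below is a hypothetical illustration. It uses $X=\{0,\dots,N-1\}$ with uniform $\mu$ and a random permutation $f$, which preserves $\mu$ as the estimate requires. Arithmetic is exact and rational, so no rounding can hide a violation.

```python
import random
from fractions import Fraction


def preimage(f, S, n=1):
    """f^{-n}(S) for a map given as a list (f[x] = image of x)."""
    for _ in range(n):
        S = {x for x in range(len(f)) if f[x] in S}
    return S


def mu_of(mu, S):
    return sum((mu[x] for x in S), Fraction(0))


def d(mu, A, B):
    """d_mu(A, B) = mu(A Δ B)."""
    return mu_of(mu, A ^ B)


def F(mu, f, n, A, B):
    """F_n(A, B) = mu(A ∩ f^{-n}(B))."""
    return mu_of(mu, A & preimage(f, B, n))


def lipschitz_violations(N=12, n=3, trials=500, seed=0):
    """Count random quadruples (A, B, A', B') violating the 1-Lipschitz estimate."""
    rng = random.Random(seed)
    mu = [Fraction(1, N)] * N
    f = list(range(N))
    rng.shuffle(f)  # a permutation preserves the uniform measure
    violations = 0
    for _ in range(trials):
        A, B, A2, B2 = ({x for x in range(N) if rng.random() < 0.5} for _ in range(4))
        if abs(F(mu, f, n, A, B) - F(mu, f, n, A2, B2)) > d(mu, A, A2) + d(mu, B, B2):
            violations += 1
    return violations


print(lipschitz_violations())  # 0
```

Measure preservation matters here. It is what makes $f^{-n}$ distance preserving on the measure algebra, which is one of the ingredients of the estimate.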
We'll also use the following facts (these are from Walters' book, Ch.0; he lists some references there):
Fact 1: Let $X$ be a measurable space, $\mathcal{S}\subseteq\mathcal{B}(X)$ be a generating semi-algebra, $\mathcal{A}\subseteq\mathcal{B}(X)$ be the algebra generated by $\mathcal{S}$. Then $\forall A\in\mathcal{A},\exists p\in\mathbb{Z}_{\geq1}, \exists A_\bullet: \{1,2,...,p\}\to \mathcal{S}$ such that $A=\biguplus_{i=1}^p A_i$. In words, any element of the algebra generated by the semi-algebra can be written as the disjoint union of finitely many elements of the semi-algebra.
Fact 2: Let $X$ be a measurable space, $\mu$ be a probability measure on $X$. Then for any generating subalgebra $\mathcal{A}\subseteq \mathcal{B}(X)$, the subalgebra $\mathcal{A}/\mathcal{N}(\mu)\subseteq \mathcal{M}(X;\mu)$ is dense w/r/t the topology induced by $d_\mu$.
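As a hypothetical illustration of Fact 2 in the simplest case, take $X=[0,1[$ with Lebesgue measure and the algebra generated by dyadic intervals $[j/2^m,(j+1)/2^m[$. The Python sketch below approximates an interval $[a,b[$ from inside by a union of level-$m$ dyadic intervals and returns the $d_\mu$-distance, which is less than $2\cdot 2^{-m}$. General Borel sets would additionally require a first approximation by finite unions of intervals, which the sketch does not attempt.

```python
import math
from fractions import Fraction


def inner_dyadic_approx(a, b, m):
    """Largest union of level-m dyadic intervals inside [a, b), returned as an interval [lo, hi)."""
    scale = 2 ** m
    lo = Fraction(math.ceil(a * scale), scale)
    hi = Fraction(math.floor(b * scale), scale)
    return lo, max(lo, hi)  # an empty approximation is recorded as [lo, lo)


def lebesgue_distance(a, b, m):
    """d_mu([a, b), [lo, hi)) = (b - a) - (hi - lo), since [lo, hi) is contained in [a, b)."""
    lo, hi = inner_dyadic_approx(a, b, m)
    return (b - a) - (hi - lo)


for m in (2, 4, 8):
    print(m, lebesgue_distance(Fraction(0), Fraction(1, 3), m))
# 2 1/12
# 4 1/48
# 8 1/768
```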
Sketch of Proof of ($\blacksquare$): As the OP commented below, it suffices to show that the existence of hitting times for sets in a generating semi-algebra implies the existence of hitting times for arbitrary sets. Let $\mathcal{S}$ be a generating semi-algebra and $\mathcal{A}$ be the algebra generated by $\mathcal{S}$. Suppose we are given the existence of hitting times for nonnegligible sets in $\mathcal{S}$. Extend this to the existence of hitting times for nonnegligible sets in $\mathcal{A}$ by using Fact 1. (If a large set is nonnegligible, not all of its small parts can be negligible; and if a small part of a large set hits a small part of another large set, then the large set hits the other large set.) Then extend this to the existence of hitting times for arbitrary nonnegligible sets by using Fact 2. (If both $A,B$ are nonnegligible, then for $\epsilon>0$ small enough they can be $\epsilon$-approximated (w/r/t $d_\mu$) by nonnegligible elements $A_\epsilon,B_\epsilon$ of the algebra $\mathcal{A}$. The existence of hitting times for $\mathcal{A}$ means that for some $n\in\mathbb{Z}_{\geq1}$, $\mathcal{F}^{(\mu,f)}_n(A_\epsilon,B_\epsilon)>0$; shrinking $\epsilon$ if necessary, the Lipschitz nature of $\mathcal{F}^{(\mu,f)}_n$ gives that $n$ is a hitting time from $A$ to $B$.) Finally, lifting the statement from $\mathcal{M}(X;\mu)$ to $\mathcal{B}(X)$ is straightforward.
Some Heuristics: As a final note, here are some comments as to why it is warranted for the OP to be careful about generating semi-algebra constraints etc. Consider, for irrational $\theta$, the Cartesian square of the rotation, $R_\theta\times R_\theta: (z,w)\mapsto (e^{2\pi i \theta}z,e^{2\pi i\theta}w)$. Then any diagonal strip on the torus is invariant, so that $R_\theta\times R_\theta$ is not ergodic (w/r/t the Haar measure on the torus). This is as it should be, since it is well known that no rotation $R_\theta$ is weakly mixing (see also the discussion at Dynamics on the torus). However, $R_\theta\times R_\theta$ satisfies one of the ergodicity conditions for sets in $\mathcal{B}(S^1)\times\mathcal{B}(S^1)$, namely that any invariant rectangular set has measure either zero or one.
However, as a general rule of thumb, whenever a property can be formulated in a way that resembles "asymptotic independence" in some sense, this property ends up being verifiable using fewer sets as variables than it seems to require a priori. (Let us not worry about properties beyond mixing ($K$, Bernoulli, ...) for the sake of this part; arguably for these the discussion needs to be enhanced.) See also the discussion at Understanding a basic ergodic theory physical analogy for more heuristics. Philosophically speaking, the first type of condition really compares objects that form "at the end of time" (by way of the ergodic theorem), namely the sub-$\sigma$-algebra of the measure algebra consisting of invariant sets versus the indiscrete sub-$\sigma$-algebra of the measure algebra consisting of sets of measure zero or one. The second type, by contrast, gives conditions for objects "in time" and what they should turn into "at the end of time". If one starts with objects "in time", then constraints end up not mattering, but if one starts with objects "at the end of time", then constraints may have effects. For example, the "constraint" in the example $R_\theta\times R_\theta$ above is the choice of a coordinate system: indeed, up to a coordinate change $R_\theta\times R_\theta$ is of the form $R_\eta\times\text{id}_{S^1}$. Hence for ergodicity the constraint, when considered "at the end of time" (by taking into account only those rectangles w/r/t the particular coordinate system that are invariant), ends up not working.
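One explicit choice of the coordinate change mentioned above is the torus automorphism $\Phi(z,w)=(z,w\bar z)$, which preserves the Haar measure. This choice is an illustration; the text does not fix a particular coordinate change. Since $|z|=1$ gives $\bar z=z^{-1}$,

$$\Phi\big((R_\theta\times R_\theta)(z,w)\big)=\big(e^{2\pi i\theta}z,\;e^{2\pi i\theta}w\,e^{-2\pi i\theta}\bar z\big)=\big(e^{2\pi i\theta}z,\;w\bar z\big)=\big((R_\theta\times\text{id}_{S^1})\circ\Phi\big)(z,w),$$

so one may take $\eta=\theta$. In particular, each set $S^1\times I$ with $I\subseteq S^1$ an arc is invariant under $R_\theta\times\text{id}_{S^1}$, and its preimage under $\Phi$ is a diagonal strip.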