Description 1 is indeed a formulation of the sheaf axiom, but the way you interpreted it is incorrect. Saying that that sequence is an equalizer does not just mean that it commutes; it means that if any other map $X \longrightarrow \prod F(U_i)$ commutes with the two arrows to $\prod F(U_i \cap U_j)$, then the map $X \longrightarrow \prod F(U_i)$ factors uniquely through the equalizer $F(U) \longrightarrow \prod F(U_i)$. That is, given such a map from $X$, we have the following commutative diagram:
$$
\require{AMScd}
\begin{CD}
F(U) @>>> \prod F(U_i)\\
@AAA @AidAA\\
X @>>> \prod F(U_i)
\end{CD}
$$
where in fact the map $X \longrightarrow F(U)$ is unique among all maps making this square commute. I wrote this diagram as a square instead of a triangle for two reasons. First, the AMScd package doesn't allow diagonal arrows. Second, this turns out to be a blessing in disguise, as it illustrates that the equalizer of $\prod F(U_i) {{{} \atop \longrightarrow}\atop{\longrightarrow \atop {}}} \prod F(U_i \cap U_j)$ is not just the object $F(U)$, but the map $F(U) \longrightarrow \prod F(U_i)$. The diagram above then essentially says that the map $F(U) \longrightarrow \prod F(U_i)$ is terminal among all maps $X \longrightarrow \prod F(U_i)$ that have the same composite with both arrows to $\prod F(U_i \cap U_j)$. I won't specify the category in which it is literally terminal, but I hope this gets the idea across.
For a more concrete example, we can take two maps of sets $f_0, f_1: A \longrightarrow B$. It turns out that their equalizer is $\{a \in A : f_0(a) = f_1(a)\}$ along with the inclusion map into $A$. For instance, if $f_0 = f_1$ then this is just $A$ itself. And indeed, in this case, any map $X \longrightarrow A$ commutes with $A {{{f_0} \atop \longrightarrow}\atop{\longrightarrow \atop {f_1}}} B$, but only the identity $A \longrightarrow A$ is the equalizer. This terminal/unique factoring condition is an essential part of the definition.
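To make the set-level description concrete, here is a minimal Python sketch for finite sets. The names `equalizer` and `factor_through`, and the particular maps `f0`, `f1`, `h`, are hypothetical choices made only for illustration.

```python
def equalizer(A, f0, f1):
    """Return the subset of A on which f0 and f1 agree."""
    return {a for a in A if f0(a) == f1(a)}


def factor_through(E, h):
    """Factor a map h: X -> A (given as a dict) through the inclusion E -> A.

    This works only if every value of h already lies in E. The factorization
    is then forced: since E -> A is an inclusion, x must be sent to h[x].
    """
    if not all(value in E for value in h.values()):
        raise ValueError("h does not land in the equalizer")
    return dict(h)


# Illustrative data (assumed, not from the discussion above).
A = set(range(-3, 4))          # {-3, ..., 3}
f0 = lambda a: a * a
f1 = lambda a: a + 2           # a*a == a + 2 exactly when a is -1 or 2

E = equalizer(A, f0, f1)

# A map h: X -> A with the same composite under f0 and f1.
h = {"p": 2, "q": -1}
assert all(f0(v) == f1(v) for v in h.values())
factored = factor_through(E, h)
```

When `f0` and `f1` are the same function, `equalizer(A, f0, f0)` returns all of `A`, matching the remark that the equalizer is then the identity on $A$.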
Now, let's see why that is the sheaf axiom. This won't be a wholly rigorous proof but it does give the key idea. Let's take a map $\{*\} \longrightarrow \prod F(U_i)$ such that the diagram $\{*\} \longrightarrow \prod F(U_i) {{{} \atop \longrightarrow}\atop{\longrightarrow \atop {}}} \prod F(U_i \cap U_j)$ commutes. A map from a singleton is just a choice of element, so call the image of $*$ under this map $(f_i)_{i \in I}$, where each $f_i \in F(U_i)$. Then the assumption of commutativity says that $(f_i)_{i \in I}$ maps to the same thing under both maps $\prod F(U_i) {{{} \atop \longrightarrow}\atop{\longrightarrow \atop {}}} \prod F(U_i \cap U_j)$. In other words, it means that $(f_i|_{U_i \cap U_j})_{i, j} = (f_j|_{U_i \cap U_j})_{i, j}$. This is exactly the compatibility condition in the gluing axiom! Now, since we assumed that we had an equalizer diagram, we have the following commutative diagram:
$$
\require{AMScd}
\begin{CD}
F(U) @>>> \prod F(U_i)\\
@AAA @AidAA\\
\{*\} @>>> \prod F(U_i)
\end{CD}
$$
so let $f$ be the image of $*$ under $\{*\} \longrightarrow F(U)$. Then this says precisely that $f \mapsto (f_i)_{i \in I}$ under $F(U) \longrightarrow \prod F(U_i)$. In other words, $f|_{U_i} = f_i$ for all $i$. Thus, saying that that diagram was an equalizer allowed us to take a compatible family of sections $f_i \in F(U_i)$ and glue them to a section $f \in F(U)$. Now, this was not a whole proof. First of all, I didn't show uniqueness of the glued section $f$, but this comes from uniqueness of the induced map in the equalizer: if $f'$ also restricts to each $f_i$, then $* \mapsto f'$ is a second factorization, so $f' = f$. I also only showed that equalizer $\implies$ sheaf (with the usual gluing definition). The converse isn't too bad once you have the main idea.
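The gluing argument can be played out on a toy example. The sketch below assumes the presheaf of arbitrary functions on a finite set, with a section over $U$ stored as a dict keyed by the points of $U$; the helper names `restrict`, `is_compatible`, and `glue` are hypothetical.

```python
from itertools import combinations


def restrict(section, V):
    """Restrict a section (a dict keyed by points) to the subset V."""
    return {x: section[x] for x in V}


def is_compatible(cover, family):
    """Check f_i = f_j on U_i ∩ U_j for every pair, i.e. that the family
    has the same image under both maps to the product over intersections."""
    return all(
        restrict(family[i], cover[i] & cover[j])
        == restrict(family[j], cover[i] & cover[j])
        for i, j in combinations(range(len(cover)), 2)
    )


def glue(cover, family):
    """Glue a compatible family into a single section over the union."""
    if not is_compatible(cover, family):
        raise ValueError("family does not agree on overlaps")
    glued = {}
    for section in family:
        glued.update(section)  # overlaps agree, so nothing is overwritten
    return glued


# Illustrative cover of U = {1, 2, 3} by U_0 = {1, 2} and U_1 = {2, 3}.
cover = [{1, 2}, {2, 3}]
family = [{1: "a", 2: "b"}, {2: "b", 3: "c"}]

f = glue(cover, family)
```

Here `f` restricts to `family[i]` on each `cover[i]`, which is the statement $f|_{U_i} = f_i$. A family that disagrees at the point `2` would be rejected by `is_compatible`, so it never reaches the gluing step.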
Now for your second description, you made an error in transcribing it. There should not be two maps $\prod F(U_i) {{{} \atop \longrightarrow}\atop{\longrightarrow \atop {}}} \prod F(U_i \cap U_j)$. Instead, you want one arrow that is the difference of these two maps. But of course, that means we need to sensibly define the difference of maps, so this works if $F$ is a sheaf of abelian groups (or more generally takes values in some abelian category). This way we can make sense of what that $0$ at the start is and what exactness means. That is, we say that a diagram $A \xrightarrow{f} B \xrightarrow{g} C$ of abelian groups is exact if $\operatorname{im}(f) = \ker(g)$. And we say that a longer sequence is exact if each 3-long segment like this is exact. Now, let me return to equalizers for a moment. Recall that I said that the equalizer of $A {{{f_0} \atop \longrightarrow}\atop{\longrightarrow \atop {f_1}}} B$ is $\{a \in A : f_0(a) = f_1(a)\}$ along with the inclusion into $A$. Well, if $A$ and $B$ are both abelian groups, and if the $f_i$ are group homomorphisms, then this is precisely the kernel of $f_0 - f_1$. In other words, equalizers can be computed as kernels when you're working with abelian groups, so exactness of the sequence
$$
0 \longrightarrow F(U) \longrightarrow \prod F(U_i) \longrightarrow \prod F(U_i \cap U_j)
$$
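will turn out to be the same as the equalizer diagram you drew. Indeed, exactness at $F(U)$ says that $F(U) \longrightarrow \prod F(U_i)$ is injective, and exactness at $\prod F(U_i)$ says that its image is the kernel of the difference map, so together they identify $F(U)$ with that kernel.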
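As a quick sanity check that equalizers of homomorphisms are kernels of differences, here is a small Python sketch on the cyclic group $\mathbb{Z}/12$. The specific homomorphisms `f0` and `f1` are assumptions chosen only for illustration.

```python
n = 12
A = range(n)  # elements of Z/12, written as 0, ..., 11

# Two group homomorphisms Z/12 -> Z/12 (multiplication by 3 and by 1).
f0 = lambda a: (3 * a) % n
f1 = lambda a: a % n

# Their difference f0 - f1, which is again a homomorphism.
diff = lambda a: (f0(a) - f1(a)) % n

eq_set = {a for a in A if f0(a) == f1(a)}   # equalizer as a subset of A
ker_set = {a for a in A if diff(a) == 0}    # kernel of f0 - f1
```

Both sets consist of the elements $a$ with $2a \equiv 0 \pmod{12}$, so they coincide.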
This last question is harder to answer, and my expertise here is not so great. The answer, in the case of sheaves of abelian groups, seems to be that kernels play nicely and cokernels do not because sheafification is left adjoint to the forgetful functor from the category of sheaves to the category of presheaves. This means that the forgetful functor, being a right adjoint, is left exact (it preserves kernels), but is not generally right exact (it need not preserve cokernels). This is probably hard to parse, and I'm afraid I don't know this well enough to give a more elementary answer, but at the very least here are two references that give some more information (but are still very abstract):
When the the presheaf of image of morphism of sheaves is a sheaf?
Why does sheafification functor being left adjoint imply that the presheaf kernel is a sheaf kernel