No, this problem is not hard. Here is algebraic reformulation which leads to efficient key recovery.
Ring representation
Let $R = \mathbb{F}_p[X]/(X^4-1)$. We identify any $a \in S$ with an element of $R$ as follows:
$$
R(a) = a_0 + a_1X+a_2X^2+a_3X^3.
$$
From now on assume that we work with elements of $R$.
Let $rev$ denote the polynomial reciprocal function:
$$
rev(R(a)) = a_3 + a_2X+a_1X^2+a_0X^3.
$$
Then, it is easy to verify that for any $a,b \in R$
$$
rev(a\cdot b) = X\cdot rev(a) \cdot rev(b)
$$
and that
$$
f(a,b) = rev(a\cdot rev(b)) = X \cdot rev(a) \cdot b.
$$
Using the simple fact that $rev(X^i a)=X^{-i}rev(a)$, we can reduce the result of any sequence of operations of $f$ to the form
$$
s_i(k_1,k_2) = X^j \cdot k_1^x\cdot rev(k_1)^y\cdot k_2^z\cdot rev(k_2)^w,
$$
where $i,j,x,y,z,w$ are known integers (depending on the number of iterations $i$).
Note that we are given two such values $s_i, s_{i+1}$.
This settles simple (up to $rev$) algebraic formulation of the problem.
Clearly, without $rev$ we would be given a pair $(k_1^xk_2^y, k_1^zk_2^w)$ and it would be easy to recover $k_1,k_2$, excluding corner cases when the matrix
$$
\pmatrix{x & y \\ z & w}
$$
is not invertible modulo $p-1$, in which case we'll have to deal with factorization of $p-1$.
Recovering linear components
Observe that $X^4-1 = (X-1)(X+1)(X^2+1)$. We will consider $s_i \in R$ modulo each of the linear factors and two cases for the last factor depending on whether it splits or not.
Case of $X-1$
Reduction modulo $(X-1)$ is equivalent to computing the trace of the vector: $tr(a) = a_0+a_1+a_2+a_3$. It is easy to see (e.g. by plugging $X=1$ in the polynomial form of $a,b$) that
$$
tr(f(a,b)) = tr(a)\cdot tr(b).
$$
As a result,
$$
tr(s_i) = tr(k_1)^{F_{i-1}}tr(k_2)^{F_i},
$$
where $F_{-1},F_0,F_1,F_2,\ldots = 1,0,1,1,2,\ldots$ are the Fibonacci numbers.
Given $s_i, s_{i+1}$, we can "undo" the iterations via divisions like
$$
tr(s_{i-1})=tr(s_{i+1})/tr(s_i)
$$
and recover
$$
tr(k_1), tr(k_2).
$$
This amounts to inverting the matrix described above
$$
\pmatrix{x' & y' \\ z' & w'} = \pmatrix{x & y \\ z & w}^{-1}
$$
and computing
$$
tr(k_1) = tr(s_i)^{x'}tr(s_{i+1})^{y'},
tr(k_2) = tr(s_i)^{z'}tr(s_{i+1})^{w'}.
$$
Case of $X+1$
Similarly, let $tr'(a) = a_0 - a_1 + a_2 - a_3$. Then we also have
$$
tr'(f(a,b)) = tr'(a)\cdot tr'(b).
$$
The recovery of $tr'(k_1), tr'(k_2)$ is analogous to the case of $tr$, except that we have to track the sign, since $tr'(X) =-1$.
Case when $X^2+1$ splits ($p=4k+1$)
If $p=4k+1$, then $X^2+1$ splits into $(X-i)(X+i)$ (thanks @Mark).
However, reduction modulo $(X-i)$ or $(X+i)$ does not behave very well with respect to $rev$. Let $a = a_0 + a_1X + a_2X^2 + a_3X^3$. Then,
$$
a \mod (X-i) = (a_0 - a_2) + i(a_1 - a_3),
$$
$$
rev(a)\cdot i \mod (X-i) = (a_0 - a_2) - i(a_1 - a_3).
$$
We see that $a$ and $rev(a)\cdot i$ are "complex conjugates" of each other (note that $i$ is actually not imaginary but lies in $\mathbb{F}_p$).
The problem with previous approach is that $a$ and $rev(a)$ are not related by a constant multiplication and so each $s_i$ has 4 variables (whereas previously $k_1$ and $rev(k_1)$ collapsed into one variable).
Now,
$$
a \mod (X+i) = (a_0 - a_2) - i(a_1 - a_3),
$$
$$
rev(a)\cdot (-i) \mod (X+i) = (a_0 - a_2) + i(a_1 - a_3).
$$
Observe that the four expressions $a \mod (X+i), rev(a) \mod (X+i), a \mod (X-i), rev(a) \mod (X-i)$ contain only 2 unique variables. Therefore, $s_i \mod (X-i), s_i \mod (X+i), s_{i+1} \mod (X-i), s_{i+1} \mod (X+i)$ give four equations on the exponents of four variables (a pair of "conjugates" for each of $k_1,k_2$). Therefore, we can invert the respective 4x4 matrix and recover $k_1,k_2$ modulo $(X-i)$ and $(X+i)$.
Now that we have the solution modulo each linear factor, we can reconstruct the keys using the Chinese Remainder Theorem (CRT). This is a complete break for the case $p=4k+1$.
See SageMath implementation of the attack.
Case when $X^2+1$ does not split ($p=4k+3$)
In this case apply the same technique to recover the complex norm of $k_1,k_2$ modulo $X^2+1$, which is multiplicative.
Further, we reuse the previously noted fact that
$$
X\cdot rev(a) \mod{X^2+1}
$$
is the complex conjugate of $a \mod {X^2+1}$. In particular,
$$
X\cdot rev(a) \cdot a \mod{X^2+1} = norm_{X^2+1}(a)
$$
Recall that we have values of the form
$$
s_i(k_1,k_2) = X^j \cdot k_1^x\cdot rev(k_1)^y\cdot k_2^z\cdot rev(k_2)^w.
$$
Working modulo $X^2+1$, we can replace $rev(k_1)$ by $norm_{X^2+1}(k_1)/X/k_1$. Similarly with $k_2$. Since the norm is recovered,
we end up with equations of the form
$$
u_1 = k_1^{x'}\cdot k_2^{y'}
$$
$$
u_2 = k_1^{z'}\cdot k_2^{w'},
$$
where $u_1,u_2,x',z',y',w'$ are known. This is completely similar to previous attacks, and so we recover $k_1$ and $k_2$ modulo $X^2+1$.
After combining the components using CRT, we recover $k_1,k_2$ completely.
The scheme is broken.
UPD: I just realized that I was recovering $k_1,k_2$ from $s_i, s_{i+1}$ which is trivial to do in any representation by inverting each step. However, for $r_1=f(s_i,k_1), r_2=f(s_{i+1},k_2)$ the attack still works by inversion of the relevant exponent matrix $[[x,y],[z,w]]$ modulo element order of the multiplicative group of the ring (which is $p-1$ for $p=4k+1$ and $p^2-1$ for $p=4k+3$). The only caveat is that in the latter case the matrix is not always invertible modulo each factor of the order. I suppose that the solution is not unique in such cases.