
I came across a more general form of this question.

Can we find the values of the variables in polynomial time?

Let $m = n^{2}$. There are $m$ variables ($x, y, z, \ldots$) in the equation, and each variable takes an integer value between $1$ and $n$.

We have an equation generator,
$$a\cdot f(x)+b\cdot f(y)+c\cdot f(z)+\cdots=k$$

for variables $x,y,z\in \{1,2\ldots, n\}$.

$f$ can be any function, such as $f(x) = x$, $f(x) = x^{2}$, $f(x) = \log x$, etc.

Given that the constants are rational numbers and that there exists a solution, can we solve for the variables in polynomial time?


Update: Please note that I am not giving a single instance of $a\cdot f(x)+b\cdot f(y)+c\cdot f(z)+\cdots=k$; instead, I am giving you the entire family of all possible instances of the above equation.

Basically, I am giving you a black box: if you input a function $f$ that is computable in polynomial time, it gives you the corresponding $k$. We can do this multiple times. The goal is to solve for the variables using this black box in polynomial time.

So we have:
1. The values of $m$ and $n$
2. The constants $a, b, c, \ldots$
3. The equation generator $a\cdot f(x)+b\cdot f(y)+c\cdot f(z)+\cdots=k$

We can keep choosing different functions $f$, and the generator will give us the corresponding equation with a new $k$.
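To make the interface concrete, here is a minimal Python sketch of such a black box; the names (`make_generator`, `coeffs`, `xs`) are illustrative choices of mine, not part of the question:

```python
from typing import Callable, List

def make_generator(coeffs: List[float], xs: List[int]) -> Callable:
    """Black box: the coefficients a, b, c, ... are public, while xs
    (the hidden variable values in {1, ..., n}) stay secret. Querying
    with a polynomial-time computable f returns
    k = a*f(x) + b*f(y) + c*f(z) + ..."""
    def query(f: Callable[[int], float]) -> float:
        return sum(a * f(x) for a, x in zip(coeffs, xs))
    return query

# Example with m = 4 hidden variables over {1, ..., n}:
oracle = make_generator([1, 2, 3, 4], xs=[2, 1, 3, 2])
k1 = oracle(lambda x: x)      # k for f(x) = x
k2 = oracle(lambda x: x * x)  # a new k for f(x) = x^2
```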

Goal: to solve for the variables in the equation when $f(x) = x$.


Update 2: For people wondering about the origin of the question and its purpose: if the answer to this question is yes, then I think Sudoku can be solved in polynomial time. I tried to make use of a numerical property of Sudoku, namely that the sum of the entries of any row or column is known. By treating each unknown cell as a variable, we can treat each row and column as an equation. For an $n \times n$ Sudoku this results in $2n$ equations with $n^2 - c$ variables, where $c$ is the number of already-filled cells. This is an underdetermined system of equations. We could represent it as a matrix and reduce it to echelon form. This can be repeated for the sums of squares, cubes, and so on of the values. If we could solve this, we could solve Sudoku.
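For concreteness, here is a sketch of the construction just described, under my own conventions (0 marks an empty cell; `sudoku_sum_system` is a hypothetical name). It builds the $2n$ row- and column-sum equations $Av = b$ over the empty cells:

```python
import numpy as np

def sudoku_sum_system(grid):
    """Build the 2n row/column sum equations A v = b for the empty
    cells of an n x n grid (0 = empty). Each row and column of a
    solved grid sums to 1 + 2 + ... + n = n(n+1)/2."""
    n = len(grid)
    cells = [(r, c) for r in range(n) for c in range(n) if grid[r][c] == 0]
    idx = {cell: j for j, cell in enumerate(cells)}
    A = np.zeros((2 * n, len(cells)))
    b = np.full(2 * n, n * (n + 1) // 2, dtype=float)
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:
                A[r, idx[(r, c)]] = 1       # row-sum equation r
                A[n + c, idx[(r, c)]] = 1   # column-sum equation c
            else:
                b[r] -= grid[r][c]          # known cell: move to RHS
                b[n + c] -= grid[r][c]
    return A, b, cells
```

Row-reducing $A$ gives the echelon form, but with only $2n$ equations for up to $n^2$ unknowns the system remains heavily underdetermined, which is why the question adds further equations for sums of squares, cubes, and so on.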

  • If $f$ can really be anything, I would expect to meet some computability issues. Even assuming $f$ to be computable, computing $f(x)$ could take a very long time. I think we need some complexity assumption on $f(x)$. – chi Mar 22 '19 at 14:08
  • We are free to choose f(x), but we will have to bear its cost. Therefore, by definition, we have to choose f(x) so that it is computable in polynomial time. – Mar 22 '19 at 14:49
  • What if $f(x)$ is an irrational number? – xskxzr Mar 22 '19 at 15:34
  • I am not sure how that would turn out. As long as computing f(x) is polynomial, and then solving for the variables is polynomial, f(x) can be an irrational number. But I am not sure how that would affect the processing; I think it depends on what kind of f(x) we are dealing with. – Mar 22 '19 at 15:37
  • I already showed in my answer to your previous question that when $f$ goes over all natural powers, then it is NP-complete to decide whether a solution exists. This suggests that it is hard to find a solution even if one is guaranteed to exist. – Yuval Filmus Mar 22 '19 at 15:41
  • @YuvalFilmus yes, but now we have the possibility of choosing $f(x)$ to be $\log(x)$, $x-1$, $x+1$, $x^x$. So I was hoping to see if there was a clever way of finding the variables by choosing different functions and comparing the resulting different values of $k$. That is why I said this was a generalized form of my previous question. – Mar 22 '19 at 15:48
  • I don't understand your black box. What does it mean to choose $f(x)$ and get back $k$? Do you mean that I can choose any function $f$ I want, give it to the black box, and get back the value of $k$ (i.e., the value of $a \cdot f(x) + b \cdot f(y) + \dots$)? And I know $m,n,a,b,\dots$? [Note that there is a difference between $f$ and $f(x)$. $f$ is the function. $f(x)$ is the result of the function when evaluated at the input $x$. Since I don't know $x$, it is weird to talk about me choosing $f(x)$.] – D.W. Mar 22 '19 at 15:55
  • @VARUN.NRAO Once you give me the solution for $f(x) = x^\ell$ for enough values of $\ell$, I can compute the solution for any other $f$. You gain absolutely nothing by allowing arbitrary $f$. – Yuval Filmus Mar 22 '19 at 15:57
  • You mention $f(x)=x^x$, but that is not computable in polynomial time. Further, $\log(x)$ usually contains an infinite amount of digits -- maybe we need to require that the first $k$ such digits are computable in polynomial time wrt $k$? I do realize that this sounds as if I am nitpicking, but before trying to tackle a question it is important to make it precise (which is often a non trivial task on its own!) – chi Mar 22 '19 at 16:01
  • Regarding my previous comment, here is why it is hard to find a solution even if one is guaranteed to exist. Suppose you had a polytime algorithm that finds a solution if one exists. Apply it to an arbitrary instance, stopping it when the running time bound for the promise version runs out. If the instance had a solution, you are guaranteed to find it, and can verify that you indeed found one. If the instance had no solution, naturally you won't find any (or the algorithm will report a solution which you can check isn't a solution). – Yuval Filmus Mar 22 '19 at 16:01
  • @YuvalFilmus are you suggesting Taylor series? But that would take quite a long time; you cannot so easily compute $\log(x)$ with $x^\ell$. And one more thing: we actually don't know the variables, so I am not giving you the solution only for $f(x)$; I am giving you the weighted sum of the function over all the variables. – Mar 22 '19 at 16:02
  • @VARUN.NRAO No, I am suggesting something much simpler. In my answer to your previous question, I show how to find the total weight $w(i)$ of the variables equal to $i$, for each $i \in \{1,\ldots,n\}$. Given that, you can easily compute $[f]$ for every function $f$ (in terms of $f(1),\ldots,f(n)$): it is just $\sum_{i=1}^n w(i) f(i)$. – Yuval Filmus Mar 22 '19 at 16:03
  • @D.W. thanks for pointing that out, I will update it. As you rightly said, we can only choose $f$. – Mar 22 '19 at 16:04
  • In fact, by choosing $f_i(x) = 1_{x=i}$, you can directly find $w(i)$. By finding $w(1),\ldots,w(n)$, you can compute the value for an arbitrary $f$ on your own. – Yuval Filmus Mar 22 '19 at 16:05
  • @YuvalFilmus that is correct. That is really a valuable insight –  Mar 22 '19 at 16:07
  • @YuvalFilmus What about writing an answer? – xskxzr Mar 22 '19 at 16:12
  • @chi $x^x$ was a mistake. But for $\log(x)$ a reasonable approximation works, since the solutions are integers. And regarding asking a precise question, I agree 100%. – Mar 22 '19 at 16:17

1 Answer


Summary

Your problem remains NP-hard, despite the fact that the algorithm can choose the function $f$. Consequently, no, you should not expect a polynomial-time algorithm.

The problem is at least as hard as the subset-sum problem. Fortunately, subset-sum is one of the easier NP-hard problems, so if your problem instance is small enough, you might be able to use algorithms for subset sum to solve your problem.

Problem statement

I'm going to reformulate your problem statement, so we are all on the same page:

Input: integers $m,n, a_1,\dots,a_m$; and access to an oracle, as follows:
Oracle: has secret values $x_1,\dots,x_m \in \{1,\dots,n\}$; when provided a function $f:\{1,2,\dots,n\}\to \mathbb{N}$, the oracle outputs the value $a_1 f(x_1) + \dots + a_m f(x_m)$
Goal: find $x_1,\dots,x_m$

An algorithm

You could use algorithms for subset-sum to help solve your problem, using the following strategy.

Let $\delta_1$ denote the function with $\delta_1(1)=1$ and $\delta_1(j)=0$ for all $j \ne 1$. Send $\delta_1$ to the oracle, and call the result $k_1$, i.e., $k_1 = a_1 \delta_1(x_1) + \dots + a_m \delta_1(x_m)$. Notice that $k_1$ is a sum of a subset of the $a_i$'s, namely, those where $x_i=1$. In other words, $k_1=\sum_{i \in S_1} a_i$ where $S_1=\{i : x_i=1\}$. So, feed $a_1,\dots,a_m$ and $k_1$ into an algorithm for the subset-sum problem, and ask it to find a subset of the $a_i$'s that sums to $k_1$. If this subset-sum problem has a unique solution, the result will tell you which $x_i$'s have the value 1.

Next, let $\delta_2$ denote the function with $\delta_2(2)=1$ and $\delta_2(j)=0$ for all $j \ne 2$. Send $\delta_2$ to the oracle, and call the result $k_2$. Do the same thing, using a subset-sum solver to find a subset of the $a_i$'s that sums to $k_2$; if this solution is unique, it identifies the $x_i$'s that have the value 2.

Repeat, for $\delta_1,\dots,\delta_n$, until you have learned the values of all of the $x_i$'s.

If you're lucky, the subset-sum problem has a unique solution at each stage, and after solving $n$ subset-sum problems you can read off the values of the $x_i$'s. If you're unlucky, it has multiple solutions, and now you might need to consider all combinations of solutions to find a combination where the subsets are disjoint.
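Here is a sketch of the whole strategy in Python, assuming the black box behaves as in the problem statement; the brute-force subset-sum solver is for illustration only (a practical one would use dynamic programming or meet-in-the-middle), and it simply takes the first subset found, ignoring the ambiguity just discussed:

```python
from itertools import combinations

def subset_summing_to(a, target):
    """Brute-force subset sum: return the index set of one subset of a
    that sums to target, or None. Exponential time; illustration only."""
    for size in range(len(a) + 1):
        for subset in combinations(range(len(a)), size):
            if sum(a[i] for i in subset) == target:
                return set(subset)
    return None

def recover_variables(oracle, a, n):
    """Query the oracle with the indicator functions delta_1, ..., delta_n
    and solve one subset-sum instance per value v. Assumes a genuine
    oracle, so each instance has at least one solution."""
    x = [None] * len(a)
    for v in range(1, n + 1):
        k_v = oracle(lambda t, v=v: 1 if t == v else 0)  # query delta_v
        for i in subset_summing_to(a, k_v):
            x[i] = v  # these positions hold the value v
    return x
```

Against the generator sketched in the question, `recover_variables(oracle, [1, 2, 3, 4], n=3)` happens to recover the hidden assignment `[2, 1, 3, 2]`, but only because the solver tries the subset $\{1, 4\}$ before $\{2, 3\}$; both sum to $5$, so the multiple-solution caveat above is not hypothetical.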

(Incidentally, once you've queried the oracle on $\delta_1,\dots,\delta_n$, you can predict the response of the oracle on any other function, so there's not much point in querying it again. This shows that $n$ queries to the oracle suffice; we'll never need more than that.)
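In code, if `ks[v-1]` holds the oracle's answer on $\delta_v$ (i.e., the total weight of variables equal to $v$), the answer to any further query is forced:

```python
def predict(ks, f):
    """Given ks[v-1] = the oracle's answer on delta_v, its answer on
    any other f must equal the sum over v of f(v) * ks[v-1]."""
    return sum(f(v) * k for v, k in enumerate(ks, start=1))
```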

The problem is NP-hard

Here is a reduction to show that the problem is as hard as subset sum. In particular, we'll show that the special case of your problem where $n=2$ is as hard as subset sum. Suppose we have a subset sum instance, namely numbers $a_1,\dots,a_m$ and a target $t$. We'll show how to use any algorithm for your problem to solve this subset sum instance.

We will implement a sneaky oracle, which behaves as follows: when provided a function $f:\{1,2\} \to \mathbb{N}$, it returns the value

$$f(1) \cdot (a_1 + \dots + a_m) + (f(2)-f(1)) \cdot t.$$

Notice that the sneaky oracle doesn't have a particular set of values $x_1,\dots,x_m$ in mind. Nonetheless, if there is a subset of the $a_i$'s that sums to $t$, the responses from the sneaky oracle would be equivalent to what we would obtain if we had an ordinary oracle where the $x_i$'s were 2 for all $i$ in that subset and 1 otherwise; and if there is no subset of the $a_i$'s that sums to $t$, the responses from the sneaky oracle are not equivalent to any ordinary oracle (for any value of the $x_i$'s).
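In code, the sneaky oracle is just a closure over the subset-sum instance (a sketch; the name is hypothetical):

```python
def make_sneaky_oracle(a, t):
    """Answer every query f with f(1)*(a_1 + ... + a_m) + (f(2) - f(1))*t,
    i.e., as if the variables in some subset summing to t were 2 and all
    others were 1 -- without ever knowing such a subset."""
    total = sum(a)
    def query(f):
        return f(1) * total + (f(2) - f(1)) * t
    return query
```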

Now, ask the purported algorithm for your problem to work with this sneaky oracle and find values for the $x_i$'s. If the algorithm for your problem finds a solution for the $x_i$'s, that yields a solution for the subset sum problem (simply include the $a_i$'s where $x_i=2$ in the subset). If the algorithm for your problem doesn't find a solution for the $x_i$'s, there is no subset of the $a_i$'s that sums to $t$.

(Strictly speaking, this shows the case where $n=2$ is hard. However, one can show that increasing $n$ does not make the problem any easier. Thus, the case where $n=\sqrt{m}$ is also hard.)

D.W.
  • This is a great answer, as it shows exactly why it's NP-hard. I kind of intuitively knew this had to be NP-hard, because if it were in P then we could use it to solve Sudoku in P. – Mar 22 '19 at 17:11