Understanding the definition of quantum neural network of Abbas et al. 2020

Question

My question is based on Section 2 of this paper: https://arxiv.org/pdf/2011.00027.pdf, "Power of Quantum Neural Networks".

I know that there are different ways to realize neural networks as QNNs. In this paper, the QNN is treated as a subclass of variational quantum algorithms. First we have a feature map, which encodes our data by applying Hadamard gates and RZ gates.

My first question is: why are all the initial states 0 ($|0\rangle$)?
After the feature map, they apply a variational model whose parameters are chosen to minimize the loss function.

+1 and welcome to the site, but your question was getting close votes because you tried to ask more than one question in a single post. I've commented out everything after the first question. — user1271772 No more free time, Apr 01 '21 at 22:40

score 3 · Answer 1 · answered Mar 31 '21 at 21:13

Why are all the states 0?

Oversimplified, there are three main components to any quantum circuit: the input, the quantum function, and the output. QML research will usually fall into two buckets. In the first, we take a fixed quantum function and feed it different inputs, and see how the output varies. In the second, we take a fixed input and play around with the quantum function and see how the output varies. So when you see the circuit initialized with all 0's, you can think of the input state as the 'controlled variable' and the quantum function as the 'independent variable.'

What are we measuring?

On each training iteration, we measure each qubit's spin state in the chosen computational basis (usually the Z basis). We then interpret the combined output of all qubits as a classical piece of information. Here, this output is the QNN's 'prediction' associated with the given (feature-mapped) input data.
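As a rough illustration of this measurement step, here is a minimal sketch in plain Python. It assumes one possible way of interpreting the bitstrings, namely mapping each one to a parity label of +1 or -1. The output distribution `probabilities` and the helper names are made up for the example and do not come from the paper.

```python
import random
from collections import Counter


def parity_label(bitstring):
    """Return +1 for an even number of 1s in the bitstring, -1 for an odd number."""
    return 1 if bitstring.count("1") % 2 == 0 else -1


def sample_counts(probabilities, shots, seed=0):
    """Simulate `shots` Z-basis measurements from a dict {bitstring: probability}."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    outcomes = list(probabilities)
    weights = [probabilities[o] for o in outcomes]
    return Counter(rng.choices(outcomes, weights=weights, k=shots))


def estimate_parity(counts):
    """Average the +/-1 parity labels over all recorded shots."""
    shots = sum(counts.values())
    return sum(parity_label(b) * n for b, n in counts.items()) / shots


# Made-up output distribution of a 2-qubit circuit (illustrative numbers only).
probabilities = {"00": 0.4, "01": 0.1, "10": 0.1, "11": 0.4}
counts = sample_counts(probabilities, shots=10_000)
estimate = estimate_parity(counts)  # approaches 0.8 - 0.2 = 0.6 for many shots
prediction = 1 if estimate >= 0 else -1  # sign of the estimate as a class label
```

With more shots the estimate fluctuates less around the exact value, which is why the circuit is evaluated many times per input.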

What exactly are we optimizing?

The body of a QNN consists of a feature-map component and a variational component. The former is where we 'encode' the input data. The latter is where we store our trained parameters, which often take the form of X, Y, and/or Z rotations by varying angles. On each training iteration, we evaluate the circuit many times for each input data vector, all using the same variational parameters. We then construct a distribution from the outputs of the circuit and use a chosen cost function to determine how well the current parameters capture the input/output relationship for this circuit design. At the end of the training iteration, the parameters are adjusted (trained) in the direction that reduces this cost function. This can be done through gradient descent, as you mentioned, SPSA, or many other classical algorithms.
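The training loop described above can be sketched on a toy single-qubit model. Everything here is an assumption made for illustration: the feature map `RZ(x) H`, the single variational gate `RY(theta)`, the two-point dataset, the loss `1 - y * f`, and the use of exact expectation values instead of repeated sampling. The gradient uses the parameter-shift rule, which holds for a rotation gate like `RY`. This is not the circuit from the paper.

```python
import cmath
import math


def apply(gate, state):
    """Apply a 2x2 gate (nested lists) to a single-qubit state [a0, a1]."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def predict(x, theta):
    """f(x; theta) = <Z> after the feature map RZ(x) H and RY(theta) act on |0>."""
    h = 1 / math.sqrt(2)
    hadamard = [[h, h], [h, -h]]
    rz = [[cmath.exp(-1j * x / 2), 0], [0, cmath.exp(1j * x / 2)]]
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    ry = [[c, -s], [s, c]]
    state = [1, 0]                      # start in |0>
    for gate in (hadamard, rz, ry):     # feature map first, then variational gate
        state = apply(gate, state)
    return abs(state[0]) ** 2 - abs(state[1]) ** 2


def loss(data, theta):
    """Sum of 1 - y * f(x; theta) over the dataset (labels y are +1 or -1)."""
    return sum(1 - y * predict(x, theta) for x, y in data)


def loss_gradient(data, theta):
    """Gradient of the loss, with df/dtheta from the parameter-shift rule."""
    grad = 0.0
    for x, y in data:
        shift = math.pi / 2
        df = 0.5 * (predict(x, theta + shift) - predict(x, theta - shift))
        grad += -y * df
    return grad


def train(data, theta=0.1, learning_rate=0.2, steps=50):
    """Plain gradient descent on the single parameter theta."""
    for _ in range(steps):
        theta -= learning_rate * loss_gradient(data, theta)
    return theta, loss(data, theta)


data = [(0.0, -1), (math.pi, 1)]        # made-up two-point dataset
theta, final_loss = train(data)         # theta moves toward pi / 2
```

For this toy model the prediction works out to $-\sin\theta\cos x$, so the loss is minimized at $\theta = \pi/2$, which is where gradient descent ends up.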

score 3 · Answer 2 · answered Mar 31 '21 at 23:57

Typically when you use these kinds of variational circuits to do classification, your goal is to use the circuit to classify some input data $x\in\mathbb{R}^d$ with a decision function of the form $$\tag{1} f(x;\theta) = \langle \psi_x | \mathcal{G}^\dagger (\theta) \hat{M} \mathcal{G}(\theta)|\psi_x\rangle = \langle \hat{M} \rangle $$ where $\hat{M}$ is some observable whose expectation value you're computing at the output of the circuit. In their specific implementation it looks like they're using a parity observable like $\hat{M} = \bigotimes_{k=1}^n Z_k$, which measures whether the observed bitstring had an even or odd number of ones at each measurement. $f(x;\theta)$ has two inputs: a datapoint $x$ that you're trying to classify and the current choice of parameters $\theta$ that your classifier is parameterized by.
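To make the parity observable concrete, here is a short worked identity. It only assumes the standard convention $Z|b\rangle = (-1)^{b}|b\rangle$ for $b \in \{0,1\}$. The symbols $p(b)$ and $|b|$ are introduced here for illustration and are not taken from the paper.

$$ \langle \hat{M} \rangle = \sum_{b \in \{0,1\}^n} (-1)^{|b|}\, p(b) = P(\text{even}) - P(\text{odd}), \qquad p(b) = \left| \langle b | \mathcal{G}(\theta) | \psi_x \rangle \right|^2 $$

Here $|b|$ is the number of ones in the bitstring $b$. It follows that $f(x;\theta)$ lies in $[-1, 1]$, and its sign can be read as a binary class label.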

Also, they choose to define $$\tag{2} |\psi_x \rangle = \mathcal{U}_x|0\rangle $$

where $\mathcal{U}_x$ is some unitary parameterized by a datapoint $x$.

To answer your questions:

Why are the input states $|0\rangle$? Suppose the starting state was instead $|\phi\rangle$. If this is a valid quantum state then there must be some unitary $\mathcal{V}$ whose first column is $|\phi\rangle$, i.e. $|\phi\rangle = \mathcal{V}|0\rangle$. But since $\mathcal{U}_x$ is a generic unitary and the product of two unitaries is also unitary, we're free to just absorb the fixed $\mathcal{V}$ into Equation (2) as follows: $$ \mathcal{U}_x|\phi\rangle = \mathcal{U}_x \mathcal{V} |0\rangle, \qquad \mathcal{U}_x \rightarrow \mathcal{U}_x \mathcal{V} $$ so we've recovered the situation where the input state is $|0\rangle$. No matter what (pure) state you decide to start from, you can rearrange the circuit like this, so you might as well define the circuit as starting from $|0\rangle$.
What are we measuring? As I said before, they chose to measure the $n$-qubit parity operator for their specific setup. But I would argue that you could also view this circuit as measuring a parameterized version of $\hat{M}$, namely $$ \hat{M}(\theta) \equiv \mathcal{G}^\dagger (\theta) \hat{M} \mathcal{G}(\theta) $$ From this perspective, the classifier is trying to optimize a function of the form $$ f(x;\theta) = \langle M(\theta, x)\rangle = \langle \psi_x |\hat{M}(\theta)|\psi_x\rangle $$ by searching for an optimal measurement described by $\theta$ that classifies encoded states $|\psi_x\rangle$. I did not modify anything in the circuit; this is just a matter of perspective on the goal they're trying to achieve.
What exactly are they optimizing? Without digging through the paper, typically you're trying to minimize some empirical loss function that looks like $$\tag{3} \min_\theta \sum_{i} L(\langle M(\theta, x_i)\rangle, y_i) $$ where $L$ describes how different the label prediction $\hat{y}_i = f(x_i;\theta) = \langle M(\theta, x_i)\rangle$ is from the actual label $y_i$, and $i$ indexes all the points in the dataset. One good choice of loss function for binary classifiers with labels $y \in \{-1, +1\}$ is the "hinge loss" $L(f(x;\theta), y) = \max(0, 1 - f(x;\theta)y)$, which reduces to $1 - f(x;\theta)y$ when $|f(x;\theta)| \le 1$, as is the case for an observable with eigenvalues $\pm 1$ like the parity operator. Again, I did not check what specific loss function they are using. Either way, the goal is to find some $f(x; \theta)$ that correctly predicts the label $y$ for $x$ as often as possible, and solving Equation (3) is usually seen as a good best guess for that choice of $f$.
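The argument for starting from $|0\rangle$ can be checked numerically on a single qubit. The following is a minimal sketch in plain Python. The feature map `RZ(x) H`, the state `phi`, and all helper names are assumptions chosen for illustration, not the paper's circuit.

```python
import cmath
import math


def matvec(m, v):
    """Multiply a 2x2 matrix (nested lists) by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def matmul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def feature_map(x):
    """Hypothetical single-qubit feature map U_x = RZ(x) H."""
    h = 1 / math.sqrt(2)
    hadamard = [[h, h], [h, -h]]
    rz = [[cmath.exp(-1j * x / 2), 0], [0, cmath.exp(1j * x / 2)]]
    return matmul(rz, hadamard)


def unitary_with_first_column(phi):
    """Return a unitary V with V|0> = |phi> for a normalized state phi = [a, b]."""
    a, b = phi
    return [[a, -b.conjugate()], [b, a.conjugate()]]


phi = [complex(0.6, 0.0), complex(0.0, 0.8)]   # some normalized starting state
V = unitary_with_first_column(phi)
U = feature_map(0.7)

direct = matvec(U, phi)                        # U_x applied to |phi>
absorbed = matvec(matmul(U, V), [1, 0])        # (U_x V) applied to |0>
same_state = all(abs(d - a) < 1e-12 for d, a in zip(direct, absorbed))
```

Both routes give the same state, so the fixed `V` can be treated as part of the feature map and the circuit can be drawn as starting from $|0\rangle$.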
