# Improved local models and new Bell inequalities via Frank-Wolfe algorithms

*TL;DR: This is an informal overview of our recent paper Improved local models and new Bell inequalities via Frank-Wolfe algorithms by Sébastien Designolle, Gabriele Iommazzo, Mathieu Besançon, Sebastian Knebel, Patrick Gelß, and Sebastian Pokutta, where we use Frank-Wolfe algorithms to improve several non-locality constants.*

*Written by Sébastien Designolle.*

*Comment (Sebastian Pokutta):* This research arose out of peculiar conincidence with me preparing the (still unpublished!) follow-up blog post for Quantum Computing for the Uninitiated, which was about the Bell polytope and Sébastien visiting ZIB right at that time. It is funny how these things happen sometimes.

## Frank-Wolfe in Bell polytopes

Today we’re using Frank-Wolfe in a specific polytope arising from quantum information. This should give an introductory foretaste of our recent preprint.

Consider a bipartite scenario in which Alice and Bob, as the parties are usually referred to, receive questions and give answers. By repeating this many rounds, you can construct the joint probability characterising their strategy. If this strategy only makes use of classical resources, the resulting joint probability must be in the convex hull of deterministic strategies (those mapping the questions directly to the answers). Interestingly, if Alice and Bob are allowed to exploit the correlations arising from entangled quantum systems, they can get a point outside of this polytope. This result is the essence of Bell’s theorem, from a paper published in 1964 in an obscure journal [1]. Of course, his setup and phrasing are a bit different than the one just briefly sketched above, but the fundamental idea remains the same and led to an entire branch of research working on this so-called quantum nonlocality. This field was recently under the spotlights as the experimental demonstrations of Bell’s theorem were awarded the Nobel prize in physics in 2022.

### CHSH

But let’s give an example to be a bit more specific; as often this will be the celebrated CHSH inequality [2]. So we fix the number of questions to be two (labelled by \(x\) for Alice and \(y\) for Bob, both being 1 or 2) and the answers to be \(\pm1\). For simplicity of the presentation, we further assume that these answers are balanced (-1 and +1 are equally likely) so that the only things that count are the expectation values \(\langle A_xB_y\rangle\) of the product of Alice’s and Bob’s answers. Then the polytope we were mentioning before, called the local polytope and denoted \(\mathcal{L}\), has eight vertices:

\[\begin{pmatrix} - & -\\ - & - \end{pmatrix}\quad \begin{pmatrix} + & +\\ + & + \end{pmatrix}\quad \begin{pmatrix} - & +\\ - & + \end{pmatrix}\quad \begin{pmatrix} + & -\\ + & - \end{pmatrix}\quad \begin{pmatrix} - & -\\ + & + \end{pmatrix}\quad \begin{pmatrix} + & +\\ - & - \end{pmatrix}\quad \begin{pmatrix} - & +\\ + & - \end{pmatrix}\quad \begin{pmatrix} + & -\\ - & + \end{pmatrix},\]where the lines of these correlation matrices are labelled by \(x\) and the columns by \(y\). These deterministic strategies are denoted \(\mathbf{d}\) in the following.

With a quantum strategy whose detail does not really matter here, the following correlation matrix can be obtained:

\[\mathbf{p}=\frac{1}{\sqrt2}\begin{pmatrix} + & +\\ + & - \end{pmatrix},\]which is indeed outside of the local polytope. Actually we can go a bit further than this dichotomic criterion inside/outside and ask how far away from the polytope this point is, in the following sense: by moving along the line between the point and the center of the polytope (the zero matrix), there is a threshold at which we enter \(\mathcal{L}\). In the CHSH example, the corresponding parameter is \(v^\ast=1/\sqrt2\). To show this, we need two ingredients:

a) a separating hyperplane, called a Bell inequality, showing that any \(v > v^\ast\) leads to a point outside \(\mathcal{L}\):

\[\mathrm{tr}\left[\begin{pmatrix} + & +\\ + & - \end{pmatrix}\mathbf{d}\right]\leqslant2 \quad\text{while}\quad \mathrm{tr}\left[\begin{pmatrix} + & +\\ + & - \end{pmatrix}\mathbf{p}\right]=2\sqrt2.\]b) a convex decomposition of \(v^\ast \mathbf{p}\), called a local model, that explicitly proves that \(v^\ast\mathbf{p} \in\mathcal{L}\):

\[\frac{1}{\sqrt2}\mathbf{p}=\frac12\begin{pmatrix} + & +\\ + & - \end{pmatrix} =\frac14\left[ \begin{pmatrix} + & +\\ + & + \end{pmatrix}+ \begin{pmatrix} + & -\\ + & - \end{pmatrix}+ \begin{pmatrix} + & +\\ - & - \end{pmatrix}+ \begin{pmatrix} - & +\\ + & - \end{pmatrix} \right].\]### Nonlocality threshold

From a physical perspective, this quantity \(v^\ast\) also has a meaning: it corresponds to the robustness of the quantum strategy to white noise on the shared quantum state. And, even if the actual experiment naturally gets way more difficult to implement as the number \(m\) of measurements increases, the question of finding Bell inequalities with high robustnesses (that is, low \(v^\ast\)) is of important theoretical relevance. So our main problem is the following: find the noise threshold at which our two-qubit state (a so-called singlet state whose form is irrelevant for the discussion here) cannot be used to observe nonlocality, no matter how many (projective) measurements we use.

At this game, the CHSH inequality described above, published in 1969, is a good contender. Actually, the question of finding a better inequality remained open until 2008, where an inequality with \(v^\ast<1/\sqrt2\) was found, with 465 possible measurements! With this example, you can probably foresee why our general question is so hard: the number \(m\) of measurements is not fixed, and the optimal robustness can only happen in the limit of an infinite number of them. And the structure of the local polytope gets complex very quickly with \(m\): it has \(2^{2m-1}\) vertices in a space of dimension \(m^2\), so that even enumerating all vertices is very quickly impossible. Note that a complete list of its facets is only known up to \(m=4\).

### Frank-Wolfe algorithms

This is where Frank-Wolfe (FW) algorithms come handy, in that it gives a way to construct:

- a sparse decomposition of a point inside \(\mathcal{L}\),
- a separating hyperplane for a point outside \(\mathcal{L}\).

The idea is to solve, for an initial visibility \(v_0\), the following minimisation problem:

\[\min_{\mathbf{x}\in\mathcal{L}}\frac{1}{2}\|\mathbf{x}-v_0\mathbf{p}\|_2^2,\]which converges to \(\mathbf{x}\) if \(\mathbf{x} \in \mathcal{L}\) and to its orthogonal projection onto \(\mathcal{L}\) otherwise.

This approach in itself is not particularly new as it was already used in 2016 for this specific problem. What is new, however, is to use an efficient algorithm to drastically accelerate the convergence of previous works relying of vanilla FW (and thus suffering from zigzagging). In our case, we implemented the blended pairwise conditional gradient algorithm [3]. This speeds up things immensely, as the costly Linear Minimisation Oracle (LMO) is not called at each step but only when the progress achievable by means of previously computed atoms becomes too low.

### Linear Minimisation Oracle

In our case, the LMO looks like

\[\mathrm{max}_{\vec{a},\vec{b} \in \{\pm1\}^m} \sum_{x,y \in [m]} a_x M_{xy} b_y,\]where \(\{M_{xy}\}_{xy}\) is the direction along which we want to minimise in the local polytope. From a physical perspective, this “simply” amounts to finding the local bound of a Bell inequality encoded by the coefficients \(M_{xy}\). However, this problem is NP-hard and going beyong \(m\approx100\) is extremely challenging.

Fortunately though, a good enough heuristic is all we need in general. It turns out that starting from a random choice of \(\vec{b}\) and alternatately minimising over \(\vec{a}\) and \(\vec{b}\) gives rather good solutions, at least when doing this process many times (a few thousands in practice). There is absolutely no guarantee of optimality, but as soon as FW can use this heuristic LMO to make progress, this approach is sufficient in the case \(\mathbf{x} \in \mathcal{L}\). Why not when \(\mathbf{x} \not\in \mathcal{L}\)? Because it is necessary to solve at least one instance to optimality in order to obtain a valid separating hyperplane.

For this last problem, we also came with a solution to go a bit beyond existing methods: by reformulating the LMO as a Quadratic Unconstrained Binary Optimisation (QUBO) instance, we could use a recently developed solver [4], which is unfortunately closed source, to solve to optimality an instance with \(m=97\) measurements.

### Results

With the overview of our methodology done above we can state the results obtained in the bipartite case, that is, with correlation matrices as presented above. You’re most likely not familiar with the problem so it’s worth giving some historical context on the progress made on both lower and upper bounds over the years.

\(v_c^\mathrm{Wer}\) | \(m\) | Year |
---|---|---|

0.7071 | 2 | 1969 |

0.7056 | 465 | 2008 |

0.7054 | \(\infty\) | 2015 |

0.7012 | 42 | 2016 |

0.6964 | 90 | 2017 |

0.6961 |
97 | Our |

0.6875 |
\(406\sim\infty\) | work |

0.6829 | \(625\sim\infty\) | 2017 |

0.6595 | \(\infty\) | 2006 |

0.5 | \(\infty\) | 1989 |

A few comments about the lower bounds are in order: they contain an extra step (not explained here) that goes from a finite number of measurements to all projective measurements.
The idea is that we can simulate the former by means of the latter, up to some factor that gets closer to 1 as the number of measurements increases.
Part of the improvements that we brought to the problem came from the choice of these measurements; this explains, by the way, why we use less measurements than the previous lower bound while still beating it.
Without going into details here, we used symmetric polyhedra in the Bloch sphere; they can be easily constructed and visualised here (try, for instance, the recipe `Au3I`

which gives an idea of the kind of structures that we used).

We don’t present the multipartite results here as most of you don’t care, but the entire approach naturally generalises to correlation tensors of arbitrary order (meaning any number of parties). And the known results in this direction were quite scarce, so that our bounds provide way more insight than the tightening (arguably minute to some extent) presented in the table above. For instance, we give the first proof that, for three-qubit states, the W state is less nonlocal than the GHZ state.

### References

[1] John Bell. *On the Einstein-Podolsky-Rosen paradox*. Physics Physique Fizika **1**, 195-200 (1964).

[2] Clauser, Horne, Shimony, Holt. *Proposed experiment to test local hidden-variable theories*. Phys. Rev. Lett. **23**, 880 (1969).

[3] Tsuji, Tanaka, Pokutta. *Sparser kernel herding with pairwise conditional gradients without swap steps*. arXiv:2110.12650 (2021).

[4] Rehfeldt, Koch, Shinano. *Faster exact solution of sparse MaxCut and QUBO problems*. arXiv:2202.02305 (2022).