Alternating Linear Minimization
TL;DR: This is an informal overview of our recent paper Alternating Linear Minimization: Revisiting von Neumann’s alternating projections by Gábor Braun, Sebastian Pokutta, and Robert Weismantel, in which we replace the projections in von Neumann’s alternating projections algorithm with linear minimizations, which are much cheaper, while maintaining essentially the same convergence rate.
Why you might care
In 1933, von Neumann devised a fundamental algorithm, known as alternating projections, to compute (an approximation of) a point in the intersection of two convex sets [vN49]: one projects onto one set, then onto the other, and repeats. Running this algorithm, however, requires (computationally efficient) projection operators for both convex sets.
In our paper [BPW22], we examine a setting with more limited resources, where access to the convex sets is restricted to linear minimization oracles. Linear minimization is often much cheaper than projection, in particular when we are concerned with complicated constraints. We present a new algorithm tailored to these assumptions: despite using only the much simpler linear minimization oracles, it (approximately) identifies a point in the intersection of the two convex sets while essentially maintaining the same convergence rate, generalizing von Neumann’s original result. Moreover, the algorithm can be made exact (i.e., it exactly decides whether $P \cap Q \neq \emptyset$) in the case of, e.g., polytopes.
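To make the cost difference concrete, here is a small illustration of our own (not taken from the paper): over the \(\ell_1\)-ball, linear minimization reduces to scanning for the largest-magnitude coordinate of the objective, whereas Euclidean projection already requires the classic sort-and-threshold routine (and for sets such as the nuclear-norm ball the gap widens to a single top singular pair versus a full SVD).

```python
import numpy as np

def lmo_l1_ball(c, radius=1.0):
    """Linear minimization over the l1-ball: argmin_{||x||_1 <= r} <c, x>.
    A single scan suffices: the minimizer is a signed, scaled unit vector."""
    i = np.argmax(np.abs(c))
    x = np.zeros_like(c)
    x[i] = -radius * np.sign(c[i])
    return x

def proj_l1_ball(y, radius=1.0):
    """Euclidean projection onto the l1-ball via sorting and thresholding
    (the classic O(n log n) scheme) -- already noticeably more machinery."""
    if np.abs(y).sum() <= radius:
        return y.copy()
    u = np.sort(np.abs(y))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(y) * np.maximum(np.abs(y) - theta, 0.0)

c = np.random.randn(6)
print(lmo_l1_ball(c))   # one nonzero coordinate
print(proj_l1_ball(c))  # requires the full sort-based routine
```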
von Neumann’s algorithm and convergence
Let us briefly recall von Neumann’s original algorithm. It is really straightforward, alternately projecting onto the respective sets:
von Neumann’s Alternating Projections (POCS).
Requirements: Point \(y_0 \in \RR^n\), \(\Pi_P\) projector onto \(P \subseteq \RR^n\) and \(\Pi_Q\) projector onto \(Q \subseteq \RR^n\).
Output: Iterates \(x_1, y_1, \dots\)
for \(t = 0, 1, 2, \dots\)
\(\quad x_{t+1} = \Pi_P(y_t)\)
\(\quad y_{t+1} = \Pi_Q(x_{t+1})\)
end for
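For concreteness, here is a minimal numerical sketch of POCS of our own making (not from the paper), with \(P\) and \(Q\) two Euclidean balls so that both projections have simple closed forms:

```python
import numpy as np

def proj_ball(y, center, radius):
    """Euclidean projection onto the ball B(center, radius)."""
    d = y - center
    dist = np.linalg.norm(d)
    return y if dist <= radius else center + radius * d / dist

# Two intersecting balls in R^2: P = B((0,0), 1) and Q = B((1.5,0), 1).
cP, cQ = np.array([0.0, 0.0]), np.array([1.5, 0.0])

y = np.array([3.0, 4.0])       # y_0
for t in range(50):
    x = proj_ball(y, cP, 1.0)  # x_{t+1} = Pi_P(y_t)
    y = proj_ball(x, cQ, 1.0)  # y_{t+1} = Pi_Q(x_{t+1})

print(x, y, np.linalg.norm(x - y))  # x and y approach a point of P ∩ Q
```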
The convergence proof for this algorithm rests on a simple but powerful Euclidean norm expansion argument; we reproduce it here for completeness. Suppose \(P \cap Q \neq \emptyset\) and let \(u \in P \cap Q\). Then:
\[\begin{align*} \norm{y_t - u}^2 & = \norm{y_t - x_{t+1} + x_{t+1} - u}^2 = \norm{y_t - x_{t+1}}^2 + \norm{x_{t+1} - u}^2 - 2 \underbrace{\langle x_{t+1} - y_t, x_{t+1} - u \rangle}_{\leq 0} \\ & \geq \norm{y_t - x_{t+1}}^2 + \norm{x_{t+1} - u}^2 = \norm{y_t - x_{t+1}}^2 + \norm{x_{t+1} - y_{t+1} + y_{t+1} - u}^2 \\ & = \norm{y_t - x_{t+1}}^2 + \norm{x_{t+1} - y_{t+1}}^2 + \norm{y_{t+1} - u}^2 - 2 \underbrace{\langle y_{t+1} - x_{t+1}, y_{t+1} - u \rangle}_{\leq 0} \\ & \geq \norm{y_t - x_{t+1}}^2 + \norm{x_{t+1} - y_{t+1}}^2 + \norm{y_{t+1} - u}^2, \end{align*}\]where the non-positivity of the scalar products is precisely the first-order optimality condition of the projections: \(x_{t+1} = \Pi_P(y_t)\) satisfies \(\langle y_t - x_{t+1}, w - x_{t+1} \rangle \leq 0\) for all \(w \in P\), in particular for \(w = u\), and analogously for \(y_{t+1} = \Pi_Q(x_{t+1})\). The above can be rearranged to
\[\norm{y_t - u}^2 - \norm{y_{t+1} - u}^2 \geq \norm{y_t - x_{t+1}}^2 + \norm{x_{t+1} - y_{t+1}}^2.\]Now we are essentially done. 1) We simply sum up this inequality:
\[\sum_{t = 0}^{T-1} \left(\norm{y_t - u}^2 - \norm{y_{t+1} - u}^2\right) \geq \sum_{t = 0}^{T-1} \left( \norm{y_t - x_{t+1}}^2 + \norm{x_{t+1} - y_{t+1}}^2 \right).\]2) This implies, through telescoping:
\[\norm{y_0 - u}^2 \geq \sum_{t = 0}^{T-1} \left( \norm{y_t - x_{t+1}}^2 + \norm{x_{t+1} - y_{t+1}}^2\right).\]3) We divide by $T$, obtaining:
\[\frac{\norm{y_0 - u}^2}{T} \geq \frac{1}{T} \sum_{t = 0}^{T-1} \left( \norm{y_t - x_{t+1}}^2 + \norm{x_{t+1} - y_{t+1}}^2 \right) \geq \norm{x_{T} - y_{T}}^2,\]as projections never increase distances, so \(\norm{x_1 - y_1} \geq \norm{y_1 - x_2} \geq \norm{x_2 - y_2} \geq \dotsb \geq \norm{x_T - y_T}\) and every summand is at least \(\norm{x_{T} - y_{T}}^2\). This completes the convergence proof and (after a little more work) we obtain:
Proposition (von Neumann, 1949 + minor perturbations).
Let \(P\) and \(Q\) be compact convex sets with \(P \cap Q \neq \emptyset\) and let \(x_1, y_1, \dots, x_T, y_T \in \RR^n\) be the sequence of iterates of von Neumann’s algorithm. Then the iterates converge, \(x_{t} \to x\) and \(y_{t} \to y\), for some \(x \in P\) and \(y \in Q\). Furthermore, we have:
\[
\norm{x_{T} - y_{T}}^{2}
\leq
\frac{1}{T} \sum_{t = 0}^{T-1} \left(\norm{y_{t} - x_{t+1}}^{2} + \norm{x_{t+1} - y_{t+1}}^{2} \right)
\leq
\frac{\operatorname{dist}(y_0, P \cap Q)^{2}}{T}.
\]
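As a quick sanity check (again our own illustration), one can verify the rate numerically for the two balls from the sketch above; since the proof holds for any \(u \in P \cap Q\), any explicit intersection point yields a valid upper bound \(\norm{x_T - y_T}^2 \leq \norm{y_0 - u}^2 / T\):

```python
import numpy as np

def proj_ball(y, center, radius):
    # Euclidean projection onto the ball B(center, radius).
    d = y - center
    dist = np.linalg.norm(d)
    return y if dist <= radius else center + radius * d / dist

cP, cQ = np.array([0.0, 0.0]), np.array([1.5, 0.0])
y0 = np.array([3.0, 4.0])
u = np.array([0.75, np.sqrt(1 - 0.75**2)])  # an explicit point of P ∩ Q

y = y0
for T in range(1, 26):
    x = proj_ball(y, cP, 1.0)
    y = proj_ball(x, cQ, 1.0)
    # the telescoping argument above holds for any u in the intersection
    assert np.linalg.norm(x - y) ** 2 <= np.linalg.norm(y0 - u) ** 2 / T
print("rate verified for T = 1, ..., 25")
```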
Alternating Linear Minimizations
Now suppose we can access the feasible regions $P$ and $Q$ only by means of linear minimization oracles, i.e., we can compute only \(x \leftarrow \arg\min_{u \in P} \langle c, u \rangle\) and \(y \leftarrow \arg\min_{w \in Q} \langle c, w \rangle\), for any given $c \in \RR^n$. Then it turns out we can formulate an algorithm very close to von Neumann’s original algorithm, relying only on linear minimizations to access $P$ and $Q$:
Alternating Linear Minimizations (ALM).
Requirements: Points \(x_{0} \in P\), \(y_{0} \in Q\), LMO over \(P, Q \subseteq \RR^{n}\).
Output: Iterates \(x_1, y_1, \dots \in \RR^n\).
for \(t = 0, 1, 2, \dots\)
\(\quad u_{t} = \arg\min_{x \in P} \langle x_{t} - y_{t}, x \rangle\)
\(\quad x_{t+1} = x_{t} + \frac{2}{t+2} \cdot (u_{t} - x_{t})\)
\(\quad v_{t} = \arg\min_{y \in Q} \langle y_{t} - x_{t+1}, y \rangle\)
\(\quad y_{t+1} = y_{t} + \frac{2}{t+2} \cdot (v_{t} - y_{t})\)
end for
Effectively, we perform a single Frank-Wolfe step on each set and then repeat.
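Here is a minimal sketch of ALM of our own making (not from the paper), with \(P\) and \(Q\) two axis-aligned boxes, whose LMOs are mere coordinate-wise sign checks:

```python
import numpy as np

def lmo_box(c, lo, hi):
    """LMO over the box [lo, hi]: argmin_{lo <= x <= hi} <c, x>,
    decided coordinate-wise from the sign of c."""
    return np.where(c > 0, lo, hi)

# Two intersecting boxes: P = [-1, 1]^2 and Q = [0.5, 2]^2.
loP, hiP = -np.ones(2), np.ones(2)
loQ, hiQ = 0.5 * np.ones(2), 2.0 * np.ones(2)

x, y = loP.copy(), hiQ.copy()        # x_0 in P, y_0 in Q
for t in range(1000):
    u = lmo_box(x - y, loP, hiP)     # u_t
    x = x + 2.0 / (t + 2) * (u - x)  # x_{t+1}
    v = lmo_box(y - x, loQ, hiQ)     # v_t
    y = y + 2.0 / (t + 2) * (v - y)  # y_{t+1}

z = 0.5 * (x + y)
print(z, np.linalg.norm(x - y) ** 2)  # ||x_t - y_t||^2 decays like O(1/t)
```

We obtain the following convergence rates: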
Proposition (Intersection of two sets).
Let \(P\) and \(Q\) be compact convex sets. Then ALM generates iterates \(z_t = \frac{1}{2} (x_t + y_t)\), such that:
\[
\max\{\operatorname{dist}(z_t, P)^2, \operatorname{dist}(z_t, Q)^2\} \leq \frac{\norm{x_{t} - y_{t}}^{2}}{4}
\leq \frac{(1 + 2 \sqrt{2}) (D_{P}^{2} + D_{Q}^{2})}{t+2} + \frac{\operatorname{dist}(P,Q)^{2}}{4},
\]
as primal convergence guarantee, where \(D_{P}\) and \(D_{Q}\) denote the diameters of \(P\) and \(Q\), and
\[
\min_{1 \leq t \leq T} \max_{x \in P, y \in Q} \left[ \norm{x_{t} - y_{t}}^{2} - \langle x_{t} - y_{t}, x - y \rangle \right]
\leq
\frac{6.75 (1 + 2 \sqrt{2})}{T + 2}
(D_{P}^{2} + D_{Q}^{2}),
\]
as dual convergence guarantee.
This guarantee can also be used to certify disjointness of $P$ and $Q$. Moreover, in the case of $P$ and $Q$ being polytopes, the algorithm and guarantee can be made exact, in the sense of exactly deciding whether $P \cap Q \neq \emptyset$ (rather than doing so only approximately), in combination with linear programming; see [BPW22] for details.
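To sketch how the dual guarantee certifies disjointness, here is our own illustration via the standard Frank-Wolfe lower-bound argument (the exact LP-based procedure is in [BPW22]): for \(f(x, y) = \frac{1}{2} \norm{x - y}^2\) with Frank-Wolfe gap \(g_t\) at \((x_t, y_t)\), convexity gives \(\frac{1}{2} \operatorname{dist}(P, Q)^2 \geq \frac{1}{2} \norm{x_t - y_t}^2 - g_t\), so whenever the right-hand side is positive we have proven \(P \cap Q = \emptyset\).

```python
import numpy as np

def fw_gap(x, y, lmo_P, lmo_Q):
    """Frank-Wolfe gap of f(x, y) = 0.5 * ||x - y||^2 over P x Q:
    g = ||x - y||^2 - <x - y, u - v>, u = lmo_P(x - y), v = lmo_Q(y - x)."""
    c = x - y
    u, v = lmo_P(c), lmo_Q(-c)
    return np.dot(c, c) - np.dot(c, u - v)

def certifies_disjoint(x, y, lmo_P, lmo_Q):
    """True iff the bound 0.5 * ||x - y||^2 - g > 0 proves dist(P, Q) > 0."""
    return 0.5 * np.dot(x - y, x - y) - fw_gap(x, y, lmo_P, lmo_Q) > 0

# Two disjoint boxes P = [-1, 0]^2 and Q = [2, 3]^2, with coordinate-wise LMOs.
lmo_P = lambda c: np.where(c > 0, -1.0, 0.0)
lmo_Q = lambda c: np.where(c > 0, 2.0, 3.0)

# At x = (0, 0) in P and y = (2, 2) in Q the certificate already fires:
print(certifies_disjoint(np.zeros(2), 2.0 * np.ones(2), lmo_P, lmo_Q))  # True
```

Since \(\frac{1}{2} \norm{x_t - y_t}^2 \geq \frac{1}{2} \operatorname{dist}(P, Q)^2 > 0\) for disjoint sets, while the dual guarantee drives \(g_t\) to zero along a subsequence, such a certificate fires at some iterate of the run.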
For completeness, let us briefly compare the obtained rates to those of von Neumann’s original algorithm:
Remark (Comparison to von Neumann’s alternating projection algorithm).
For simplicity, consider the case where \(P \cap Q \neq \emptyset\).
After minor reformulation (with the midpoint \(z_{t+1} = \frac{1}{2} (x_{t+1} + y_{t+1})\) as in ALM), von Neumann’s alternating projection method yields:
\[
\min_{t=0, \dots, T-1} \max\left\{\operatorname{dist}(z_{t+1}, P)^2, \operatorname{dist}(z_{t+1}, Q)^2\right\} \leq \frac{\operatorname{dist}(y_0, P \cap Q)^{2}}{T}.
\]
Alternating Linear Minimization yields:
\[
\max\left\{\operatorname{dist}(z_T, P)^2, \operatorname{dist}(z_T, Q)^2\right\}
\leq \frac{(1 + 2 \sqrt{2}) (D_{P}^{2} + D_{Q}^{2})}{T+2}.
\]
As we can see, the convergence rate of ALM is essentially identical to that of von Neumann’s algorithm up to different ($P$ and $Q$ dependent) constants. The ALM algorithm also works quite well in actual computations, but that is beyond the scope of the paper.
References
[vN49] Von Neumann, J. (1949). On rings of operators. Reduction theory. Annals of Mathematics, 401-485. [pdf]
[BPW22] Braun, G., Pokutta, S., & Weismantel, R. (2022). Alternating Linear Minimization: Revisiting von Neumann’s alternating projections. [arxiv].