## Posts

### Accelerated and Sparse Algorithms for Approximate Personalized PageRank and Beyond [research]

*TL;DR: This is an informal discussion of our recent paper Accelerated and Sparse Algorithms for Approximate Personalized PageRank and Beyond by David Martinez-Rubio, Elias Wirth, and Sebastian Pokutta. We answer the open question to the COLT 2022 open problem by Fountoulakis and Yang [FY] in the affirmative, presenting sparse accelerated optimization algorithms for the $\ell_1$-regularized personalized PageRank problem.*### Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging [research]

*TL;DR: This is an informal summary of our recent paper Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging by Max Zimmer, Christoph Spiegel, and Sebastian Pokutta. Recent work [WM] demonstrated that generalization performance can be significantly improved by merging multiple models fine-tuned from the same base model. However, averaging sparse models leads to a decrease in overall sparsity, due to different sparse connectivities. We discovered that by exploring a single retraining phase of Iterative Magnitude Pruning (IMP) [HPTD] with various hyperparameters, we can create models suitable for averaging while retaining the same sparse connectivity from the pruned base model. Building on this idea, we introduce Sparse Model Soups (SMS), an IMP-variant where every prune-retrain cycle starts with the averaged model from the previous phase. SMS not only maintains the sparsity pattern but also offers full modularity and parallelizability, and substantially enhances the performance of IMP.*### How I Learned to Stop Worrying and Love Retraining [research]

*TL;DR: This is an informal summary of our recent ICLR2023 paper How I Learned to Stop Worrying and Love Retraining by Max Zimmer, Christoph Spiegel, and Sebastian Pokutta, in which we reassess Iterative Magnitude Pruning (IMP) [HPTD]. Recent works [RFC, LH] demonstrate how the learning rate schedule during retraining crucially influences recovery from pruning-induced model degradation. We extend these findings, proposing a linear learning rate schedule with an adaptively chosen initial value, which we find to significantly improve post-retraining performance. We also challenge commonly held beliefs of IMP’s performance and efficiency by introducing BIMP, a budgeted IMP-variant. Our study not only enhances understanding of the retraining phase, but it also questions the prevailing belief that we should strive to avoid the need for retraining.*### Alternating Linear Minimization [research]

*TL;DR: This is an informal overview of our recent paper Alternating Linear Minimization: Revisiting von Neumann’s alternating projections by Gábor Braun, Sebastian Pokutta, and Robert Weismantel where we replace the projections in von Neumann’s alternating projections algorithm with linear minimizations, which are much cheaper, while maintaining essentially the same convergence rate.*### Improved local models and new Bell inequalities via Frank-Wolfe algorithms [research]

*TL;DR: This is an informal overview of our recent paper Improved local models and new Bell inequalities via Frank-Wolfe algorithms by Sébastien Designolle, Gabriele Iommazzo, Mathieu Besançon, Sebastian Knebel, Patrick Gelß, and Sebastian Pokutta, where we use Frank-Wolfe algorithms to improve several non-locality constants.*### Sh**t you can do with the euclidean norm [research]

*TL;DR: Some of my favorite arguments all following from a simple expansion of the euclidean norm and averaging.*### Monograph on Conditional Gradients and Frank-Wolfe methods [research]

*TL;DR: Finally we finished our monograph on Conditional Gradients and Frank-Wolfe methods.*### Boscia.jl - a new Mixed-Integer Convex Programming (MICP) solver [research]

*TL;DR: This is an informal overview of our new Mixed-Integer Convex Programming (MICP) solver Julia package*`Boscia.jl`

and associated preprint Convex integer optimization with Frank-Wolfe methods by Deborah Hendrych, Hannah Troppens, Mathieu Besançon, and Sebastian Pokutta.### Acceleration of Frank-Wolfe algorithms with open loop step-sizes [research]

*TL;DR: This is an informal discussion of our recent paper Acceleration of Frank-Wolfe algorithms with open loop step-sizes by Elias Wirth, Thomas Kerdreux, and Sebastian Pokutta. In the paper, we study accelerated convergence rates for the Frank-Wolfe algorithm (FW) with open loop step-size rules, characterize settings for which FW with open loop step-size rules is non-asymptotically faster than FW with line search or short-step, and provide a partial answer to an open question in kernel herding.*### Pairwise Conditional Gradients without Swap Steps [research]

*TL;DR: This is an informal summary of our recent article Sparser Kernel Herding with Pairwise Conditional Gradients without Swap Steps by Kazuma Tsuji, Ken’ichiro Tanaka, and Sebastian Pokutta, which was accepted at ICML 2022. In this article we present a modification of the pairwise conditional gradient algorithm that removes the dreaded swap steps. The resulting algorithm can even be applied to infinite dimensional feasible regions and promotes high levels of sparsity, which is useful in applications such as e.g., kernel herding.*### Quantum Computing for the Uninitiated: The Basics [research]

*TL;DR: Cheat Sheet for Quantum Computing. Target audience is non-physicists and no physics background required. This is the very first post in the series presenting the very basics to get started. Long and technical.*### Conditional Gradients for the Approximately Vanishing Ideal [research]

*TL;DR: This is an informal discussion of our recent paper Conditional Gradients for the Approximately Vanishing Ideal by Elias Wirth and Sebastian Pokutta. In the paper, we present a new algorithm, the Conditional Gradients Approximately Vanishing Ideal algorithm (CGAVI), for the construction of a set of generators of the approximately vanishing ideal of a finite data set $X \subseteq \mathbb{R}^n$. The novelty of our approach is that CGAVI constructs the set of generators by solving instances of convex optimization problems with the Pairwise Frank-Wolfe algorithm (PFW).*### Fast algorithms for fair packing and its dual [research]

*TL;DR: This is an informal summary of our recent article Fast Algorithms for Packing Proportional Fairness and its Dual by Francisco Criado, David Martínez-Rubio, and Sebastian Pokutta. In this article we present a distributed, accelerated and width-independent algorithm for the $1$-fair packing problem, which is the proportional fairness problem $\max_{x\in\mathbb{R}^n_{\geq 0}} \sum_i \log(x_i)$ under positive linear constraints, also known as the _packing proportional fairness_ problem. We improve over the previous best solution [DFO20] by means of acceleration. We also study the dual of this problem, and give a Multiplicative Weights (MW) based algorithm making use of the geometric particularities of the problem. Finally, we study a connection to the Yamnitsky-Levin simplices algorithm for general purpose linear feasibility and linear programming.*### Simple steps are all you need [research]

*TL;DR: This is an informal summary of our recent paper, to appear in NeurIPS’21, Simple steps are all you need: Frank-Wolfe and generalized self-concordant functions by Alejandro Carderera, Mathieu Besançon, and Sebastian Pokutta, where we present a monotonous version of the Frank-Wolfe algorithm, which together with the simple step size $\gamma_t = 2/(2+t)$ achieves a $\mathcal{O}\left( 1/t \right)$ convergence in primal gap and Frank-Wolfe gap when minimizing Generalized Self-Concordant (GSC) functions over compact convex sets.*### New(!!) NeurIPS 2021 competition: Machine Learning for Discrete Optimization (ML4CO) [news]

*TL;DR: This year at NeurIPS 2021 there is brand new competition: improving integer programming solvers by machine learning. The learning problems are quite different than your usual suspects requiring non-standard learning approaches.*### Learning to Schedule Heuristics in Branch and Bound [research]

*TL;DR: This is an informal discussion of our recent paper Learning to Schedule Heuristics in Branch and Bound by Antonia Chmiela, Elias Khalil, Ambros Gleixner, Andrea Lodi, and Sebastian Pokutta. In this paper, we propose the first data-driven framework for scheduling heuristics in a MIP solver. By learning from data describing the performance of primal heuristics, we obtain a problem-specific schedule of heuristics that collectively find many solutions at minimal cost. We provide a formal description of the problem and propose an efficient algorithm for computing such a schedule.*### FrankWolfe.jl: A high-performance and flexible toolbox for Conditional Gradients [research]

*TL;DR: We present $\texttt{FrankWolfe.jl}$, an open-source implementation in Julia of several popular Frank-Wolfe and Conditional Gradients variants for first-order constrained optimization. The package is designed with flexibility and high-performance in mind, allowing for easy extension and relying on few assumptions regarding the user-provided functions. It supports Julia’s unique multiple dispatch feature, and interfaces smoothly with generic linear optimization formulations using $\texttt{MathOptInterface.jl}$.*### Linear Bandits on Uniformly Convex Sets [research]

*TL;DR: This is an informal summary of our recent paper Linear Bandit on Uniformly Convex Sets by Thomas Kerdreux, Christophe Roux, Alexandre d’Aspremont, and Sebastian Pokutta. We show that the strong convexity of the action set $\mathcal{K}\subset\mathbb{R}^n$ in the context of linear bandits leads to a gain of a factor of $\sqrt{n}$ in the pseudo-regret bounds. This improvement was previously known in only two settings: when $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$ [BCY]. When the action set is $q$-uniformly convex (with $q\geq 2$) but not necessarily strongly convex, we obtain pseudo-regret bounds of the form $\mathcal{O}(n^{1/q}T^{1/p})$ (with $1/p+1/q=1$), i.e., with a dimension dependency smaller than $\sqrt{n}$.*### CINDy: Conditional gradient-based Identification of Non-linear Dynamics [research]

*TL;DR: This is an informal summary of our recent paper CINDy: Conditional gradient-based Identification of Non-linear Dynamics – Noise-robust recovery by Alejandro Carderera, Sebastian Pokutta, Christof Schütte and Martin Weiser where we propose the use of a Conditional Gradient algorithm (more concretely the Blended Conditional Gradients [BPTW] algorithm) for the sparse recovery of a dynamic. In the presence of noise, the proposed algorithm presents superior sparsity-inducing properties, while ensuring a higher recovery accuracy, compared to other existing methods in the literature, most notably the popular SINDy [BPK] algorithm, based on a sequentially-thresholded least-squares approach.*### DNN Training with Frank–Wolfe [research]

*TL;DR: This is an informal discussion of our recent paper Deep Neural Network Training with Frank–Wolfe by Sebastian Pokutta, Christoph Spiegel, and Max Zimmer, where we study the general efficacy of using Frank–Wolfe methods for the training of Deep Neural Networks with constrained parameters. Summarizing the results, we (1) show the general feasibility of this markedly different approach for first-order based training of Neural Networks, (2) demonstrate that the particular choice of constraints can have a drastic impact on the learned representation, and (3) show that through appropriate constraints one can achieve performance exceeding that of unconstrained stochastic Gradient Descent, matching state-of-the-art results relying on $L^2$-regularization.*### Projection-Free Adaptive Gradients for Large-Scale Optimization [research]

*TL;DR: This is an informal summary of our recent paper Projection-Free Adaptive Gradients for Large-Scale Optimization by Cyrille Combettes, Christoph Spiegel, and Sebastian Pokutta. We propose to improve the performance of state-of-the-art stochastic Frank-Wolfe algorithms via a better use of first-order information. This is achieved by blending in adaptive gradients, a method for setting entry-wise step-sizes that automatically adjust to the geometry of the problem. Computational experiments on convex and nonconvex objectives demonstrate the advantage of our approach.*### Accelerating Domain Propagation via GPUs [research]

*TL;DR: This is an informal discussion of our recent paper Accelerating Domain Propagation: an Efficient GPU-Parallel Algorithm over Sparse Matrices by Boro Sofranac, Ambros Gleixner, and Sebastian Pokutta. In the paper, we present a new algorithm to perform domain propagation of linear constraints on GPUs efficiently. The results show that efficient implementations of Mixed-integer Programming (MIP) methods are possible on GPUs, even though the success of using GPUs in MIPs has traditionally been limited. Our algorithm is capable of performing domain propagation on the GPU exclusively, without the need for synchronization with the CPU, paving the way for the usage of this algorithm in a new generation of MIP methods that run on GPUs.*### Join CO@Work and EWG-POR – online and for free! [news]

*TL;DR: Announcement for CO@WORK and EWG-POR. Fully online and participation is free.*### Projection-Free Optimization on Uniformly Convex Sets [research]

*TL;DR: This is an informal summary of our recent paper Projection-Free Optimization on Uniformly Convex Sets by Thomas Kerdreux, Alexandre d’Aspremont, and Sebastian Pokutta. We present convergence analyses of the Frank-Wolfe algorithm in settings where the constraint sets are uniformly convex. Our results generalize different analyses of [P], [DR], [D], and [GH] when the constraint sets are strongly convex. For instance, the $\ell_p$ balls are uniformly convex for all $p > 1$, but strongly convex for $p\in]1,2]$ only. We show in these settings that uniform convexity of the feasible region systematically induces accelerated convergence rates of the Frank-Wolfe algorithm (with short steps or exact line-search). This shows that the Frank-Wolfe algorithm is not just adaptive to the sharpness of the objective [KDP] but also to the feasible region.*### Second-order Conditional Gradient Sliding [research]

*TL;DR: This is an informal summary of our recent paper Second-order Conditional Gradient Sliding by Alejandro Carderera and Sebastian Pokutta, where we present a second-order analog of the Conditional Gradient Sliding algorithm [LZ] for smooth and strongly-convex minimization problems over polytopes. The algorithm combines Inexact Projected Variable-Metric (PVM) steps with independent Away-step Conditional Gradient (ACG) steps to achieve global linear convergence and local quadratic convergence in primal gap. The resulting algorithm outperforms other projection-free algorithms in applications where first-order information is costly to compute.*### On the unreasonable effectiveness of the greedy algorithm [research]

*TL;DR: This is an informal summary of our recent paper On the Unreasonable Effectiveness of the Greedy Algorithm: Greedy Adapts to Sharpness with Mohit Singh, and Alfredo Torrico, where we adapt the sharpness concept from convex optimization to explain the effectiveness of the greedy algorithm for submodular function maximization.*### An update on SCIP [news]

*TL;DR: A quick update on what is on the horizon for SCIP.*### Psychedelic Style Transfer [research]

*TL;DR: We point out how to make psychedelic animations from discarded instabilities in neural style transfer. This post builds upon a remark we made in our recent paper Interactive Neural Style Transfer with Artists. In this paper, we questioned several simple evaluation aspects of neural style transfer methods. Also, it is our second series of interactive painting experiments where style transfer outputs constantly influence a painter, see the other series here. See also our medium post.*### Boosting Frank-Wolfe by Chasing Gradients [research]

*TL;DR: This is an informal summary of our recent paper Boosting Frank-Wolfe by Chasing Gradients by Cyrille Combettes and Sebastian Pokutta, where we propose to speed-up the Frank-Wolfe algorithm by better aligning the descent direction with that of the negative gradient. This is achieved by chasing the negative gradient direction in a matching pursuit-style, while still remaining projection-free. Although the idea is reasonably natural, it produces very significant results.*### Non-Convex Boosting via Integer Programming [research]

*TL;DR: This is an informal summary of our recent paper IPBoost – Non-Convex Boosting via Integer Programming with Marc Pfetsch, where we present a non-convex boosting procedure that relies on integer programing. Rather than solving a convex proxy problem, we solve the actual classification problem with discrete decisions. The resulting procedure achieves performance at par or better than Adaboost however it is robust to label noise that can defeat convex potential boosting procedures.*### Approximate Carathéodory via Frank-Wolfe [research]

*TL;DR: This is an informal summary of our recent paper Revisiting the Approximate Carathéodory Problem via the Frank-Wolfe Algorithm with Cyrille W Combettes. We show that the Frank-Wolfe algorithm constitutes an intuitive and efficient method to obtain a solution to the approximate Carathéodory problem and that it also provides improved cardinality bounds in particular scenarios.*### SCIP x Raspberry Pi: SCIP on Edge [random]

*TL;DR: Running SCIP on a Raspberry Pi 4 with relatively moderate performance losses (compared to a standard machine) of a factor of 3-5 brings Integer Programming into the realm of Edge Computing.*### Universal Portfolios: how to (not) get rich [research]

*TL;DR: How to (not) get rich? Running Universal Portfolios online with Online Convex Optimization techniques.*### Toolchain Tuesday No. 6 [random]

*TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. This time around will be about privacy tools. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.*### Conditional Gradients and Acceleration [research]

*TL;DR: This is an informal summary of our recent paper Locally Accelerated Conditional Gradients [CDP] with Alejandro Carderera and Jelena Diakonikolas, showing that although optimal global convergence rates cannot be achieved for Conditional Gradients [CG], acceleration can be achieved after a burn-in phase independent of the accuracy $\epsilon$, giving rise to asymptotically optimal rates, accessing the feasible region only through a linear minimization oracle.*### Cheat Sheet: Acceleration from First Principles [research]

*TL;DR: Cheat Sheet for a derivation of acceleration from optimization first principles.*### Blended Matching Pursuit [research]

*TL;DR: This is an informal summary of our recent paper Blended Matching Pursuit with Cyrille W. Combettes, showing that the blending approach that we used earlier for conditional gradients can be carried over also to the Matching Pursuit setting, resulting in a new and very fast algorithm for minimizing convex functions over linear spaces while maintaining sparsity close to full orthogonal projection approaches such as Orthogonal Matching Pursuit.*### Sharpness and Restarting Frank-Wolfe [research]

*TL;DR: This is an informal summary of our recent paper Restarting Frank-Wolfe with Alexandre D’Aspremont and Thomas Kerdreux, where we show how to achieve improved convergence rates under sharpness through restarting Frank-Wolfe algorithms.*### Cheat Sheet: Subgradient Descent, Mirror Descent, and Online Learning [research]

*TL;DR: Cheat Sheet for non-smooth convex optimization: subgradient descent, mirror descent, and online learning. Long and technical.*### Mixing Frank-Wolfe and Gradient Descent [research]

*TL;DR: This is an informal summary of our recent paper Blended Conditional Gradients with Gábor Braun, Dan Tu, and Stephen Wright, showing how mixing Frank-Wolfe and Gradient Descent gives a new, very fast, projection-free algorithm for constrained smooth convex minimization.*### The Zeroth World [random]

*TL;DR: On the impact of AI on society and economy and its potential to enable a zeroth world with unprecedented economic output.*### Toolchain Tuesday No. 5 [random]

*TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.*### Cheat Sheet: Smooth Convex Optimization [research]

*TL;DR: Cheat Sheet for smooth convex optimization and analysis via an idealized gradient descent algorithm. While technically a continuation of the Frank-Wolfe series, this should have been the very first post and this post will become the Tour d’Horizon for this series. Long and technical.*### Toolchain Tuesday No. 4 [random]

*TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.*### Emulating the Expert [research]

*TL;DR: This is an informal summary of our recent paper An Online-Learning Approach to Inverse Optimization with Andreas Bärmann, Alexander Martin, and Oskar Schneider, where we show how methods from online learning can be used to learn a hidden objective of a decision-maker in the context of Mixed-Integer Programs and more general (not necessarily convex) optimization problems.*### Toolchain Tuesday No. 3 [random]

*TL;DR: Part of a series of posts about tools, services, and packages that I use in day-to-day operations to boost efficiency and free up time for the things that really matter. Use at your own risk - happy to answer questions. For the full, continuously expanding list so far see here.*### Cheat Sheet: Hölder Error Bounds for Conditional Gradients [research]

*TL;DR: Cheat Sheet for convergence of Frank-Wolfe algorithms (aka Conditional Gradients) under the Hölder Error Bound (HEB) condition, or how to interpolate between convex and strongly convex convergence rates. Continuation of the Frank-Wolfe series. Long and technical.*### Toolchain Tuesday No. 2 [random]

### Cheat Sheet: Linear convergence for Conditional Gradients [research]

*TL;DR: Cheat Sheet for linearly convergent Frank-Wolfe algorithms (aka Conditional Gradients). What does linear convergence mean for Frank-Wolfe and how to achieve it? Continuation of the Frank-Wolfe series. Long and technical.*### Training Neural Networks with LPs [research]

*TL;DR: This is an informal summary of our recent paper Principled Deep Neural Network Training through Linear Programming with Dan Bienstock and Gonzalo Muñoz, where we show that the computational complexity of approximate Deep Neural Network training depends polynomially on the data size for several architectures by means of constructing (relatively) small LPs.*### Toolchain Tuesday No. 1 [random]

### Cheat Sheet: Frank-Wolfe and Conditional Gradients [research]

*TL;DR: Cheat Sheet for Frank-Wolfe and Conditional Gradients. Basic mechanics and results; this is a rather long post and the start of a series of posts on this topic.*### Tractability limits of small treewidth [research]

*TL;DR: This is an informal summary of our recent paper New Limits of Treewidth-based tractability in Optimization with Yuri Faenza and Gonzalo Muñoz, where we provide almost matching lower bounds for extended formulations that exploit small treewidth to obtain smaller formulations. We also show that treewidth in some sense is the only graph-theoretic notion that appropriately captures sparsity and tractability in a broader algorithmic setting.*### On the relevance of AI and ML research in academia [random]

*TL;DR: Is AI and ML research in academia relevant and necessary? Yes.*### Collaborating online, in real-time, with math-support and computations [random]

*TL;DR: Using atom + teletype + markdown as real-time math collaboration environment.*

subscribe via RSS