
I am writing code to solve an optimization problem defined as

$$\begin{array}{cl} \text{maximize} & f(x) \\ \text{subject to} & x\in\mathcal{X}, \end{array}$$

where $f:\mathbb{R}^n\to\mathbb{R}$ is concave with respect to $x\in\mathbb{R}^n$, and $\mathcal{X}\subseteq\mathbb{R}^n$ is a simplex set, e.g.,

$$\mathcal{X}=\left\{ x\in\mathbb{R}^n : x_i \ge 0 , \sum_i x_i \le c\right\}$$

To this end, I wrote code using the Frank-Wolfe method (a.k.a. the conditional gradient method). However, many papers dealing with convex problems say something like: "Since the above problem is a convex one, it can be solved by any convex programming tool, e.g., the interior-point method."
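
For concreteness, a minimal sketch of the Frank-Wolfe iteration I have in mind for this constraint set looks like the following (the objective $f(x)=\sum_i\log(1+x_i)$ is only a placeholder; the linear subproblem over this set is solved by checking the vertices):

```python
import numpy as np

def frank_wolfe(grad_f, n, c, num_iters=1000):
    """Maximize a concave f over {x >= 0, sum(x) <= c} via Frank-Wolfe."""
    x = np.zeros(n)                      # any feasible starting point
    for k in range(num_iters):
        g = grad_f(x)
        # Linear maximization oracle: argmax_{s in X} <g, s> is attained at a
        # vertex, either the origin or c * e_i for the largest positive g_i.
        s = np.zeros(n)
        i = np.argmax(g)
        if g[i] > 0:
            s[i] = c
        gamma = 2.0 / (k + 2.0)          # standard step size
        x = x + gamma * (s - x)
    return x

# Placeholder concave objective f(x) = sum(log(1 + x)), gradient 1 / (1 + x).
grad = lambda x: 1.0 / (1.0 + x)
x_star = frank_wolfe(grad, n=5, c=1.0, num_iters=2000)
```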

Why do many authors mention the interior-point method instead of the conditional gradient method? I think both methods can solve constrained convex problems, and the main difference between them is whether the algorithm is based on the gradient or on the Hessian.

Is there a particular reason that many authors mention only the interior-point method? If the interior-point method is better than Frank-Wolfe, I will rebuild my code using the interior-point method instead.

Danny_Kim
  • I think (second-order) interior-point methods are typically faster (in terms of the number of iterations required to reach a desired accuracy) than a first-order method such as Frank-Wolfe. But you can try both on your problem. ;) Furthermore, if your problem is of decent size, then one prefers (?) interior-point-like methods. – user550103 Jun 15 '20 at 19:31
  • Interior point methods might be mentioned because they are extremely popular and standard, and for small or medium-sized problems they tend to be state of the art. For large scale problems, it becomes prohibitively expensive to solve the linear system of equations which must be solved at each iteration of an interior point method. Kind of like how Newton's method converges much faster than gradient descent, but each iteration of Newton's method is very expensive (maybe impossibly expensive) for a large scale problem. – littleO Jun 17 '20 at 06:35

2 Answers


In my humble opinion, the Frank-Wolfe method is preferred when the projection onto the feasible set is very expensive or difficult to compute (one can refer to these slides: Frank-Wolfe Method). However, the projection onto a simplex can often be computed directly (e.g., Orthogonal Projection onto the Unit Simplex), so the projected gradient method is also acceptable. In general, the convergence rates of the projected gradient method and the Frank-Wolfe method are both $O(1/k)$, so we cannot really say which one is better.
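
For reference, here is a minimal sketch of that projection and a projected gradient loop for the set $\{x \ge 0, \sum_i x_i \le c\}$ from the question (the sort-based projection follows the standard algorithm; the step size and objective are left as placeholders):

```python
import numpy as np

def project_capped_simplex(v, c=1.0):
    """Euclidean projection onto {x >= 0, sum(x) <= c}."""
    w = np.maximum(v, 0.0)
    if w.sum() <= c:
        return w                          # clipping alone is already feasible
    # Otherwise project onto {x >= 0, sum(x) = c} (sort-based algorithm).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - c) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - c) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def projected_gradient_ascent(grad_f, n, c, step=0.1, num_iters=1000):
    """Maximize a concave f over the same set with projected gradient ascent."""
    x = np.zeros(n)
    for _ in range(num_iters):
        x = project_capped_simplex(x + step * grad_f(x), c)
    return x
```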

The convergence rate of the interior-point method is superlinear, but it needs second-order information (the Hessian) of $f$. If the dimension of the problem is not very high and the Hessian of $f$ is easy to obtain, the interior-point method is recommended.
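
If you only need a working interior-point baseline, a modeling layer such as CVXPY (assuming it is installed and supports your actual objective) hands the problem to such a solver in a few lines; the objective below is just a placeholder:

```python
import cvxpy as cp

n, c = 5, 1.0
x = cp.Variable(n)
# Placeholder concave objective; replace with your actual f.
objective = cp.Maximize(cp.sum(cp.log(1 + x)))
constraints = [x >= 0, cp.sum(x) <= c]
problem = cp.Problem(objective, constraints)
problem.solve()          # dispatched to an installed conic/interior-point solver
print(x.value)
```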

Zenan Li

Another approach you might consider is the Mirror Descent Method (MDM), whose convergence rate has a milder dependence on the problem dimension, especially when the Bregman divergence is chosen wisely. In your specific scenario, the KL-divergence would be the suitable Bregman divergence. Commonly, Bregman divergence-based methods are discussed in the context of the probability simplex; compared with your problem, this changes the constraint from $\sum_i x_i \leq c$ to $\sum_i x_i = 1$. However, this shouldn't pose a problem, as the KL-divergence keeps the required strong convexity. Additionally, the projection in this method has a closed-form solution, which can be particularly advantageous.
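
For completeness, here is a minimal sketch of that entropic mirror ascent (exponentiated gradient) update on the probability simplex; the step size and objective are placeholders:

```python
import numpy as np

def mirror_ascent_simplex(grad_f, n, step=0.1, num_iters=1000):
    """Mirror ascent with the KL divergence (exponentiated gradient)
    over the probability simplex {x >= 0, sum(x) = 1}."""
    x = np.full(n, 1.0 / n)               # start at the centre of the simplex
    for _ in range(num_iters):
        w = x * np.exp(step * grad_f(x))  # multiplicative (ascent) update
        x = w / w.sum()                   # KL "projection" is just normalization
    return x

# Placeholder concave objective f(x) = -||x||^2, whose gradient is -2x.
x_star = mirror_ascent_simplex(lambda x: -2.0 * x, n=5)
```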

Kamy