Duality theory
Setup
Consider the problem from Nonlinear optimization: For \(f: C \to \mb{R}\), \(C \subseteq \mb{R}^n\), find \(\inf_{x \in C} f(x)\) subject to \(\forall\, i \in [r] \quad g_i(x) \leq 0\), \(\forall\, j \in [m] \quad h_j(x) = 0\).
The infimum may or may not be attained; ideally, we would find a point \(x^\ast = \argmin_{x \in C} f(x)\) that also satisfies the constraints \(g_i(x^\ast) \leq 0\) for all \(i \in [r]\) and \(h_j(x^\ast) = 0\) for all \(j \in [m]\).
The following theory will make use of the Lagrangian from Nonlinear optimization:
\[L(x, \lambda, \mu) = f(x) + \sum_{i=1}^r \lambda_i g_i(x) + \sum_{j=1}^m \mu_j h_j(x), \quad \lambda \in \mb{R}_{+}^r,\ \mu \in \mb{R}^m.\]
Definition of primal and dual problems and saddle points
Saddle point
Definition: For \(L: \mc{A} \times \mc{B} \to \mb{R}\) (not necessarily the Lagrangian from above), \((x^\ast, y^\ast) \in \mc{A} \times \mc{B}\) is a saddle point of \(L\) iff
\[\forall\, x \in \mc{A} \enspace \forall\, y \in \mc{B} \quad L(x^\ast, y) \leq L(x^\ast, y^\ast) \leq L(x, y^\ast).\]
(Güler, 2010, p. 278)
Primal and dual problems
Let \(L: \mc{A} \times \mc{B} \to \mb{R}\). Define \(L_0(x) = \sup_{y \in \mc{B}} L(x, y)\), and define \(L_1(y) = \inf_{x \in \mc{A}} L(x, y)\).
Define the primal problem \((P)\) as finding
\[\inf_{x \in \mc{A}} L_0(x) = \inf_{x \in \mc{A}} \sup_{y \in \mc{B}} L(x, y).\]
Define the dual problem \((D)\) as finding
\[\sup_{y \in \mc{B}} L_1(y) = \sup_{y \in \mc{B}} \inf_{x \in \mc{A}} L(x, y).\]
(Güler, 2010, p. 277)
Note there is a language pitfall here: we can solve the problem (i.e. determine the optimal value) without one or both of the infima/suprema being attained. When we solve the problem and the infimum and/or supremum is attained, we will say there exists an "optimal argument" for the relevant function.
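To make these definitions concrete, here is a minimal numerical sketch (my own illustration, not from Güler) taking \(\mc{A}\) and \(\mc{B}\) to be small finite sets, so that \(L\) is just a matrix and all infima and suprema are attained automatically:

```python
import numpy as np

# A finite illustration: A = {0, ..., 3} (rows), B = {0, ..., 4} (columns),
# and L(x, y) = M[x, y], so all infima and suprema are attained.
rng = np.random.default_rng(0)
M = rng.integers(-5, 6, size=(4, 5)).astype(float)

L0 = M.max(axis=1)  # L0(x) = sup_{y in B} L(x, y)
L1 = M.min(axis=0)  # L1(y) = inf_{x in A} L(x, y)

primal_value = L0.min()  # (P): inf_{x in A} L0(x)
dual_value = L1.max()    # (D): sup_{y in B} L1(y)

print(primal_value, dual_value)
assert dual_value <= primal_value  # weak duality, proved below
```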
The weak duality theorem
Theorem (Weak duality theorem): \(\sup_y L_1(y) \leq \inf_x L_0(x)\).
(Güler, 2010, p. 277)
Proof:
This proof is from Güler, 2010, p. 277-278.
\(\forall\, y \in \mc{B} \enspace \forall\, u \in \mc{A} \quad \inf_{x \in \mc{A}} L(x, y) \leq L(u, y)\)
\(\equiv \quad \forall\, u \in \mc{A} \enspace \forall\, y \in \mc{B} \quad \inf_{x \in \mc{A}} L(x, y) \leq L(u, y)\)
\(\therefore \forall\, u \in \mc{A} \quad \sup_{y \in \mc{B}} \inf_{x \in \mc{A}} L(x, y) \leq \sup_{y \in \mc{B}} L(u, y)\)
(Bartle & Shebert, 2011, p.41, Example 2.4.2)
Since the left-hand side is a lower bound for \(\sup_{y \in \mc{B}} L(u, y)\) for every \(u \in \mc{A}\), it is also a lower bound for the infimum over \(u\), so we have
\(\sup_{y \in \mc{B}} \inf_{x \in \mc{A}} L(x, y) \leq \inf_{u \in \mc{A}} \sup_{y \in \mc{B}} L(u, y)\)
Replacing \(u\) with \(x\) gives us the desired expression. \(\square\)
Corollary: If \(L_0(x)\) is not bounded below on \(\mathcal{A}\), then \(\sup_{y \in \mathcal{B}} L_1(y) = -\infty\).
(Güler, 2010)
Definition: \(\inf_x L_0(x) - \sup_y L_1(y)\) is known as the duality gap. When the duality gap is zero, "the optimal objective values of the primal and dual problems coincide" (Güler, 2010, p. 278).
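As a concrete illustration of a nonzero duality gap (my example, not from the text), take \(\mc{A} = \mc{B} = \{0, 1, 2\}\) and the cyclic payoff matrix below; weak duality holds, but the primal and dual values differ by 2:

```python
import numpy as np

# L(x, y) = M[x, y] on finite sets; for this "rock-paper-scissors"-style
# matrix the primal and dual values differ, so the duality gap is positive.
M = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

primal_value = M.max(axis=1).min()  # inf_x sup_y L(x, y) = 1
dual_value = M.min(axis=0).max()    # sup_y inf_x L(x, y) = -1

gap = primal_value - dual_value
print(gap)  # 2.0: weak duality holds, but the gap is nonzero
```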
Saddle point theorem
Theorem (Saddle point theorem): Let \(L: \mc{A} \times \mc{B} \to \mb{R}\) and \((x^\ast, y^\ast) \in \mc{A} \times \mc{B}\). The following are equivalent:
(a) \((x^\ast, y^\ast)\) is a saddle point of \(L(x, y)\)
(b) \(x^\ast\) is an optimal argument for the primal problem, \(y^\ast\) is an optimal argument for the dual problem, and \(\min_x L_0(x) = \max_y L_1(y)\)
Furthermore, if (a) or (b) holds, then \(\min_x L_0(x) = L(x^\ast, y^\ast) = \max_y L_1(y)\), i.e. \((x^\ast, y^\ast)\) is an optimal argument for both the primal and dual problems.
(Güler, 2010, p. 278)
Proof:
This proof is from Güler, 2010, p. 277-278, with details filled in by me.
(\(\Longrightarrow\))
Assume (a) holds.
[Claim: \(\sup_{y \in \mc{B}} L(x^\ast, y) = \max_{y \in \mc{B}} L(x^\ast, y) = L(x^\ast, y^\ast)\).
Proof of claim:
How do we know this max is attained?
First, note that by definition, for \(f: A \to \mb{R}\), \(f\) is maximized by \(x^\ast\) on \(A\) iff \(\forall\, x \in A \quad f(x^\ast) \geq f(x)\). I.e., \(\forall\, x \in A \quad \max_{u \in A} f(u) \geq f(x)\).
Thus, since \(y^\ast \in \mc{B}\) and \(\forall\, y \in \mc{B} \quad L(x^\ast, y) \leq L(x^\ast, y^\ast)\) (the first saddle point inequality), the maximum is attained at \(y^\ast\): \(\max_{y \in \mc{B}} L(x^\ast, y) = L(x^\ast, y^\ast)\).
Next, note that by the definition of supremum, we must have \(\max_{y \in \mc{B}} L(x^\ast, y) \leq \sup_{y \in \mc{B}} L(x^\ast, y)\). If \(\max_{y \in \mc{B}} L(x^\ast, y) < \sup_{y \in \mc{B}} L(x^\ast, y)\), then \(\max_{y \in \mc{B}} L(x^\ast, y)\) is not an upper bound of \(\{L(x^\ast, y) : y \in \mc{B}\}\), so there exists \(y^\prime \in \mc{B}\) such that \(\max_{y \in \mc{B}} L(x^\ast, y) < L(x^\ast, y^\prime) \leq \sup_{y \in \mc{B}} L(x^\ast, y)\), a contradiction. Therefore \(\max_{y \in \mc{B}} L(x^\ast, y) = \sup_{y \in \mc{B}} L(x^\ast, y)\) (this is a review of an argument from undergraduate analysis). \(\square\)]
Continuing with the main proof.
By arguments similar to the above, since \(\forall x \in \mc{A} \quad L(x^\ast, y^\ast) \leq L(x, y^\ast)\), we have \(L(x^\ast, y^\ast) = \min_{x \in \mc{A}} L(x, y^\ast) = \inf_{x \in \mc{A}} L(x, y^\ast)\).
Thus so far we have \(\sup_{y \in \mc{B}} L(x^\ast, y) = L(x^\ast, y^\ast) = \inf_{x \in \mc{A}} L(x, y^\ast)\).
Since \(x^\ast \in \mc{A}\),
\[\inf_{x \in \mc{A}} \sup_{y \in \mc{B}} L(x, y) \leq \sup_{y \in \mc{B}} L(x^\ast, y) = L(x^\ast, y^\ast).\]
Similarly, since \(y^\ast \in \mc{B}\),
\[L(x^\ast, y^\ast) = \inf_{x \in \mc{A}} L(x, y^\ast) \leq \sup_{y \in \mc{B}} \inf_{x \in \mc{A}} L(x, y) \leq \inf_{x \in \mc{A}} \sup_{y \in \mc{B}} L(x, y),\]
where the last step is by the weak duality theorem. We have shown that
\[\inf_{x \in \mc{A}} \sup_{y \in \mc{B}} L(x, y) \leq L(x^\ast, y^\ast) \leq \sup_{y \in \mc{B}} \inf_{x \in \mc{A}} L(x, y) \leq \inf_{x \in \mc{A}} \sup_{y \in \mc{B}} L(x, y),\]
so the above holds with equality (call this string of equalities \((1)\)).
We show that \(x^\ast\) is an optimal argument for the primal problem: from \((1)\),
\[L_0(x^\ast) = \sup_{y \in \mc{B}} L(x^\ast, y) = \inf_{x \in \mc{A}} \sup_{y \in \mc{B}} L(x, y),\]
i.e. the infimum is attained at \(x^\ast\), so the infimum equals the minimum.
Similarly, from the above, since
\[L_1(y^\ast) = \inf_{x \in \mc{A}} L(x, y^\ast) = \sup_{y \in \mc{B}} \inf_{x \in \mc{A}} L(x, y),\]
the supremum is attained at \(y^\ast\), so \(y^\ast\) is an optimal argument for the dual problem.
We also have from \((1)\) that \(L(x^\ast, y^\ast) = \inf_{x \in \mc{A}} \sup_{y \in \mc{B}} L(x, y) = \min_{x \in \mc{A}} \sup_{y \in \mc{B}} L(x, y)\), and similarly \(L(x^\ast, y^\ast) = \max_{y \in \mc{B}} \inf_{x \in \mc{A}} L(x, y)\), so \(\min_x L_0(x) = L(x^\ast, y^\ast) = \max_y L_1(y)\), i.e. \((x^\ast, y^\ast)\) attains the optimal values of both the primal and dual problems.
(\(\Longleftarrow\))
Now we do the other direction. Assume (b) holds. Then
\[\sup_{y \in \mc{B}} \inf_{x \in \mc{A}} L(x, y) \leq \inf_{x \in \mc{A}} \sup_{y \in \mc{B}} L(x, y) = \min_{x \in \mc{A}} \sup_{y \in \mc{B}} L(x, y) = \max_{y \in \mc{B}} \inf_{x \in \mc{A}} L(x, y) = \sup_{y \in \mc{B}} \inf_{x \in \mc{A}} L(x, y),\]
where the first inequality is by the weak duality theorem and the equalities after that are from the assumption of (b); call this string of relations \((2)\) (note it forces equality throughout). Denote by \(x^\ast\) the appropriate value such that \(\min_x \sup_y L(x, y) = \sup_y L(x^\ast, y)\) and by \(y^\ast\) the appropriate value such that \(\max_y \inf_x L(x, y) = \inf_x L(x, y^\ast)\). Now observe
\[\sup_{y \in \mc{B}} L(x^\ast, y) = \inf_{x \in \mc{A}} L(x, y^\ast),\]
where the equality is from \((2)\). Now note that by the definitions of \(\sup\) and \(\inf\), we have \(\sup_y L(x^\ast, y) \geq L(x^\ast, y^\ast)\) and \(\inf_x L(x, y^\ast) \leq L(x^\ast, y^\ast)\). Thus we have
\[\inf_{x \in \mc{A}} L(x, y^\ast) \leq L(x^\ast, y^\ast) \leq \sup_{y \in \mc{B}} L(x^\ast, y) = \inf_{x \in \mc{A}} L(x, y^\ast),\]
so \(\inf_x L(x, y^\ast) = L(x^\ast, y^\ast)\) and thus \(\inf_x L(x, y^\ast) = L(x^\ast, y^\ast) = \sup_y L(x^\ast, y)\). These two equalities say exactly that \(L(x^\ast, y^\ast) \leq L(x, y^\ast)\) for all \(x \in \mc{A}\) and \(L(x^\ast, y) \leq L(x^\ast, y^\ast)\) for all \(y \in \mc{B}\), so \((x^\ast, y^\ast)\) is a saddle point for \(L\). Furthermore, we have shown that \((x^\ast, y^\ast)\) is an optimal argument for both the primal and dual problems.
\(\square\)
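A standard example to keep in mind is \(L(x, y) = x^2 - y^2\), which has a saddle point at \((0, 0)\). The sketch below (my illustration; the grids stand in for \(\mc{A} = \mc{B} = [-2, 2]\)) checks the saddle point inequalities and confirms that the primal and dual values both equal \(L(0, 0)\), as the theorem says they must:

```python
import numpy as np

# Check the saddle point inequalities L(0, y) <= L(0, 0) <= L(x, 0)
# for L(x, y) = x^2 - y^2, with grids standing in for A = B = [-2, 2].
L = lambda x, y: x**2 - y**2
grid = np.arange(-200, 201) / 100.0  # step 0.01; contains 0.0 exactly

assert np.all(L(0.0, grid) <= L(0.0, 0.0))  # L(x*, y) <= L(x*, y*)
assert np.all(L(grid, 0.0) >= L(0.0, 0.0))  # L(x*, y*) <= L(x, y*)

# Saddle point theorem: the primal and dual values coincide with L(0, 0).
X, Y = np.meshgrid(grid, grid, indexing="ij")
primal_value = L(X, Y).max(axis=1).min()  # min over x of sup over y
dual_value = L(X, Y).min(axis=0).max()    # max over y of inf over x
assert primal_value == dual_value == L(0.0, 0.0)
```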
Considering the primal problem using the Lagrangian
This section is based on Güler (2010, p. 281), but some parts have been changed.
Consider the nonlinear program: for \(f: C \to \mb{R}\), find \(\argmin_x f(x)\) subject to \(\forall\, i \in [r] \quad g_i(x) \leq 0\) and \(\forall\, j \in [m] \quad h_j(x) = 0\). We refer to the statement "\(\forall\, i \in [r] \quad g_i(x) \leq 0\) and \(\forall\, j \in [m] \quad h_j(x) = 0\)" as "the constraints."
Let
\[L(x, \lambda, \mu) = f(x) + \sum_{i=1}^r \lambda_i g_i(x) + \sum_{j=1}^m \mu_j h_j(x),\]
where \(\lambda \geq 0\), \(\mu \in \mb{R}^m\).
Let \(E \subseteq C\) denote the set of \(x\) in \(C\) such that the constraints are satisfied.
For a moment, let us attempt to find \(\inf_{x \in C} \sup_{0 \leq \lambda \in \mb{R}^r, \mu \in \mb{R}^m} L(x, \lambda, \mu)\). Note that for \(x \in E\),
\[\sup_{0 \leq \lambda \in \mb{R}^r,\ \mu \in \mb{R}^m} L(x, \lambda, \mu) = f(x),\]
since each \(\lambda_i g_i(x) \leq 0\) (the supremum takes \(\lambda = 0\)) and each \(h_j(x) = 0\), and for \(x \in C \setminus E\),
\[\sup_{0 \leq \lambda \in \mb{R}^r,\ \mu \in \mb{R}^m} L(x, \lambda, \mu) = +\infty,\]
since a violated constraint lets the corresponding \(\lambda_i\) or \(\mu_j\) drive \(L\) to \(+\infty\) (in both cases the value depends on \(x\)).
Note that if \(E = \emptyset\), then \(\forall x \in C \quad \sup_{0 \leq \lambda \in \mb{R}^r, \mu \in \mb{R}^m} L(x, \lambda, \mu) = +\infty\). In this case, we are stuck finding \(\inf_{x \in C} +\infty = +\infty\), and \(+\infty \not\in \mb{R}\), so the problem is not solvable.
On a side note, using the above definitions, the dual problem is
\[\sup_{0 \leq \lambda \in \mb{R}^r,\ \mu \in \mb{R}^m} \inf_{x \in C} L(x, \lambda, \mu).\]
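The following sketch (my own toy example) makes the case analysis concrete: for the problem of minimizing \(f(x) = x^2\) subject to \(g(x) = 1 - x \leq 0\), the supremum of the Lagrangian over \(\lambda \geq 0\) equals \(f(x)\) at a feasible point and blows up at an infeasible one:

```python
import numpy as np

# Toy problem: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0 (i.e. x >= 1).
f = lambda x: x**2
g = lambda x: 1.0 - x
L = lambda x, lam: f(x) + lam * g(x)

lams = np.linspace(0.0, 1e4, 100001)  # crude finite stand-in for lambda >= 0

for x in (2.0, 0.5):  # a feasible point, then an infeasible point
    print(x, L(x, lams).max())
# x = 2.0 (feasible):   sup over lambda is f(2.0) = 4.0, attained at lambda = 0
# x = 0.5 (infeasible): the "sup" hits the top of the grid and grows without
#                       bound as lambda -> infinity, so the true sup is +inf
```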
Applying the saddle point theorem to the Lagrangian
Theorem, part one: \((x^\ast, \lambda^\ast, \mu^\ast)\) is a saddle point of the Lagrangian \(L\) iff \(x^\ast\) is an optimal argument for \((P)\), \((\lambda^\ast, \mu^\ast)\) is an optimal argument for \((D)\), and \(\min_x L_0(x) = \max_{\lambda \geq 0, \mu} L_1(\lambda, \mu)\). Furthermore, if \((x^\ast, \lambda^\ast, \mu^\ast)\) is a saddle point of \(L\), then \(\min_x L_0(x) = L(x^\ast, \lambda^\ast, \mu^\ast) = \max_{\lambda \geq 0, \mu} L_1(\lambda, \mu)\).
(Güler, 2010, p. 281 - 282)
Proof:
The statements follow directly from the saddle point theorem.
(Güler, 2010, p. 282)
Theorem, part two: If \((x^\ast, \lambda^\ast, \mu^\ast)\) is a saddle point of \(L\), then the complementary conditions hold.
(Güler, 2010, p. 282)
Proof:
Assume \((x^\ast, \lambda^\ast, \mu^\ast)\) is a saddle point of \(L\). Then \(f(x^\ast) = L(x^\ast, \lambda^\ast, \mu^\ast)\), because the only way an optimal argument for \(\min_x L_0(x)\) can exist is if \(L_0(x^\ast) = \sup_{\lambda \geq 0, \mu} L(x^\ast, \lambda, \mu) < +\infty\), which requires \(x^\ast \in E\). So
\[f(x^\ast) = f(x^\ast) + \sum_{i=1}^r \lambda_i^\ast g_i(x^\ast) + \sum_{j=1}^m \mu_j^\ast h_j(x^\ast),\]
which is equivalent to
\[\sum_{i=1}^r \lambda_i^\ast g_i(x^\ast) = 0,\]
since at this solution \(h_j(x^\ast) = 0\) for all \(j\).
Note that since all \(g_i(x^\ast) \leq 0\) and all \(\lambda_i^\ast \geq 0\), each term \(\lambda_i^\ast g_i(x^\ast) \leq 0\), so for the sum to vanish we must, for each \(i\), have \(\lambda_i^\ast = 0\) or \(g_i(x^\ast) = 0\) or both. Recall from Nonlinear optimization that these are sometimes called the "complementary conditions." We have found here that the complementary conditions must hold at an optimal argument for the primal problem.
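Continuing the toy example above (minimize \(x^2\) subject to \(1 - x \leq 0\); my illustration), the saddle point is \((x^\ast, \lambda^\ast) = (1, 2)\), and the complementary condition \(\lambda^\ast g(x^\ast) = 0\) can be checked directly:

```python
# Complementary conditions for: minimize x^2 subject to g(x) = 1 - x <= 0.
# Stationarity of the Lagrangian, 2x - lambda = 0, plus feasibility and
# lambda >= 0 gives the saddle point x* = 1, lambda* = 2.
x_star, lam_star = 1.0, 2.0

g_star = 1.0 - x_star
assert g_star <= 0.0             # x* is feasible (the constraint is active)
assert lam_star * g_star == 0.0  # complementary condition lambda* g(x*) = 0

# Saddle point check: L(x, 2) = (x - 1)^2 + 1 is minimized at x* = 1, and
# since g(x*) = 0, L(1, lambda) = 1 = f(1) for every lambda >= 0.
```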
Applying the above to convex programming problems
The dual problem is always a convex problem
Note that the Lagrangian is an affine function in \((\lambda, \mu)\) (see "Collection of definitions/results" below). We have for arbitrary fixed \(x\)
\[L_x(\lambda, \mu) := L(x, \lambda, \mu) = f(x) + \inner{(\lambda, \mu)}{v}, \quad v := (g_1(x), \ldots, g_r(x), h_1(x), \ldots, h_m(x)).\]
Note that \(L_x(0) = L_x((0,0)) = f(x)\), and \((\lambda, \mu) \mapsto \inner{(\lambda, \mu)}{v}\) is linear in \((\lambda, \mu)\). Thus \(L_x(\lambda, \mu)\) is affine in \((\lambda, \mu)\).
Define \(q(\lambda, \mu) := \inf_{x \in C} L(x, \lambda, \mu) = \inf_{x \in C} L_x(\lambda, \mu)\). Since \(q\) is the pointwise infimum of a family of affine (hence concave) functions, \(q\) is concave (Güler, 2010, p. 282; see "Collection of definitions/results" below for a proof).
Thus the dual problem (finding \(\sup_{(\lambda, \mu)} \inf_x L(x, \lambda, \mu)\)) is a convex problem (Güler, 2010, p. 282).
So regardless of whether or not the primal problem is a convex problem, the dual problem is a convex problem (Güler, 2010, p. 282).
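As a quick numerical check of this fact (my illustration), the sketch below evaluates the dual function \(q\) on a grid for a problem whose objective is not convex and verifies the concavity inequality at the grid midpoints; \(q\) is concave even though \(f\) is not:

```python
import numpy as np

# Nonconvex primal: minimize f(x) = sin(3x) + 0.1 x^2 over C = [-2, 2]
# subject to the single constraint g(x) = x <= 0, with multiplier lam >= 0.
xs = np.linspace(-2.0, 2.0, 2001)  # grid standing in for C
fs = np.sin(3 * xs) + 0.1 * xs**2  # f evaluated on the grid

def q(lam):
    # q(lam) = inf_{x in C} L(x, lam): a pointwise infimum of functions
    # affine in lam, so q is concave even though f is not convex.
    return (fs + lam * xs).min()

lams = np.linspace(0.0, 5.0, 51)   # evenly spaced multipliers
qs = np.array([q(lam) for lam in lams])

# Midpoint concavity check: q(lam_k) >= (q(lam_{k-1}) + q(lam_{k+1})) / 2.
assert np.all(qs[1:-1] >= 0.5 * (qs[:-2] + qs[2:]) - 1e-12)
```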
The strong duality theorem of convex programming
An affine set \(M\) in \(\mb{R}^n\) is a set of the form \(x + S\), where \(x\) is some vector and \(S\) is a subspace uniquely determined by \(M\) and called the subspace parallel to \(M\). If \(X\) is a subset of \(\mb{R}^n\), the affine hull of \(X\), denoted \(\text{aff}(X)\), is the intersection of all affine sets containing \(X\) (Bertsekas, 2003, p. 36).
Let \(C\) be a nonempty convex set. Let \(\text{ri}(C)\) denote the relative interior of \(C\), defined by
\[\text{ri}(C) = \{x \in C : \exists\, \epsilon > 0 \quad B(x, \epsilon) \cap \text{aff}(C) \subseteq C\},\]
where \(B(x, \epsilon)\) denotes the open ball of radius \(\epsilon\) centered at \(x\).
(Bertsekas, 2003, p. 39 - 40).
We call the following condition "Slater's condition for nonlinear convex programs" (my name for it to avoid confusion with the aforementioned "Slater's condition" in Nonlinear optimization): there exists a feasible point \(\bar{x} \in \text{ri}(C)\) such that \(g_i(\bar{x}) < 0\) for all \(i \in [r]\) and \(h_j(\bar{x}) = 0\) for all \(j \in [m]\).
(Güler, 2010, p. 286)
Theorem (Strong duality theorem of convex programming):
Let \(P\) be a nonlinear convex program. If Slater's condition for nonlinear convex programs holds, then there exists \((\lambda^\ast, \mu^\ast) \in \mb{R}_{+}^r \times \mb{R}^m\) such that \((\lambda^\ast, \mu^\ast)\) is an optimal argument to \(\sup_{\lambda \geq 0, \mu} \inf_x L(x, \lambda, \mu)\), so \(\sup_{\lambda \geq 0, \mu} \inf_x L(x, \lambda, \mu) = \max_{\lambda \geq 0, \mu} \inf_x L(x, \lambda, \mu) = \inf_x L(x, \lambda^\ast, \mu^\ast)\).
(continuing) Further, we have \(\inf_x \sup_{\lambda \geq 0, \mu} L(x, \lambda, \mu) = \max_{\lambda \geq 0, \mu} \inf_x L(x, \lambda, \mu) = \inf_x L(x, \lambda^\ast, \mu^\ast)\). What this implies is that if we solve the dual problem for \((\lambda^\ast, \mu^\ast)\) and then consider \(\inf_x L(x, \lambda^\ast, \mu^\ast)\), and we find an optimal argument \(x^\ast\) (by attempting to perform the minimization, for example), then \(x^\ast\) is an optimal argument of the primal problem.
(The theorem is stated in Güler, 2010, p. 286; the above is my rewrite which has a bit more explanation.)
The theorem is proved in Güler, 2010, p. 286-287.
Making use of the strong duality theorem of convex programming
We check whether or not Slater's condition holds. If it does, then we solve the dual problem \(\max_{\lambda \geq 0, \mu} \inf_x L(x, \lambda, \mu)\) for \((\lambda^\ast, \mu^\ast)\). Then, we attempt to find an optimal argument \(x^\ast\) for \(\inf_x L(x, \lambda^\ast, \mu^\ast)\). If we find one, we have no duality gap, and \(x^\ast\) is a solution for the primal problem.
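Here is a minimal end-to-end sketch of this recipe (my own worked example, not from Güler): minimize \(f(x) = (x_1 - 1)^2 + (x_2 + 2)^2\) subject to \(x_1 + x_2 \geq 0\). Slater's condition holds (e.g. \(x = (1, 1)\) satisfies the constraint strictly), the dual can be solved in closed form, and minimizing \(L(\cdot, \lambda^\ast)\) recovers the primal solution:

```python
import numpy as np

# Convex program: minimize f(x) = (x1 - 1)^2 + (x2 + 2)^2
# subject to g(x) = -(x1 + x2) <= 0, i.e. x1 + x2 >= 0.
# Slater's condition: x = (1, 1) gives g(x) = -2 < 0.

# Minimizing L(x, lam) = f(x) + lam * g(x) over x in closed form gives
# x(lam) = (1 + lam/2, -2 + lam/2), hence q(lam) = lam - lam^2 / 2.
q = lambda lam: lam - 0.5 * lam**2

# Solve the dual: q is concave and q'(lam) = 1 - lam, so lam* = 1.
lam_star = 1.0

# Recover a primal solution by minimizing L(., lam*):
x_star = np.array([1.0 + lam_star / 2, -2.0 + lam_star / 2])
f_star = (x_star[0] - 1) ** 2 + (x_star[1] + 2) ** 2

print(x_star, f_star, q(lam_star))  # [ 1.5 -1.5] 0.5 0.5
assert x_star[0] + x_star[1] >= 0.0  # x* is feasible
assert f_star == q(lam_star)         # no duality gap
```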
Collection of definitions/results
Affine functions
Definition: \(f: E \to \mb{R}\) is affine iff \(\exists\, g: E \to \mb{R}\), \(g\) is linear and \(\exists\, \alpha \in \mb{R} \quad \forall\, x \in E \quad f(x) = g(x) + \alpha\)
Note that \(g\) linear implies \(g(0) = 0\), so \(g(0) = f(0) - \alpha = 0\) which implies \(\alpha = f(0)\).
Lemma: \(f\) is affine iff \(f\) is both convex and concave, i.e. iff
\[\forall\, x, y \in E \enspace \forall\, \lambda \in [0, 1] \quad f(\lambda x + (1 - \lambda) y) = \lambda f(x) + (1 - \lambda) f(y).\]
(Bertsimas, 1997. p. 15 and p. 34, Exercise 1.1)
\((\Longrightarrow)\)
Assume \(f\) is affine, say \(f(x) = g(x) + \alpha\) with \(g\) linear. Then for any \(x, y \in E\) and \(\lambda \in [0, 1]\),
\[f(\lambda x + (1 - \lambda) y) = g(\lambda x + (1 - \lambda) y) + \alpha = \lambda g(x) + (1 - \lambda) g(y) + \lambda \alpha + (1 - \lambda) \alpha = \lambda f(x) + (1 - \lambda) f(y),\]
so equality holds in both the convexity and the concavity inequalities.
(\(\Longleftarrow\))
Assume \(f\) is both convex and concave.
To show \(f\) is affine we must show that there exists a linear \(g\) such that \(f(x) = g(x) + f(0)\) (this is true iff \(g(x) = f(x) - f(0)\) is linear).
This can be shown by showing \(g(\lambda x) = \lambda g(x)\) and showing \(g(x + y) = g(x) + g(y)\).
\(\square\)
Also, think about the visual: a real-valued function on \(\mb{R}^2\) that is both convex and concave looks like a plane, generally intersecting the \(z\)-axis at some point other than 0. That is the picture to have in mind of an affine function.
The pointwise infimum of a family of affine functions is concave
Note that one can visualize this result by drawing \(x\)-\(y\) axes and plotting several lines (graphs of affine functions), then tracing the pointwise infimum (a minimum in this finite case) across the axis. The resulting function is concave.
Proof:
Consider a family \(\{f_i\}_{i \in I}\) where each \(f_i: \mb{R}^n \to \mb{R}\) is convex.
Let \(g: \mb{R}^n \to \mb{R}\), \(g(x) = \sup_{i \in I} f_i(x)\) (assume the supremum is finite for each \(x\)).
Consider \(\text{epi}(g) = \{(x, t) : t \geq g(x)\}\).
Claim: \(\text{epi}(g)\) is convex.
Proof: \(t \geq g(x) = \sup_{i \in I} f_i(x)\) iff \(\forall\, i \in I \quad t \geq f_i(x)\), so
\[\text{epi}(g) = \bigcap_{i \in I} \text{epi}(f_i).\]
Each of the sets in the intersection is convex, since each \(f_i\) is convex. Since the intersection of convex sets is convex, \(\text{epi}(g)\) is convex, so \(g\) is convex.
Now let \(h(x) = \inf_{i \in I} a_i(x)\) be a pointwise infimum of affine functions \(a_i\). Thus
\[-h(x) = \sup_{i \in I} (-a_i(x)),\]
and each \(-a_i\) is affine, hence convex. So \(-h\) is convex, which holds iff \(h\) is concave. (The same argument shows that a pointwise infimum of concave functions is concave, which is the form used above.)
\(\square\)
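The sketch below (my illustration) spot-checks the result numerically: it takes the pointwise minimum of a handful of affine functions on \(\mb{R}\) and verifies the concavity inequality at randomly sampled points:

```python
import numpy as np

# Pointwise infimum of finitely many affine functions a_i(x) = s_i * x + c_i:
slopes = np.array([-2.0, -0.5, 1.0, 3.0])
intercepts = np.array([1.0, 0.5, 0.0, -2.0])

def h(x):
    return (slopes * x + intercepts).min()  # h(x) = min_i a_i(x)

# Check the concavity inequality h(t*x + (1-t)*y) >= t*h(x) + (1-t)*h(y)
# at many random points and mixing weights.
rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    t = rng.uniform(0, 1)
    assert h(t * x + (1 - t) * y) >= t * h(x) + (1 - t) * h(y) - 1e-12

print("concavity inequality verified on random samples")
```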
References
Bartle, Robert G. and Donald R. Sherbert. (2011). Introduction to real analysis (Fourth ed.). John Wiley & Sons, Inc.
Bertsekas, Dimitri P., with Angelia Nedić and Asuman E. Ozdaglar. (2003). Convex analysis and optimization. Athena Scientific.
Bertsimas, Dimitris and John N. Tsitsiklis. (1997). Introduction to linear optimization. Athena Scientific and Dynamic Ideas, LLC.
Güler, Osman. (2010). Foundations of optimization. Springer Science+Business Media, Inc.
How to cite this article
Wayman, Eric Alan. (2026). Duality theory. Eric Alan Wayman's technical notes. https://ericwayman.net/notes/duality-theory/
@misc{wayman2026duality-theory,
title={Duality theory},
author={Wayman, Eric Alan},
journal={Eric Alan Wayman's technical notes},
url={https://ericwayman.net/notes/duality-theory/},
year={2026}
}