L-smoothness and strong convexity

Prerequisites

Convexity

Definition: Let \(U \subseteq \mathbb{R}^n\). The epigraph of a function \(f: U \to \mathbb{R}\) is a subset of \(\mathbb{R}^{n+1}\) defined as

$$ \text{epi}(f) = \{(x, t) \mid x \in \text{dom}(f),\, f(x) \leq t\}. $$

Theorem: \(f\) is a convex function iff its epigraph is a convex set.

Proof: see Ahmadi (2025b).

Notation: Let \(A \succeq 0\) denote \(A\) is positive semidefinite, and \(A \succ 0\) denote \(A\) is positive definite.

Theorem: Let \(f: \mathbb{R}^n \to \mathbb{R}\) be twice differentiable over an open domain. Then the following are equivalent:

  1. \(f\) is convex
  2. For all \(x, y \in \text{dom}(f)\), \(f(y) \geq f(x) + \inner{\nabla f(x)}{y - x}\)
  3. \(\nabla^2 f(x) \succeq 0\) for all \(x \in \text{dom}(f)\)

(Recall that \(\inner{\nabla f(x)}{y - x}\) is the directional derivative of \(f\) at \(x\) in the direction of \(y - x\).)

(2) means the first-order Taylor expansion of \(f\) at \(x\) lies beneath the function at every point in its domain. (3) means that the function has nonnegative curvature everywhere.

Proof: see Ahmadi (2025c).

First and second order conditions

(necessary first-order condition) For a continuously differentiable \(f\), if \(w^\ast\) is a local minimum of \(f\), then \(\nabla f(w^\ast) = 0\).

Recall that there are critical points \(w\) (i.e. points \(w\) for which \(\nabla f(w) = 0\)) that are not local minima or local maxima.

(second-order necessary condition) For \(f \in C^2\), if \(w^\ast\) is a local minimum of \(f\), then \(\nabla f(w^\ast) = 0\) and \(\nabla^2 f(w^\ast) \succeq 0\).

(second-order sufficient condition) For \(f \in C^2\), if \(\nabla f(w^\ast) = 0\) and \(\nabla^2 f(w^\ast) \succ 0\) then \(w^\ast\) is a local minimum of \(f\).

(Ahmadi, 2025a; Royer, 2022)
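
To see why the first-order condition is necessary but not sufficient, here is a minimal numerical sketch (my own example in Python with NumPy, not from the cited notes): for \(f(x, y) = x^2 - y^2\), the gradient vanishes at the origin, but the Hessian there is indefinite, so the second-order sufficient condition fails and the origin is a saddle point.

```python
import numpy as np

# f(x, y) = x^2 - y^2: the origin is a critical point but not a local
# minimum, so the necessary first-order condition is not sufficient.
def grad(w):
    x, y = w
    return np.array([2 * x, -2 * y])

def hessian(w):
    return np.array([[2.0, 0.0], [0.0, -2.0]])

w_star = np.array([0.0, 0.0])
print(grad(w_star))                         # [ 0. -0.]  (gradient vanishes)
print(np.linalg.eigvalsh(hessian(w_star)))  # [-2.  2.]  (indefinite: a saddle)
```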

Types of functions

Definition: \(f: \mb{R}^n \to \mb{R}^m\) is \(L\)-Lipschitz continuous with Lipschitz constant \(L > 0\) iff

$$ \forall\, (x, y) \in (\mb{R}^n)^2 \quad \norm{f(x) - f(y)} \leq L \norm{x - y} $$

(Royer, 2024, p. 8)

Note that if \(f\) is \(L\)-Lipschitz continuous, then \(f\) is continuous.
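
As a quick numerical sanity check of the definition (a sketch of mine, assuming only NumPy): \(\sin\) is \(1\)-Lipschitz, so the difference quotient over random pairs should never exceed \(1\).

```python
import numpy as np

# sin is 1-Lipschitz: |sin(x) - sin(y)| <= |x - y| for all x, y.
rng = np.random.default_rng(0)
x, y = rng.uniform(-10, 10, size=(2, 100_000))
mask = x != y                   # avoid dividing by zero on exact ties
ratios = np.abs(np.sin(x[mask]) - np.sin(y[mask])) / np.abs(x[mask] - y[mask])
print(ratios.max() <= 1.0 + 1e-9)  # True (up to floating-point rounding)
```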

Definition: \(f: \mb{R}^d \to \mb{R}\) is continuously differentiable iff its first-order derivative exists and is continuous. We denote the set of continuously differentiable functions by \(\mc{C}^{1}(\mb{R}^d)\) (Royer, 2024, p. 9).

Definition: \(f: \mb{R}^d \to \mb{R}\) is twice continuously differentiable iff \(f \in \mc{C}^{1}(\mb{R}^d)\) and the second-order derivative of \(f\) exists and is continuous. We denote the set of twice continuously differentiable functions by \(\mc{C}^{2}(\mb{R}^d)\) (Royer, 2024, p. 9).

Definition: Given \(L > 0\), the set \(\mc{C}_{L}^{1,1}(\mb{R}^d)\) represents the set of all functions \(f\) in \(\mc{C}^{1}(\mb{R}^d)\) such that \(\nabla f\) is \(L\)-Lipschitz continuous (Royer, 2024, p. 9).

Definition: Given \(L > 0\), the set \(\mc{C}_{L}^{2,2}(\mb{R}^d)\) represents the set of all functions \(f\) in \(\mc{C}^{2}(\mb{R}^d)\) such that \(\nabla^2 f\) is \(L\)-Lipschitz continuous (Royer, 2024, p. 9).

Strong convexity

Definition: \(f: \mb{R}^n \to \mb{R}\) is strongly convex iff there exists \(\alpha > 0\) such that \(g\) defined by \(g(x) = f(x) - \alpha \norm{x}^2\) is convex (Ahmadi, 2025c).

Result: Strong convexity implies convexity (Ahmadi, 2025c).

\(L\)-Smoothness

Definition: \(f: \mathbb{R}^n \to \mathbb{R}\) is \(L\)-smooth with respect to a norm \(\norm{\cdot}\) if for all \(x, y \in \text{int dom}(f)\)

$$ \norm{\nabla f(x) - \nabla f(y)}_\ast \leq L \norm{x - y} $$

where \(\norm{\cdot}_\ast\) is the dual norm of \(\norm{\cdot}\) (Fawzi, 2023). In what follows we assume \(\norm{\cdot}\) is the Euclidean norm, so \(\norm{\cdot}_\ast = \norm{\cdot}\) (this is proven below).

Using this restricted definition, observe that \(f\) is \(L\)-smooth \(\iff f \in \mc{C}_{L}^{1,1}(\mb{R}^d)\).
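
For a concrete instance (my own sketch, not from the cited notes): a quadratic \(f(x) = \frac{1}{2} x^\top A x\) with \(A\) symmetric has \(\nabla f(x) = Ax\), so \(\nabla f\) is Lipschitz with constant \(L = \max_i \abs{\lambda_i(A)}\), i.e. \(f \in \mc{C}_{L}^{1,1}(\mb{R}^d)\).

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                        # symmetric; the Hessian of f is A
L = np.abs(np.linalg.eigvalsh(A)).max()  # spectral norm of A

# grad f(x) = A x, so ||grad f(x) - grad f(y)|| = ||A(x - y)|| <= L ||x - y||
x, y = rng.standard_normal((2, 5))
print(np.linalg.norm(A @ (x - y)) <= L * np.linalg.norm(x - y) + 1e-12)  # True
```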

Lemmas for \(L\)-smooth functions

Descent lemma

Descent lemma: If \(f\) is \(L\)-smooth, then for any \(x \in \text{int dom}(f)\) and \(y \in \text{dom}(f)\),

$$ f(y) \leq f(x) + \inner{\nabla f(x)}{y - x} + \frac{L}{2} \norm{y - x}^2 $$

(Fawzi, 2023)

Proof:

This proof is directly from Fawzi (2023) with minor clarifications added.

Let \(h = y - x\) and \(\phi(t) = f(x + th) - (f(x) + t \inner{\nabla f(x)}{h})\). Since \(f\) is differentiable, \(\phi\) is differentiable.

$$ \begin{align} \phi^\prime(t) & = f^\prime(x + th) h - \inner{\nabla f(x)}{h} \notag \\ & = \inner{\nabla f(x + th) - \nabla f(x)}{h} \notag \\ & \leq \norm{\nabla f(x + th) - \nabla f(x)} \norm{h} \notag \\ & \leq L \norm{th} \norm{h} = L t \norm{h} \norm{h} = Lt \norm{h}^2 \notag \\ \end{align} $$

Using the remainder version of Taylor's theorem,

$$ \begin{align} \phi(1) & = \phi(0) + R_1 = \phi(0) + \int_0^1 \phi^\prime(t)\, dt \notag\\ & \leq \phi(0) + \int_0^1 L t \norm{h}^2\, dt = \phi(0) + \frac{L}{2} \norm{h}^2 \notag \end{align} $$

Since \(\phi(0) = 0\) and \(\phi(1) = f(y) - (f(x) + \inner{\nabla f(x)}{y - x})\), rearranging gives exactly the claimed inequality.

An application of the Lemma (my own filling-in of the parts not shown in Remark 1 of Fawzi, 2023):

Let \(y = x - \frac{1}{L} \nabla f(x)\). Then

$$ \begin{align} f(y) & \leq f(x) + \inner{\nabla f(x)}{y - x} + \frac{L}{2} \norm{y - x}^2 \notag\\ & = f(x) + \inner{\nabla f(x)}{-\frac{1}{L} \nabla f(x)} + \frac{L}{2} \norm{y - x}^2 \notag\\ & = f(x) - \frac{1}{L} \inner{\nabla f(x)}{\nabla f(x)} + \frac{L}{2}\norm{-\frac{1}{L} \nabla f(x)}^2 \notag\\ & = f(x) - \frac{1}{L} \inner{\nabla f(x)}{\nabla f(x)} + \frac{1}{2L} \norm{\nabla f(x)}^2 \notag\\ & = f(x) - \frac{1}{2L} \norm{\nabla f(x)}^2 \notag\\ & < f(x) \quad \text{(provided } \nabla f(x) \neq 0\text{)} \notag \end{align} $$

This shows that when the gradient method (i.e. iterating with \(x_{k+1} = x_k - t_k \nabla f(x_k)\)) is used for an \(L\)-smooth function with step size \(t_k = 1/L\), each step results in a decrease of the function's value (Fawzi, 2023).
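
Here is a short numerical sketch of this guarantee (my own example; the quadratic and its constants are assumptions, not from Fawzi, 2023): each gradient step with step size \(1/L\) decreases \(f\) by at least \(\norm{\nabla f(x_k)}^2 / (2L)\).

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = B @ B.T + np.eye(5)          # symmetric positive definite
b = rng.standard_normal(5)

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
L = np.linalg.eigvalsh(A).max()  # f is L-smooth with this constant

x = rng.standard_normal(5)
for _ in range(10):
    g, fx = grad(x), f(x)
    x = x - g / L                # gradient step with step size 1/L
    # the lemma guarantees a decrease of at least ||g||^2 / (2L)
    assert f(x) <= fx - g @ g / (2 * L) + 1e-9
print("guaranteed per-step decrease verified")
```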

Proof that the dual norm of the Euclidean norm is the Euclidean norm itself

$$ \norm{h}_\ast = \sup_{\norm{x} = 1} \inner{h}{x} \qquad (\ast) $$

Note the supremum is attained at the unit vector in the direction of \(h\) (assume \(h \neq 0\); the case \(h = 0\) is immediate), so

$$ (\ast) = \inner{h}{\frac{h}{\norm{h}}} = \frac{1}{\norm{h}} \norm{h}^2 = \norm{h} $$
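
A numerical spot-check of this fact (my own sketch): random unit vectors never give an inner product with \(h\) exceeding \(\norm{h}\), while \(h / \norm{h}\) attains it.

```python
import numpy as np

rng = np.random.default_rng(3)
h = rng.standard_normal(4)

xs = rng.standard_normal((100_000, 4))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)  # random unit vectors

print((xs @ h).max() <= np.linalg.norm(h))                         # True
print(np.isclose(h @ (h / np.linalg.norm(h)), np.linalg.norm(h)))  # True: attained
```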

Another Lemma

Lemma: Assume \(f: U \to \mathbb{R}\) with \(\text{dom}(f)\) open and \(f\) twice continuously differentiable on \(U\). Then \(f\) is \(L\)-smooth if and only if for all \(x \in U\),

$$ \forall\, u, v \in \mathbb{R}^n \quad \inner{\nabla^2 f(x) u}{v} \leq L \norm{u} \norm{v} $$

(Fawzi, 2023, Proposition 3.1)

Proof:

The following proof is from Fawzi (2023) with me filling in the gaps.

\((\Longleftarrow)\)

Let \(x, y \in \text{dom}(f)\). We use a theorem from Lang (1997, p. 475):

Theorem: Let \(U\) be open in \(E\) and let \(x \in U\). Let \(y \in E\). Let \(f: U \to F\) be a \(C^1\) map. Assume that the line segment \(x + ty\) with \(0 \leq t \leq 1\) is contained in \(U\). Then

$$ f(x + y) - f(x) = \int_0^1 f^\prime (x + ty) y dt $$

Replace \(y\) with \(h\):

$$ f(x + h) - f(x) = \int_0^1 f^\prime (x + th) h dt $$

For \(f\) real-valued, the derivative \(f^\prime(x + th)\) is represented by the gradient: \(f^\prime(x + th) h = \inner{\nabla f(x + th)}{h}\).

Applying the same theorem componentwise to \(\nabla f\) (which is \(C^1\) since \(f\) is twice continuously differentiable) gives

$$ \nabla f(x + h) - \nabla f(x) = \int_0^1 \nabla^2 f (x + th) h dt $$

Let \(y = x + h\). Then taking the norm gives

$$ \begin{align} \norm{\nabla f(x + h) - \nabla f(x)} & = \norm{\int_0^1 \nabla^2 f (x + th) h dt} \notag\\ & \leq \int_0^1 \norm{\nabla^2 f (x + th) h} dt \notag\\ & = \int_0^1 \inner{\nabla^2 f (x + th) h}{\frac{\nabla^2 f (x + th) h}{\norm{\nabla^2 f (x + th) h}}} dt \notag\\ & \leq \int_0^1 L \norm{h} \norm{\frac{\nabla^2 f (x + th) h}{\norm{\nabla^2 f (x + th) h}}} dt \notag\\ & = L \norm{h} \notag \end{align} $$

so \(f\) is \(L\)-smooth.

\((\Longrightarrow)\)

Assume \(f\) is \(L\)-smooth. Fix \(x \in \text{dom}(f)\) and let \(u, v \in \mathbb{R}^n\). Define \(\psi(t) := \inner{\nabla f(x + tu) - \nabla f(x)}{v}\). Note that for \(t \geq 0\),

$$ \begin{align} \psi(t) & = \inner{\nabla f(x + tu) - \nabla f(x)}{v} \leq \norm{\nabla f(x + tu) - \nabla f(x)} \norm{v} \notag \\ & \leq L \norm{tu} \norm{v} \notag \\ & = L t \norm{u} \norm{v} \notag \end{align} $$

Therefore, since \(\psi(0) = 0\), for \(t > 0\)

$$ \frac{\psi(t) - \psi(0)}{t} \leq L \norm{u} \norm{v} $$

so

$$ \psi^\prime(0) = \lim_{t \to 0^+} \frac{\psi(t) - \psi(0)}{t} \leq L \norm{u} \norm{v} $$

Also since \(\psi(t) = \inner{\nabla f(x + tu) - \nabla f(x)}{v}\),

$$ \psi^\prime(t) = \inner{\frac{d}{dt} \nabla f(x + tu)}{v} + \underbrace{\inner{\nabla f(x + tu)}{\frac{d}{dt} v}}_{=0} $$

Write \(\beta(t) = x + tu = z\) and \(\alpha(z) = \nabla f(z)\). Then

$$ \begin{align} \frac{d}{dt} \nabla f(x + tu) & = \alpha^\prime(z) \beta^\prime(t) \notag\\ & = \nabla^2 f(z) u \notag\\ & = \nabla^2 f(x + tu) u \notag \end{align} $$

so \(\psi^\prime(t) = \inner{\nabla^2 f(x + tu) u}{v}\) and thus \(\psi^\prime(0) = \inner{\nabla^2 f(x) u}{v}\). Therefore \(\inner{\nabla^2 f(x) u}{v} \leq L \norm{u} \norm{v}\).

\(\square\)

Eigenvalues of the Hessian of an \(L\)-smooth function

Result: If the condition of the preceding lemma holds, then all eigenvalues of \(\nabla^2 f(x)\) lie in \([-L, L]\).

(Fawzi 2023, Remark 3)

Proof (my work):

Assume the \(L\)-smooth Lemma condition holds. Let \(u\) be an eigenvector of \(\nabla^2 f(x)\) and \(\lambda\) be the corresponding eigenvalue. Then for any vector \(v\),

\(\inner{\nabla^2 f(x) u}{v} = \inner{\lambda u}{v} = \lambda \inner{u}{v} \leq L \norm{u} \norm{v}\)

$$ \iff \lambda \frac{\inner{u}{v}}{\norm{u} \norm{v}} \leq L $$

Write \(\cos(\theta) = \inner{u}{v} / (\norm{u} \norm{v})\), where \(\theta\) is the angle between \(u\) and \(v\); since the bound holds for every \(v \neq 0\), it holds for every value of \(\cos(\theta) \in [-1, 1]\).

Case 1: \(\lambda > 0\). Maximizing over \(v\) (the maximum is attained at \(v = u\), where \(\cos(\theta) = 1\)),

$$ \max_{v \in \mathbb{R}^n \setminus \{0\}} \lambda \left(\frac{\inner{u}{v}}{\norm{u} \norm{v}}\right) = \lambda \leq L $$

and thus \(\lambda \leq L\). Also, certainly \(L \geq \lambda \cos(\theta) \geq \lambda (-1)\), so \(\lambda \geq -L\).

Case 2: \(\lambda < 0\). Now the maximum over \(v\) is attained at \(\cos(\theta) = -1\), so

$$ \max_{v \in \mathbb{R}^n \setminus \{0\}} \lambda \left(\frac{\inner{u}{v}}{\norm{u} \norm{v}}\right) = \lambda (-1) \leq L \iff \lambda \geq -L $$

Also, \(L \geq \lambda \cos(\theta) \geq \lambda (1) = \lambda\), so \(\lambda \leq L\).

So in both cases, \(\lambda \in [-L, L]\).

\(\square\)
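
As a numerical illustration (my own sketch): for \(f(x) = \frac{1}{2} x^\top A x\) the Hessian is \(A\) itself, and taking \(L\) to be the largest absolute eigenvalue of \(A\), both the eigenvalue bound and the lemma's bilinear bound hold.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2                        # Hessian of f(x) = x^T A x / 2
L = np.abs(np.linalg.eigvalsh(A)).max()  # smallest valid smoothness constant

eigs = np.linalg.eigvalsh(A)
print(((-L <= eigs) & (eigs <= L)).all())  # True: all eigenvalues in [-L, L]

# the lemma's bilinear bound also holds for random directions
u, v = rng.standard_normal((2, 6))
print(u @ A @ v <= L * np.linalg.norm(u) * np.linalg.norm(v))  # True
```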

\(m\)-strong convexity

Definition

Let \(U \subseteq \mathbb{R}^n\). We say that \(f: U \to \mathbb{R}\) is \(m\)-strongly convex (with respect to the norm \(\norm{\cdot}\)) iff for any \(x, y \in U\) and \(t \in [0, 1]\),

$$ f(t x + (1-t) y) \leq t f(x) + (1 - t) f(y) - \frac{m}{2} t (1 - t) \norm{x - y}^2 $$

(Fawzi, 2023, Section 3.3)
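
To see the definition in action (my own sketch, assuming NumPy): a quadratic \(f(x) = \frac{1}{2} x^\top A x\) with \(A \succ 0\) satisfies the inequality with \(m = \lambda_{\min}(A)\), since \(\inner{x - y}{A(x - y)} \geq m \norm{x - y}^2\).

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)          # positive definite, so f is strongly convex
m = np.linalg.eigvalsh(A).min()  # strong convexity parameter

f = lambda x: 0.5 * x @ A @ x
x, y = rng.standard_normal((2, 4))
for t in np.linspace(0.0, 1.0, 11):
    lhs = f(t * x + (1 - t) * y)
    rhs = (t * f(x) + (1 - t) * f(y)
           - 0.5 * m * t * (1 - t) * np.linalg.norm(x - y) ** 2)
    assert lhs <= rhs + 1e-9
print("m-strong convexity inequality verified on sample points")
```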

Lemma 1

Lemma: A differentiable \(f\) is \(m\)-strongly convex iff for all \(x, y \in \text{dom}(f)\),

$$ f(y) \geq f(x) + \inner{\nabla f(x)}{y - x} + \frac{m}{2} \norm{y - x}^2 $$

(Fawzi, 2023, Section 3.3)

Proof:

This was assigned as an Exercise in Fawzi (2023, Section 3.3). The following is my answer.

\((\Longrightarrow)\)

Assume \(f\) is \(m\)-strongly convex and differentiable. Let \(x, y \in \text{dom}(f)\), set \(h := y - x\) (so that \(x + th = (1 - t) x + t y\)), and let \(t \in (0, 1]\). Since \(f\) is differentiable, using the definition of differentiability,

$$ f(x + th) = f(x) + \inner{\nabla f(x)}{th} + \varphi(th) $$

where \(\varphi(h)\) is \(o(h)\) as \(h \to 0\).

[Claim: If \(\varphi(h)\) is \(o(h)\) as \(h \to 0\), then \(\varphi(th)\) is \(o(t)\) as \(t \to 0\) (namely, that \(\lim_{t \to 0} (\varphi(th) / \abs{t}) = 0\)).

Proof: \(\varphi(h)\) is \(o(h)\) as \(h \to 0\) is equivalent to

$$ \forall\, \alpha > 0 \enspace \exists\, \beta > 0 \quad \norm{h} < \beta \implies \abs{\varphi(h)} < \norm{h} \alpha $$

Let \(\varepsilon > 0\). We want to find a \(\delta > 0\) such that \(\abs{t} < \delta \implies \abs{\varphi(th)} < \abs{t} \cdot \varepsilon\). Assume \(h \neq 0\) (the claim is trivial for \(h = 0\)), take \(\alpha = \varepsilon / \norm{h}\) in the displayed condition, and let \(\beta\) be the corresponding value. Then for \(\abs{t} < \delta := \beta / \norm{h}\) we have \(\norm{th} < \beta\), so

$$ \abs{\varphi(th)} < \norm{th} \alpha = \abs{t} \norm{h} \frac{\varepsilon}{\norm{h}} = \abs{t} \varepsilon $$

Thus \(\lim_{t \to 0} (\varphi(th) / \abs{t}) = 0\), i.e. \(\varphi(th)\) is \(o(t)\) as \(t \to 0\). \(\square\)]

Thus we have

$$ f(x + th) = f(x) + \inner{\nabla f(x)}{th} + \varphi(th) $$

Therefore, since \(f(x + th) \leq (1 - t) f(x) + t f(y) - \frac{m}{2} t (1 - t) \norm{x - y}^2\) (the definition of \(m\)-strong convexity applied to \((1 - t) x + t y = x + th\)),

$$ \begin{align} & f(x) + t \inner{\nabla f(x)}{h} + \varphi(th) \leq (1 - t) f(x) + t f(y) - \frac{m}{2} t (1 - t) \norm{x - y}^2 \notag\\ & \iff t f(y) \geq t f(x) + t \inner{\nabla f(x)}{h} + \varphi(th) + \frac{m}{2} t (1 - t) \norm{x - y}^2 \notag \\ \end{align} $$

which implies

$$ f(y) \geq f(x) + \inner{\nabla f(x)}{h} + \frac{\varphi(th)}{t} + \frac{m}{2} (1 - t) \norm{x - y}^2 $$

(we have divided by \(t > 0\)). Now taking the limit of both sides as \(t \to 0^+\),

$$ f(y) \geq f(x) + \inner{\nabla f(x)}{h} + 0 + \frac{m}{2} \norm{x - y}^2 $$

which is the desired expression.

\((\Longleftarrow)\)

Assume for all \(x, y \in \text{dom}(f)\),

$$ f(y) \geq f(x) + \inner{\nabla f(x)}{y - x} + \frac{m}{2} \norm{y - x}^2 $$

Using the same technique as in the proof of the convexity hyperplane condition (Ahmadi, 2025c): let \(z = t x + (1 - t) y\), apply the assumed inequality to the pairs \((z, x)\) and \((z, y)\), multiply the two inequalities by \(t\) and \(1 - t\) respectively, and add. Since \(x - z = (1 - t)(x - y)\) and \(y - z = t(y - x)\), the quadratic terms combine with coefficient \(t (1 - t)^2 + (1 - t) t^2 = t (1 - t)\), giving

$$ \begin{align} t f(x) + (1 - t) f(y) & \geq f(z) + \inner{\nabla f(z)}{tx + (1 - t)y - z} + \frac{m}{2} t (1 - t) \norm{x - y}^2 \notag\\ & = f(z) + \frac{m}{2} t (1 - t) \norm{x - y}^2 \notag\\ & = f(tx + (1 - t) y) + \frac{m}{2} t (1 - t) \norm{x - y}^2 \notag \end{align} $$

the desired condition.

\(\square\)

Lemma 2

Lemma: If \(f\) is twice continuously differentiable on its domain (assumed open), then \(f\) is \(m\)-strongly convex if and only if \(\inner{\nabla^2 f(x) h}{h} \geq m \norm{h}^2\) for all \(x \in \text{dom}(f)\) and all \(h \in \mathbb{R}^n\).

(Fawzi, 2023, Section 3.3)

Proof:

This was assigned as an Exercise in Fawzi (2023, Section 3.3). The following is my answer.

\((\Longrightarrow)\)

Assume \(f\) is \(m\)-strongly convex. Define

$$ \phi(t) := f(x + th) - \left(f(x) + t \inner{\nabla f(x)}{h} + \frac{m}{2} t^2 \norm{h}^2\right) $$

Observe that \(\phi(t) \geq 0 \iff\)

$$ f(x + th) \geq f(x) + \inner{\nabla f(x)}{th} + \frac{m}{2} \norm{th}^2 $$

so (taking \(y\) as \(x + th\) in Lemma 1) strong convexity of \(f\) implies \(\phi(t) \geq 0\) for all \(t\).

Observe \(\phi(0) = 0\) and, from the formula for \(\phi^\prime\) below, \(\phi^\prime(0) = 0\). Since \(\phi(t) \geq 0 = \phi(0)\) for all \(t\), the point \(t = 0\) is a minimum of \(\phi\), and therefore \(\phi^{\prime\prime}(0) \geq 0\). Also,

$$ \begin{align} \phi^\prime(t) & = \inner{\nabla f(x + th)}{h} - \inner{\nabla f(x)}{h} - mt \norm{h}^2 \notag\\ \phi^{\prime\prime}(t) & = \inner{h}{\nabla^2 f(x + th) h} - m \norm{h}^2 \notag \end{align} $$

so \(\phi^{\prime\prime}(0) \geq 0\) gives \(\inner{h}{\nabla^2 f(x) h} \geq m \norm{h}^2\).

\((\Longleftarrow)\)

Assume \(\forall\, x \in \text{dom}(f) \enspace \forall\, h \in \mathbb{R}^n \quad \inner{h}{\nabla^2 f(x) h} \geq m \norm{h}^2\).

Letting \(\phi(t)\) be defined as above, we have \(\phi^{\prime\prime}(t) \geq 0\) for all \(t\). Therefore \(\phi^\prime\) is nondecreasing.

Since \(\phi^\prime(0) = 0\), it follows that \(\phi^\prime(t) \geq 0\) for \(t \geq 0\), so \(\phi\) is nondecreasing on \([0, 1]\). Together with \(\phi(0) = 0\), this gives \(\phi(1) \geq 0\), i.e.

$$ f(x + h) \geq f(x) + \inner{\nabla f(x)}{h} + \frac{m}{2} \norm{h}^2. $$

Since \(x\) and \(h\) were arbitrary, \(f\) is \(m\)-strongly convex by Lemma 1. \(\square\)

Lemma 3

Lemma: A convex function \(f \in \mc{C}^2\) is \(L\)-smooth on \(U\) iff for all \(x \in U\), \(\nabla^2 f(x) \preceq LI\) (i.e. \(LI - \nabla^2 f(x)\) is positive semidefinite), i.e. all the eigenvalues of \(\nabla^2 f(x)\) are \(\leq L\). Similarly, a function \(f \in \mc{C}^2\) is \(m\)-strongly convex on \(U\) iff for all \(x \in U\), \(\nabla^2 f(x) \succeq mI\), i.e. all the eigenvalues of \(\nabla^2 f(x)\) are \(\geq m\).

(Fawzi, 2023, Remark 4)

Proof:

This was not proven in Fawzi, 2023. The following is my proof.

Let \(x \in U\). By Lemma 2, \(f\) is \(m\)-strongly convex iff \(\inner{h}{\nabla^2 f(x) h} \geq m \norm{h}^2\) for all \(h\), and since \(\inner{h}{(\nabla^2 f(x) - mI) h} = \inner{h}{\nabla^2 f(x) h} - m \norm{h}^2\), this is equivalent to

\(\nabla^2 f(x) - mI \succeq 0 \iff \nabla^2 f(x) \succeq mI\)

It remains to check that the semidefinite orderings are equivalent to the eigenvalue bounds. Suppose \(\nabla^2 f(x) \succeq mI\), i.e.

\(\forall\, h \in \mathbb{R}^n \setminus \{0\} \quad \inner{h}{(\nabla^2 f(x) - mI) h} \geq 0\)

Let \(u\) be an eigenvector of \(\nabla^2 f(x)\) with \(\lambda\) its corresponding eigenvalue. Then we have

$$ \begin{align} & \inner{u}{(\nabla^2 f(x) - mI) u} = \inner{u}{\nabla^2 f(x) u - mu} \notag\\ & = \inner{u}{\lambda u - mu} = \inner{u}{(\lambda - m)u} = (\lambda - m) \norm{u}^2 \geq 0 \notag \end{align} $$

so \(\lambda \geq m\).

Similarly, \(\nabla^2 f(x) \preceq LI\) iff

$$ \begin{align} 0 & \leq \inner{u}{(LI - \nabla^2 f(x)) u} \notag\\ & = \inner{u}{Lu - \nabla^2 f(x) u} = \inner{u}{Lu - \lambda u} = \inner{u}{(L - \lambda)u} = (L - \lambda) \norm{u}^2 \notag \end{align} $$

so \(\lambda \leq L\).
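
A numerical check tying the two halves of the lemma together (my own sketch): taking \(m\) and \(L\) to be the extreme eigenvalues of the Hessian, both \(\nabla^2 f(x) - mI\) and \(LI - \nabla^2 f(x)\) are positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((5, 5))
A = B @ B.T + np.eye(5)        # positive definite Hessian of f(x) = x^T A x / 2
eigs = np.linalg.eigvalsh(A)
m, L = eigs.min(), eigs.max()  # tightest strong-convexity / smoothness constants

# m I <= Hessian <= L I in the positive semidefinite ordering:
print(np.linalg.eigvalsh(A - m * np.eye(5)).min() >= -1e-12)  # True
print(np.linalg.eigvalsh(L * np.eye(5) - A).min() >= -1e-12)  # True
```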

References

Ahmadi, Amir Ali. (2025a). Convex and conic optimization, Lecture 3 [Lecture notes].

Ahmadi, Amir Ali. (2025b). Convex and conic optimization, Lecture 4 [Lecture notes].

Ahmadi, Amir Ali. (2025c). Convex and conic optimization, Lecture 7 [Lecture notes].

Fawzi, Hamza. (2023). Topics in convex optimization, Lecture 3: Smoothness and strong convexity [Lecture notes].

Lang, Serge. (1997). Undergraduate analysis (Second ed.). Springer Science+Business Media, Inc.

Royer, Clément W. (2022). Lecture notes on advanced gradient descent [Lecture notes].

How to cite this article

Wayman, Eric Alan. (2025). L-smoothness and strong convexity. Eric Alan Wayman's technical notes. https://ericwayman.net/notes/L-smoothness-strong-convexity/

@misc{wayman2025L-smoothness-strong-convexity,
  title={L-smoothness and strong convexity},
  author={Wayman, Eric Alan},
  journal={Eric Alan Wayman's technical notes},
  url={https://ericwayman.net/notes/L-smoothness-strong-convexity/},
  year={2025}
}

© 2025-2026 Eric Alan Wayman. All rights reserved.