Little-oh, the derivative, Taylor's formula
Derivatives and little-oh
Real-valued functions on the real line
Definition: A real-valued function \(\varphi\) defined for arbitrarily small values of \(h\) is \(o(h)\) for \(h \to 0\) iff
\[
\lim_{h \to 0} \frac{\varphi(h)}{h} = 0.
\]
(Lang, 1997, p. 67-68)
Result: A real-valued function \(f\) is differentiable at \(x\) if and only if there exists some number \(L\) and a function \(\varphi\) which is \(o(h)\) for \(h \to 0\) such that for all \(h\) in some neighborhood of \(0\),
\[
f(x + h) = f(x) + Lh + \varphi(h). \tag{1}
\]
(Lang, 1997, p. 67-68)
Proof:
The following proof is from Lang (1997, p. 67-68).
\((\Longrightarrow)\)
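Assume \(f\) is differentiable at \(x\). Let
\[
\varphi(h) = \begin{cases} f(x + h) - f(x) - f^\prime(x) h, & h \neq 0, \\ 0, & h = 0. \end{cases}
\]
Observe that
\[
\lim_{h \to 0} \frac{\varphi(h)}{h} = \lim_{h \to 0} \left( \frac{f(x + h) - f(x)}{h} - f^\prime(x) \right) = f^\prime(x) - f^\prime(x) = 0,
\]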
so \(\varphi\) is \(o(h)\) as \(h \to 0\). For \(h \neq 0\), \(f(x+h) = f(x) + Lh + \varphi(h)\) where we have let \(L = f^\prime(x)\). For \(h = 0\), the left-hand side of \((1)\) is \(f(x)\) and the right-hand side is \(f(x) + \varphi(0) = f(x)\).
\((\Longleftarrow)\)
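Assume there exist \(L \in \mathbb{R}\) and \(\varphi\) that is \(o(h)\) for \(h \to 0\) such that \(f(x + h) = f(x) + Lh + \varphi(h)\) for all sufficiently small \(h\). For \(h \neq 0\), this implies
\[
\frac{f(x + h) - f(x)}{h} = L + \frac{\varphi(h)}{h}.
\]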
The limit of the right-hand side as \(h \to 0\) exists and is \(L + 0 = L\), so the limit of the left-hand side exists and also must equal \(L\) since limits are unique. Taking the limit gives \(f^\prime(x) = L\), so \(f\) is differentiable at \(x\).
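The characterization above can be checked numerically. The following is a minimal sketch, assuming the illustrative choice \(f = \sin\) at \(x = 1\) (so \(L = \cos(1)\)); the helper name `remainder` is hypothetical and not from Lang. It evaluates \(\varphi(h) = f(x+h) - f(x) - f^\prime(x) h\) and shows that \(|\varphi(h)/h|\) shrinks as \(h\) does.

```python
import math

def remainder(f, df, x, h):
    """phi(h) = f(x + h) - f(x) - f'(x) * h, the error of the linear approximation."""
    return f(x + h) - f(x) - df(x) * h

# Assumed example: f = sin, f' = cos, base point x = 1.
x = 1.0
hs = [10.0 ** (-k) for k in range(1, 6)]

# If phi is o(h), these ratios should tend to 0 as h -> 0.
ratios = [abs(remainder(math.sin, math.cos, x, h) / h) for h in hs]

for h, r in zip(hs, ratios):
    print(f"h = {h:.0e}   |phi(h) / h| = {r:.3e}")
```

For this \(f\), the ratio shrinks roughly in proportion to \(h\), which is stronger than the definition requires; \(o(h)\) only asks that the ratio tend to \(0\).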
Note on proof of the chain rule for real-valued functions on the real line
The following discussion is from Lang (1997, p. 68):
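Note that defining
\[
\psi(h) = \begin{cases} \varphi(h) / h, & h \neq 0, \\ 0, & h = 0, \end{cases}
\]
we have that
\[
\psi(h) h = \varphi(h) \quad \text{for } h \neq 0.
\]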
If we simply define \(\varphi(0) = 0\), we then can just write \(\psi(h) h = \varphi(h)\).
Note that \(\lim_{h \to 0} \psi(h) = 0\).
This quantity is used in the proof of the chain rule in Lang (1997, p. 68).
Real-valued functions on \(\mathbb{R}^n\)
Definition: A real-valued function \(\varphi\) defined for all sufficiently small vectors \(h \in \mathbb{R}^n\), \(h \neq 0\), is said to be \(o(h)\) for \(h \to 0\) if and only if
\[
\lim_{h \to 0} \frac{\varphi(h)}{\norm{h}} = 0.
\]
(Lang, 1997, p. 379)
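Similar to above, if we define \(\varphi(0) = 0\), we can write \(\psi(h) \norm{h} = \varphi(h)\) for a function \(\psi\) with \(\lim_{h \to 0} \psi(h) = 0\): take \(\psi(h) = \varphi(h) / \norm{h}\) for \(h \neq 0\) and \(\psi(0) = 0\) (Lang, 1997, p. 379).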
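Similar to the above, for \(U \subseteq \mathbb{R}^n\) open, we say \(f: U \to \mathbb{R}\) is differentiable at \(x\) if there exists \(A \in \mathbb{R}^n\) such that
\[
f(x + h) = f(x) + \inner{A}{h} + o(h)
\]
or
\[
f(x + h) = f(x) + \inner{A}{h} + \norm{h} \psi(h), \quad \text{where } \lim_{h \to 0} \psi(h) = 0.
\]
\(A\) is the derivative of \(f\) at \(x\) (the gradient in this case) (Lang, 1997, p. 380).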
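As a numerical illustration of this definition, here is a minimal sketch assuming a hypothetical test function \(f(x_1, x_2) = x_1^2 x_2 + \sin(x_2)\) with its hand-computed gradient as the candidate \(A\). The names `f`, `grad_f`, and `psi` are illustrative only. The code evaluates \(\psi(h) = \bigl(f(x+h) - f(x) - \inner{A}{h}\bigr) / \norm{h}\) along shrinking multiples of a fixed direction.

```python
import math

def f(x):
    # Hypothetical test function f(x1, x2) = x1^2 * x2 + sin(x2).
    return x[0] ** 2 * x[1] + math.sin(x[1])

def grad_f(x):
    # Gradient of the test function, computed by hand.
    return [2.0 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

def norm(v):
    # Euclidean norm on R^n.
    return math.sqrt(sum(c * c for c in v))

def psi(x, h):
    """phi(h) / ||h||, where phi(h) = f(x + h) - f(x) - <grad f(x), h>."""
    x_plus_h = [xi + hi for xi, hi in zip(x, h)]
    linear = sum(g * hi for g, hi in zip(grad_f(x), h))
    return (f(x_plus_h) - f(x) - linear) / norm(h)

x = [1.0, 2.0]
direction = [3.0, -4.0]
scales = [10.0 ** (-k) for k in range(1, 6)]

# psi should tend to 0 as h = s * direction shrinks.
psi_values = [abs(psi(x, [s * d for d in direction])) for s in scales]

for s, p in zip(scales, psi_values):
    print(f"||h|| = {5.0 * s:.0e}   |psi(h)| = {p:.3e}")
```

Testing along a single direction is only a sanity check; the definition requires \(\psi(h) \to 0\) for every way in which \(h \to 0\).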
Functions whose domain and codomain are normed vector spaces
Definition: Let \(U\) be open in \(E\) and let \(x \in U\). Let \(f: U \to F\) be a map. We say \(f\) is differentiable at \(x\) iff there exists a continuous linear map \(\lambda: E \to F\) and a map \(\psi\) defined for all sufficiently small \(h\) in \(E\), with values in \(F\), such that \(\lim_{h \to 0} \psi(h) = 0\) and such that
\[
f(x + h) = f(x) + \lambda(h) + \norm{h} \psi(h). \tag{2}
\]
(Lang, 1997, p. 463)
As observed in Lang (1997, p. 463), for \(h = 0\), assuming that \(\psi\) is defined at \(0\) and that \(\psi(0) = 0\) contradicts nothing we have said so far.
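Similar to above, defining a map \(\varphi\), on the sufficiently small \(h \in E\) and with values in \(F\), for which
\[
\varphi(h) = \norm{h} \psi(h),
\]
we could write \((2)\) as \(f(x + h) = f(x) + \lambda(h) + \varphi(h)\) or simply
\[
f(x + h) = f(x) + \lambda(h) + o(h).
\]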
(Lang, 1997, p. 463-464)
It can be shown (Lang, 1997, p. 463-464) that if the continuous linear map \(\lambda\) exists satisfying \((2)\), then it is uniquely determined by \(f\) and \(x\). This map is called the derivative of \(f\) at \(x\) and is denoted by \(f^\prime(x)\) or \(Df(x)\).
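A short worked example may help make the definition concrete. It is a sketch under stated assumptions, not taken from Lang: let \(E = F\) be the space of real \(n \times n\) matrices with a submultiplicative norm (for instance the operator norm), and let \(f(X) = X^2\). Expanding the product gives

\[
f(X + H) = (X + H)^2 = X^2 + (XH + HX) + H^2 .
\]

The map \(\lambda(H) = XH + HX\) is linear in \(H\), and it is continuous because \(E\) is finite-dimensional. The remainder \(\varphi(H) = H^2\) satisfies

\[
\frac{\norm{\varphi(H)}}{\norm{H}} = \frac{\norm{H^2}}{\norm{H}} \leq \frac{\norm{H}^2}{\norm{H}} = \norm{H} \to 0 \quad \text{as } H \to 0,
\]

so \(\varphi\) is \(o(H)\) and \(Df(X) H = XH + HX\). Note that the derivative is not \(2XH\) unless \(X\) and \(H\) commute.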
Taylor formula results
Let \(L(E, F)\) be the space of continuous linear maps from \(E\) into \(F\). It is a vector space (Lang, 1997, p. 456).
Definition: If \(f\) is differentiable at every point \(x\) of an open set \(U\) of \(E\), then we say \(f\) is differentiable on \(U\) and in that case, the derivative is a map from \(U\) to \(L(E, F)\) (Lang, 1997, p. 465).
Recall that the second derivative, if it exists, is a function from \(U\) into \(L(E, L(E, F))\) (Lang, 1997, p. 477). Similarly, the \(k\)th derivative, \(D^k f\), defined by \(D^k f(x) = D(D^{k-1} f)(x)\), is a function from \(U\) into \(L(E, L(E, \ldots, L(E, F) \ldots))\) (Lang, 1997, p. 487-488).
Definition: Let \(E\), \(F\) be normed vector spaces, and let \(U \subseteq E\). For \(f: U \to F\), we say that \(f\) is of class \(C^p\) iff \(D^k f(x)\) exists for each \(x \in U\) and \(D^k f: U \to L^k(E, F)\) is continuous for each \(k = 0, \ldots, p\) (Lang, 1997, p. 487).
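Theorem (Taylor's formula): Let \(U\) be open in \(E\) and let \(f: U \to F\) be of class \(C^p\). Let \(x \in U\) and \(y \in E\) such that the segment \(x + ty\), \(0 \leq t \leq 1\), is contained in \(U\). Denote by \(y^{(k)}\) the \(k\)-tuple \((y, y, \ldots, y)\). Then
\[
f(x + y) = f(x) + \frac{Df(x) y}{1!} + \cdots + \frac{D^{p-1} f(x) y^{(p-1)}}{(p-1)!} + R_p
\]
where
\[
R_p = \int_0^1 \frac{(1 - t)^{p-1}}{(p-1)!} D^p f(x + ty) y^{(p)} \, dt
\]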
(Lang, 1997, p. 490).
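Under the same assumptions, and with \(F = \mathbb{R}\), there exists a \(t \in (0, 1)\) such that, letting \(z = x + ty\),
\[
f(x + y) = f(x) + \frac{Df(x) y}{1!} + \cdots + \frac{D^{p-1} f(x) y^{(p-1)}}{(p-1)!} + \frac{D^p f(z) y^{(p)}}{p!}
\]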
(Güler, 2010, p. 15)
Making use of the above two forms of Taylor's theorem, we have the following specific cases:
For \(U \subseteq \mathbb{R}^n\) open, \(f: U \to \mathbb{R}\), if \(f\) is differentiable, then for all \(x, y \in U\) such that the line segment between \(x\) and \(y\) lies in \(U\), there exists \(z_1\) on that segment such that
\(f(y) = f(x) + \inner{\nabla f(z_1)}{y - x}\)
and also
\(f(y) = f(x) + \inner{\nabla f(x)}{y - x} + \varphi(y - x)\)
where \(\varphi(y - x)\) is \(o(y - x)\) as \(y \to x\).
If \(f\) has continuous 2nd-order partial derivatives, then for all \(x, y \in U\) such that the line segment between \(x\) and \(y\) lies in \(U\), there exists \(z_2\) on that segment such that
\(f(y) = f(x) + \inner{\nabla f(x)}{y - x} + \frac{1}{2} (y - x)^T H f(z_2) (y - x)\),
and
\(f(y) = f(x) + \inner{\nabla f(x)}{y - x} + \frac{1}{2} (y - x)^T H f(x) (y - x) + \varphi(y - x)\)
where \(\lim_{y \to x} \varphi(y - x) / \norm{y - x}^2 = 0\).
(Güler, 2010, p. 16)
One convention is to write the above two expressions that involve \(\varphi\) as something like the following: \(\varphi\) is \(o(\norm{y - x})\) as \(y \to x\), and \(\varphi\) is \(o(\norm{y - x}^2)\) as \(y \to x\). Note that this requires an extra step of interpretation: for example, the second of the two expressions really means \(\lim_{y \to x} \varphi(y - x) / \norm{y - x}^2 = 0\), as opposed to \(\lim_{y \to x} \varphi(\norm{y - x}^2) / \norm{y - x}^2 = 0\), which is what we would have if we applied our usual definition of little-oh to the statement. Thus care must be taken when using this convention.
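The second-order statement can be sanity-checked numerically. The sketch below assumes the same kind of hypothetical test function as one might pick by hand, \(f(x_1, x_2) = x_1^2 x_2 + \sin(x_2)\), with hand-computed gradient and Hessian; none of the names come from Güler. It computes \(\varphi(y - x) = f(y) - f(x) - \inner{\nabla f(x)}{y - x} - \frac{1}{2} (y - x)^T H f(x) (y - x)\) and divides by \(\norm{y - x}^2\), which is the quantity the convention \(o(\norm{y - x}^2)\) refers to.

```python
import math

def f(x):
    # Hypothetical test function f(x1, x2) = x1^2 * x2 + sin(x2).
    return x[0] ** 2 * x[1] + math.sin(x[1])

def grad_f(x):
    # Hand-computed gradient of f.
    return [2.0 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]

def hess_f(x):
    # Hand-computed Hessian of f.
    return [[2.0 * x[1], 2.0 * x[0]],
            [2.0 * x[0], -math.sin(x[1])]]

def second_order_ratio(x, y):
    """phi(y - x) / ||y - x||^2 for the second-order Taylor remainder phi."""
    d = [yi - xi for xi, yi in zip(x, y)]
    linear = sum(g * di for g, di in zip(grad_f(x), d))
    H = hess_f(x)
    quadratic = 0.5 * sum(d[i] * H[i][j] * d[j] for i in range(2) for j in range(2))
    phi = f(y) - f(x) - linear - quadratic
    return phi / sum(di * di for di in d)

x = [1.0, 2.0]
direction = [3.0, -4.0]
scales = [10.0 ** (-k) for k in range(1, 5)]

# The ratio should tend to 0 as y -> x along the chosen direction.
ratios = [
    abs(second_order_ratio(x, [xi + s * di for xi, di in zip(x, direction)]))
    for s in scales
]

for s, r in zip(scales, ratios):
    print(f"||y - x|| = {5.0 * s:.0e}   |phi| / ||y - x||^2 = {r:.3e}")
```

The step sizes stop at \(10^{-4}\) on purpose: for much smaller steps the remainder falls below floating-point rounding error and the computed ratio becomes noise.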
Little-oh of a sequence
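Recall the fundamental result: for \(f: E \to F\), with \(a \in E\), \(L \in F\),
\[
\lim_{x \to a} f(x) = L \iff \lim_{n \to \infty} f(x_n) = L \text{ for every sequence } (x_n) \text{ in the domain of } f \text{ with } x_n \to a.
\]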
For a sequence \((x_n)\) with \(x_n \neq 0\) and a function \(\varphi\), we say \(\varphi(x_n)\) is \(o(x_n)\) as \(n \to \infty\) iff
\[
\lim_{n \to \infty} \frac{\varphi(x_n)}{\norm{x_n}} = 0.
\]
Thus for a \(\varphi(h)\) that is \(o(h)\) as \(h \to 0\) and an \((x_n)\) where \(x_n \neq 0\) and \(x_n \to 0\) as \(n \to \infty\), we have that \(\varphi(x_n)\) is \(o(x_n)\) as \(n \to \infty\).
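A small numerical sketch of this last statement, assuming the illustrative choices \(\varphi(h) = 1 - \cos(h)\) (which is \(o(h)\) as \(h \to 0\)) and the alternating sequence \(x_n = (-1)^n / n\); both are hypothetical examples chosen here, not taken from the references.

```python
import math

def phi(h):
    # Assumed example of a function that is o(h) as h -> 0.
    return 1.0 - math.cos(h)

def x_seq(n):
    # Assumed example sequence: nonzero terms, alternating sign, x_n -> 0.
    return (-1.0) ** n / n

ns = [1, 10, 100, 1000, 10000]

# phi(x_n) / |x_n| should tend to 0 as n -> infinity.
seq_ratios = [phi(x_seq(n)) / abs(x_seq(n)) for n in ns]

for n, r in zip(ns, seq_ratios):
    print(f"n = {n:>5}   phi(x_n) / |x_n| = {r:.3e}")
```

The sign changes in \(x_n\) do not matter here, which reflects that the sequential criterion applies to every sequence tending to \(0\), not only monotone ones.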
References
Güler, Osman. (2010). Foundations of optimization. Springer Science+Business Media, Inc.
Lang, Serge. (1997). Undergraduate analysis (Second ed.). Springer Science+Business Media, Inc.
How to cite this article
Wayman, Eric Alan. (2025). Little-oh, the derivative, Taylor's formula. Eric Alan Wayman's technical notes. https://ericwayman.net/notes/little-oh-deriv-taylor/
@misc{wayman2025little-oh-deriv-taylor,
title={Little-oh, the derivative, Taylor's formula},
author={Wayman, Eric Alan},
journal={Eric Alan Wayman's technical notes},
url={https://ericwayman.net/notes/little-oh-deriv-taylor/},
year={2025}
}