Linear algebra: inner product spaces
Table of Contents
- Hilbert spaces and finite dimensional inner product spaces
- Basic facts regarding quadratics and complex fields
- Facts regarding the orthogonal complement of a subspace
- Showing the orthogonal complement of the span of a vector is a hyperplane
- Showing that the intersection of the span of a vector with the orthogonal complement of that span is the set containing only the zero vector
- Notes on the Gram-Schmidt procedure
- Notes on showing that a vector can be decomposed into a linear combination of orthonormal basis vectors, where each coefficient is the inner product of the vector with the corresponding basis vector
- Note on proving that for any subspace, a vector space equals the direct sum of that subspace and its complement
- Notes on showing that the orthogonal complement of the orthogonal complement of a subspace is the subspace itself
- Statement and proof of the Riesz Representation Theorem
- The adjoint
- Unitary operators
- Diagonalizability of transformations in inner product spaces
- Results for projection operators
- References
- How to cite this article
Hilbert spaces and finite dimensional inner product spaces
We begin by defining a Hilbert space, and show that every finite dimensional inner product space is a Hilbert space.
A vector space together with an inner product is called an inner product space (Jain & Ahuja, 2010, p. 235).
Note that an inner product \((\bullet, \bullet)\) induces a norm \(\norm{x} = (x, x)^{1/2}\) (Jain & Ahuja, 2010, p. 236).
A vector space together with a norm is called a normed vector space (Lang, 1997, p. 132).
Definition: A Banach space is a complete normed vector space.
Definition: A Hilbert space is a Banach space over \(\mb{C}\) or \(\mb{R}\) whose norm is induced by an inner product (Jain & Ahuja, 2010, p. 239).
We now provide some details on the definition of Cauchy sequences for such spaces, and completeness.
Any norm induces a metric, namely \(d(x, y) := \norm{x - y}\), so any normed vector space is a metric space, and we define properties of normed vector spaces using this metric. For example, a normed vector space is complete iff every Cauchy sequence converges, where the definition of Cauchy sequence uses the metric induced by the norm.
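The chain from inner product to norm to metric can be sketched numerically. The following is a minimal illustration, assuming the standard inner product on \(\mb{C}^n\) taken conjugate-linear in the first slot; the function names `inner`, `norm`, and `dist` are hypothetical helpers, not part of any cited source.

```python
import math

def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product: ||x|| = (x, x)^(1/2)."""
    return math.sqrt(inner(x, x).real)

def dist(x, y):
    """Metric induced by the norm: d(x, y) = ||x - y||."""
    return norm([a - b for a, b in zip(x, y)])

x = [3 + 4j, 0j]
y = [0j, 0j]
print(norm(x))     # 5.0
print(dist(x, y))  # 5.0
```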
Definition: Two norms \(\norm{\bullet}_1\) and \(\norm{\bullet}_2\) on \(V\) are called equivalent norms on \(V\) iff there are constants \(C_1, C_2 > 0\) such that for all \(v \in V\),
\[C_1 \norm{v}_1 \leq \norm{v}_2 \leq C_2 \norm{v}_1\]
(Bube & Burke, 2021, p. 22)
Theorem: If \(V\) is a finite dimensional vector space, then any two norms on \(V\) are equivalent.
Proof: See Bube & Burke (2021, p. 22-23).
Note that if \(V\) is not finite dimensional, we cannot guarantee that any two norms on \(V\) are equivalent (Bube & Burke, 2021, p. 23).
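As a numerical illustration of norm equivalence in finite dimensions, the sketch below checks on random vectors in \(\mb{C}^4\) that the sum-of-absolute-values norm and the Euclidean norm satisfy \(\frac{1}{\sqrt{n}} \norm{v}_{\ell^1} \leq \norm{v}_{2} \leq \norm{v}_{\ell^1}\). The choice of these two norms, the constants \(1/\sqrt{n}\) and \(1\), and the helper names are assumptions made for illustration; a random check is evidence, not a proof.

```python
import math
import random

def norm1(v):
    """Sum of absolute values of the coordinates."""
    return sum(abs(c) for c in v)

def norm2(v):
    """Euclidean norm."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))

random.seed(0)
n = 4
ok = True
for _ in range(1000):
    v = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
    lower = norm1(v) / math.sqrt(n)   # C_1 * ||v||_1 with C_1 = 1/sqrt(n)
    upper = norm1(v)                  # C_2 * ||v||_1 with C_2 = 1
    # small tolerance guards against floating point rounding
    ok = ok and (lower <= norm2(v) + 1e-12) and (norm2(v) <= upper + 1e-12)
print(ok)
```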
Fact: Both \(\mb{C}^n\) and \(\mb{R}^n\) are complete in the Euclidean norm (Bube & Burke, 2021, p. 27).
Proof:
(This is the "standard" proof for \(\mb{R}^n\), which turns out to be the same for \(\mb{C}^n\) since the definition of absolute value for complex numbers works in the inequalities in the same way as absolute value for real numbers.)
Let \(\mb{F}\) be either \(\mb{R}\) or \(\mb{C}\). Denoting \(z = (z_1, \ldots, z_n) \in \mb{F}^n\), define the Euclidean norm on \(\mb{F}^n\) as \(\norm{z}_2 := \sqrt{\sum_{i=1}^{n} \abs{z_i}^2}\).
Let \((z^{(k)})\) be a Cauchy sequence in \(\mb{F}^n\), namely
\[\forall\, \varepsilon > 0 \enspace \exists\, N \in \mb{N} \enspace \forall\, m, k \geq N \quad \norm{z^{(m)} - z^{(k)}}_2 < \varepsilon.\]
Since for every coordinate \(i \in [n]\), \(\abs{z_i^{(m)} - z_i^{(k)}} \leq \norm{z^{(m)} - z^{(k)}}_2\), we have that every coordinate sequence \((z_i^{(k)})\) is Cauchy. Since both \(\mb{R}\) and \(\mb{C}\) are complete, each \((z_i^{(k)})\) converges to some \(z_i \in \mb{F}\) as \(k \to \infty\). Define \(z = (z_1, \ldots, z_n)\).
Let \(\varepsilon > 0\) be arbitrary. For each coordinate \(i \in [n]\) there exists \(N_i \in \mb{N}\) such that \(\abs{z_i^{(m)} - z_i} < \varepsilon / \sqrt{n}\) for all \(m \geq N_i\) (using \(\varepsilon / \sqrt{n}\) as the "epsilon" variable in the proof of convergence of that coordinate). Let \(N = \max_{i \in [n]} N_i\). Then for all \(m \geq N\) we have
\[\norm{z^{(m)} - z}_2 = \sqrt{\sum_{i=1}^{n} \abs{z_i^{(m)} - z_i}^2} < \sqrt{n \cdot \frac{\varepsilon^2}{n}} = \varepsilon,\]
so \(\lim_{m \to \infty} \norm{z^{(m)} - z}_2 = 0\) which holds iff \(\lim_{m \to \infty} z^{(m)} = z\).
\(\square\)
Result: If two norms \(\norm{\bullet}_1\) and \(\norm{\bullet}_2\) on \(V\) are equivalent, then \((V, \norm{\bullet}_1)\) is complete iff \((V, \norm{\bullet}_2)\) is complete (Bube & Burke, 2021, p. 27).
Theorem: Every finite dimensional normed vector space over \(\mb{C}\) or \(\mb{R}\) is complete.
Proof: Let \(V\) be a finite dimensional vector space over \(\mb{F}\) where \(\mb{F} = \mb{C}\) or \(\mb{F} = \mb{R}\). Let \(\mc{A} = (v_1, \ldots, v_n)\) be a basis for \(V\), and let \(\norm{\bullet}\) be an arbitrary norm on \(V\). Consider the coordinate map \((\bullet)_{\mc{A}}\) where \((x)_\mc{A} = (c_1 v_1 + \cdots + c_n v_n)_\mc{A} = (c_1, \ldots, c_n) \in \mb{F}^n\).
On \(V\), define \(\norm{\bullet}_2\) by \(\norm{x}_2 = \norm{(x)_{\mc{A}}}_2\), the latter being the Euclidean norm in \(\mb{F}^n\).
Since \(V\) is finite-dimensional, by norm equivalence there exists \(C > 0\) such that for all \(x \in V\), \(\norm{x}_2 \leq C \norm{x}\).
Let \((a^{(k)})\) be a Cauchy sequence in \((V, \norm{\bullet})\).
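By the norm equivalence relationship, given \(\varepsilon > 0\) and taking \(N\) from the definition of Cauchy sequence in \((V, \norm{\bullet})\) for \(\varepsilon / C\), for all \(k, j \geq N\) we have
\[\norm{a^{(k)} - a^{(j)}}_2 \leq C \norm{a^{(k)} - a^{(j)}} < C \cdot \frac{\varepsilon}{C} = \varepsilon,\]
so \((a^{(k)})\) is Cauchy in \((V, \norm{\bullet}_2)\).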
Recall that \(\norm{a^{(k)} - a^{(j)}}_2 = \norm{(a^{(k)} - a^{(j)})_{\mc{A}}}_2\). Therefore defining \(b^{(k)} = (a^{(k)})_{\mc{A}}\), we have that \(b^{(k)} \in (\mb{F}^n, \norm{\bullet}_2)\) is Cauchy.
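\(\mb{F}^n\) is complete. Therefore since \((b^{(k)})\) is Cauchy, it converges to a limit \(L \in \mb{F}^n\).
Let \(a \in V\) be the vector with \((a)_{\mc{A}} = L\). Then \(\norm{a^{(k)} - a}_2 = \norm{b^{(k)} - L}_2 \to 0\), so \((a^{(k)})\) converges to \(a\) in \((V, \norm{\bullet}_2)\). Therefore \((V, \norm{\bullet}_2)\) is complete. By the norm equivalence result, \((V, \norm{\bullet})\) is complete.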
\(\square\)
We conclude from the above that any finite-dimensional inner product space over \(\mb{C}\) or \(\mb{R}\) is a Hilbert space.
Basic facts regarding quadratics and complex fields
Discriminant of a quadratic
The following two results are used in the proof of the Cauchy-Schwarz Inequality (Surowski, 1997, p. 70):
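Result: For \(w \neq 0\) and \(q(t) = \norm{tw + v}^2\) with \(t \in \mb{R}\), \(\text{disc}(q(t)) = 4(\text{Re}(w, v))^2 - 4 \norm{w}^2 \norm{v}^2 \leq 0\):
\[q(t) = (tw + v, tw + v) = t^2 (w, w) + t \left((w, v) + (v, w)\right) + (v, v) = \norm{w}^2 t^2 + 2 \text{Re}(w, v)\, t + \norm{v}^2\]
(since \((v, w) = \overline{(w, v)}\) and \(\overline{x} + x = (a - bi) + (a + bi) = 2a = 2\text{Re}(x)\)). Because \(q(t) \geq 0\) for every real \(t\), the next result gives \(\text{disc}(q(t)) \leq 0\).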
Result: Recall that for a quadratic polynomial \(q(t)\) with real coefficients and a positive leading coefficient, \(q(t) \geq 0\) for all \(t \in \mb{R}\) \(\iff \text{disc}(q(t)) \leq 0\).
Complex field-related
The following two facts are used in the proof of the Cauchy-Schwarz Inequality (Surowski, 1997, p. 70):
Fact: For \(\alpha \in \mb{C}\), \(x \in V\), \(\norm{\alpha x} = \abs{\alpha} \norm{x}\).
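Proof: Consider
\[(\alpha x, \alpha x) = \overline{\alpha} \alpha (x, x) = \abs{\alpha}^2 (x, x),\]
where for \(\alpha = a + bi\), \(\overline{\alpha} \alpha = (a - bi)(a + bi) = a^2 + b^2\) and \(\abs{\alpha} = \sqrt{a^2 + b^2}\).
Therefore \(\norm{\alpha x} = \sqrt{(\alpha x, \alpha x)} = \abs{\alpha} (x, x)^{1/2} = \abs{\alpha} \norm{x}\). \(\square\)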
Fact: \((v, w) + \overline{(v, w)} \leq 2 \abs{(v, w)}\).
Proof: \((v, w) + \overline{(v, w)} = 2\text{Re}(v, w)\), and \(\abs{\text{Re}(v, w)} \leq \abs{(v, w)}\). \(\square\)
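The facts in this section can be checked numerically on one example. The sketch below assumes the quadratic \(q(t) = \norm{tw + v}^2\) for real \(t\), the standard inner product on \(\mb{C}^3\) (conjugate-linear in the first slot), and two arbitrarily chosen vectors; all names are illustrative. It compares \(q\) with its expanded form, evaluates the discriminant, and confirms the Cauchy-Schwarz inequality for this pair.

```python
def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

# Arbitrary example vectors in C^3 (illustrative values)
v = [1 + 2j, -1j, 3]
w = [2, 1 + 1j, -1 + 0.5j]

a = inner(w, w).real        # coefficient of t^2: ||w||^2
b = 2 * inner(w, v).real    # coefficient of t:   2 Re(w, v)
c = inner(v, v).real        # constant term:      ||v||^2
disc = b * b - 4 * a * c    # discriminant of q

def q(t):
    """q(t) = ||t w + v||^2 for real t."""
    u = [t * wi + vi for wi, vi in zip(w, v)]
    return inner(u, u).real

print(disc <= 0)                          # discriminant is non-positive
print(abs(inner(w, v)) ** 2 <= a * c)     # Cauchy-Schwarz for this pair
```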
Facts regarding the orthogonal complement of a subspace
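Claim (Surowski, 1997, p. 72): Let \(W\) be a subspace of \(V\). Then \(W^\perp\) is a subspace of \(V\).
Proof: Let \(x, y \in W^\perp\). Let \(w \in W\). Then
\[(w, x + y) = (w, x) + (w, y) = 0 + 0 = 0,\]
so \(x + y \in W^\perp\).
Let \(x \in W^\perp\), \(w \in W\), \(\alpha \in \mb{C}\). Then
\[(w, \alpha x) = \alpha (w, x) = \alpha \cdot 0 = 0,\]
so \(\alpha x \in W^\perp\).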
\(\square\)
Claim (Surowski, 1997, p. 72): \(W^\perp = \bigcap_{w \in W} \text{ker } (w, \bullet)\)
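Proof: Let \(x \in V\). Then
\[x \in W^\perp \iff \forall\, w \in W \quad (w, x) = 0 \iff \forall\, w \in W \quad x \in \text{ker } (w, \bullet) \iff x \in \bigcap_{w \in W} \text{ker } (w, \bullet).\]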
\(\square\)
Claim (Surowski, 1997, p. 73): \(w \neq 0 \implies \text{ker }(w, \bullet)\) is a hyperplane.
Proof: Let \(n = \text{dim } V\). Fix \(w \neq 0\) and define the linear map \(f: V \to \mb{C}\) by \(f(x) = (w, x)\).
Since \(\text{dim}(\mb{C}) = 1\), we either have \(\text{dim } \text{Im } f = 1\) or \(\text{dim } \text{Im } f = 0\).
Note that \(f(w) = (w, w) > 0\), so \(f \neq 0\).
Therefore \(\text{rank } f > 0\). Thus \(\text{rank } f = 1\).
Therefore, by the rank-nullity theorem, \(\text{dim }\text{ker } f = n - 1\).
Therefore \(\{x \in V;\, (w, x) = 0\} = \text{ker } f = \text{ker } (w, \bullet)\) is a hyperplane.
\(\square\)
Showing the orthogonal complement of the span of a vector is a hyperplane
A Lemma
Lemma: For \(u_1 \neq 0\), \(\cap_{w \in \langle u_1 \rangle} \text{ker } (w, \bullet) = \text{ker } (u_1, \bullet)\).
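Proof: Let \(w \in \langle u_1 \rangle\).
Case 1: \(w = 0\). Then \(\text{ker }(w, \bullet) = V\).
Case 2: \(w \neq 0\), so \(\exists\, \alpha \in \mb{F} \quad \alpha \neq 0 \,\land\, w = \alpha u_1\).
Then for \(x \in V\),
\[x \in \text{ker }(w, \bullet) \iff (\alpha u_1, x) = 0 \iff \overline{\alpha} (u_1, x) = 0 \iff (u_1, x) = 0 \iff x \in \text{ker }(u_1, \bullet),\]
where we could eliminate \(\overline{\alpha}\) because \(\overline{\alpha} \neq 0\) and \(\mb{F}\) is a field (recall: a field is an integral domain, that is, a commutative ring in which the zero product property holds).
Therefore
\[\bigcap_{w \in \langle u_1 \rangle} \text{ker }(w, \bullet) = V \cap \text{ker }(u_1, \bullet) = \text{ker }(u_1, \bullet).\]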
\(\square\)
The proof
Claim: If \(u_1 \in V, u_1 \neq 0\), then \(u_1^\perp\) is a hyperplane (Surowski, 1997, p. 73).
Proof: Recall that by definition, \(u_1^\perp = \langle u_1 \rangle^\perp\).
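We have
\[\langle u_1 \rangle^\perp = \bigcap_{w \in \langle u_1 \rangle} \text{ker }(w, \bullet) = \text{ker }(u_1, \bullet),\]
where the first step uses the earlier Claim \(W^\perp = \bigcap_{w \in W} \text{ker } (w, \bullet)\) and the last step uses the Lemma.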
We have shown \(\langle u_1 \rangle^\perp = \text{ker }(u_1, \bullet)\).
Above, we showed that for a vector \(w \neq 0\), \(\text{dim }\text{ker }(w, \bullet) = n - 1\).
Thus, \(\langle u_1 \rangle^\perp\) is a hyperplane.
\(\square\)
Showing that the intersection of the span of a vector with the orthogonal complement of that span is the set containing only the zero vector
Claim: \(\langle u_1 \rangle \cap \langle u_1 \rangle^\perp = \{0\}\) (Surowski, 1997, p. 73).
Proof:
First of all, since both \(\langle u_1 \rangle\) and \(\langle u_1 \rangle^\perp\) are subspaces, the zero vector is in both of them, so we certainly have \(\{0\} \subseteq \langle u_1 \rangle \cap \langle u_1 \rangle^\perp\). We now show the reverse inclusion.
Let \(y \in \langle u_1 \rangle \cap \langle u_1 \rangle^\perp\). Since \(y \in \langle u_1 \rangle\), \(\exists\, \alpha \in \mb{C} \quad y = \alpha u_1\).
Case 1: \(y = 0\). Then the inclusion is satisfied.
Case 2: \(y \neq 0\), so \(\alpha \neq 0\).
Since \(y \in \langle u_1 \rangle^\perp\), by an earlier result we have \(y \in \text{ker } (u_1, \bullet)\), so \(0 = (u_1, y) = (u_1, \alpha u_1) = \alpha (u_1, u_1)\).
Using the zero product property of fields, since \(\alpha (u_1, u_1) = 0 \land \alpha \neq 0\) we conclude \((u_1, u_1) = 0\). Therefore \(u_1 = 0\), so \(y = \alpha u_1 = 0\), a contradiction. Thus Case 2 cannot happen.
We have thus shown that \(y \in \langle u_1 \rangle \cap \langle u_1 \rangle^\perp\) implies \(y = 0\).
Therefore \(\langle u_1 \rangle \cap \langle u_1 \rangle^\perp \subseteq \{0\}\).
Thus the two inclusions show that \(\langle u_1 \rangle \cap \langle u_1 \rangle^\perp = \{0\}\).
\(\square\)
Taking the orthogonal complement of both sides of a set inclusion
Fact (Axler, 2024, p. 211): If \(G\) and \(H\) are subsets of \(V\) and \(G \subseteq H\), then \(H^\perp \subseteq G^\perp\).
Proof (Axler, 2024, p. 212): Let \(G\) and \(H\) be subsets of \(V\) where \(G \subseteq H\). Let \(v \in H^{\perp}\). Then for all \(u \in H\), \((u, v) = 0\). Since \(G \subseteq H\), \((u, v) = 0\) for all \(u \in G\). Thus \(v \in G^{\perp}\). We have thus shown that \(H^{\perp} \subseteq G^{\perp}\). \(\square\)
Notes on the Gram-Schmidt procedure
Starting with \((v_1, \ldots, v_n)\) linearly independent vectors, the procedure produces an orthogonal list of vectors \((u_1, \ldots, u_n)\) one at a time (dividing each \(u_i\) by its norm then gives an orthonormal list), where
\(\langle u_1 \rangle = \langle v_1 \rangle\)
\(\langle u_1, u_2 \rangle = \langle v_1, v_2 \rangle\)
and so on, where for example
\[u_1 = v_1, \qquad u_2 = v_2 - \frac{(u_1, v_2)}{(u_1, u_1)} u_1, \qquad u_3 = v_3 - \frac{(u_1, v_3)}{(u_1, u_1)} u_1 - \frac{(u_2, v_3)}{(u_2, u_2)} u_2,\]
and so on.
It is a simple matter to check that \((u_2, u_1) = 0\).
We note:
Claim: \(\langle u_1, u_2 \rangle = \langle v_1, v_2 \rangle\).
Proof: \(u_2 \in \langle v_1, v_2 \rangle\) by the definition of \(u_2\). \(u_1 \in \langle v_1, v_2 \rangle\) since \(u_1 = v_1\). Therefore \(\langle u_1, u_2 \rangle \subseteq \langle v_1, v_2 \rangle\). Since the dimensions of the two are equal, we have \(\langle u_1, u_2 \rangle = \langle v_1, v_2 \rangle\). \(\square\)
Checking the orthogonality of \(u_3\) and \(u_1\) in the step that defines \(u_3\):
\[(u_1, u_3) = (u_1, v_3) - \frac{(u_1, v_3)}{(u_1, u_1)} (u_1, u_1) - \frac{(u_2, v_3)}{(u_2, u_2)} (u_1, u_2) = (u_1, v_3) - (u_1, v_3) - 0 = 0.\]
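A minimal sketch of the procedure in code may help fix ideas. It assumes the standard inner product on \(\mb{C}^n\) (conjugate-linear in the first slot) and uses the classical variant that normalizes each vector as soon as it is produced, which may differ in presentation from the cited source. The function name `gram_schmidt` and the input vectors are illustrative.

```python
import math

def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Return an orthonormal list (u_1, ..., u_n) with span(u_1..u_k) = span(v_1..v_k)."""
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            coeff = inner(e, v)  # component of v along the unit vector e
            u = [ui - coeff * ei for ui, ei in zip(u, e)]
        length = math.sqrt(inner(u, u).real)
        if length < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        basis.append([ui / length for ui in u])
    return basis

# Three linearly independent vectors in C^3 (illustrative values)
vs = [[1, 1j, 0], [1, 0, 1], [0, 1, 1]]
us = gram_schmidt(vs)
```

Each pairwise inner product `inner(us[i], us[j])` should then be close to 1 when `i == j` and close to 0 otherwise, up to floating point error.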
Notes on showing that a vector can be decomposed into a linear combination of orthonormal basis vectors, where each coefficient is the inner product of the vector with the corresponding basis vector
This fact is Corollary 3.1.4.2 of Surowski (1997, p. 74).
The proof of Surowski makes use of several results, which we prove here. Throughout, \((u_1, \ldots, u_n)\) denotes an orthonormal basis of \(V\).
Claim: \(\langle u_1 \rangle^\perp \cap \cdots \cap \langle u_n \rangle^\perp = V^\perp\).
Proof:
(\(\subseteq\))
Let \(x \in \langle u_1 \rangle^\perp \cap \cdots \cap \langle u_n \rangle^\perp\). Therefore \(\forall\, i \in [n] \quad (u_i, x) = 0\).
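Let \(v \in V\). Since \((u_1, \ldots, u_n)\) is a basis of \(V\), \(\exists\, \alpha_1, \ldots, \alpha_n \in \mb{C} \quad v = \sum_{i=1}^{n} \alpha_i u_i\). Therefore
\[(v, x) = \left(\sum_{i=1}^{n} \alpha_i u_i, x\right) = \sum_{i=1}^{n} \overline{\alpha_i} (u_i, x) = 0.\]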
We have shown that \(\forall v \in V \quad (v, x) = 0\). Therefore, \(x \in V^\perp\).
(\(\supseteq\))
Let \(x \in V^\perp\), so \(\forall\, v \in V \quad (v, x) = 0\). Therefore \(\forall\, i \in [n] \quad (u_i, x) = 0\). Thus \(\forall\, i \in [n] \enspace \forall\, \alpha \in \mb{C} \quad (\alpha u_i, x) = \overline{\alpha}(u_i, x) = 0\), so \(\forall\, i \in [n] \enspace \forall\, y \in \langle u_i \rangle \quad (y, x) = 0\)
Therefore \(\forall\, i \in [n] \quad x \in \langle u_i \rangle^\perp\), so \(x \in \cap_{i=1}^n \langle u_i \rangle^\perp\).
Putting the above two results together, we have \(\cap_{i=1}^n \langle u_i \rangle^\perp = V^\perp\).
\(\square\)
Claim: \(V^\perp = \{0\}\).
Proof:
(\(\subseteq\))
Let \(x \in V^\perp\). Since \(x \in V\), taking \(v = x\) gives \((x, x) = 0\), so \(x = 0\). Therefore \(V^\perp \subseteq \{0\}\).
(\(\supseteq\))
\(V^\perp\) is a subspace of \(V\), so \(\{0\} \subseteq V^\perp\).
Putting the above two together, we have \(V^\perp = \{0\}\).
\(\square\)
Note on the last step of the proof
In the proof, it is shown that \(v - \sum_{i=1}^{n} (u_i, v) u_i \in \{0\}\). This implies \(v = \sum_{i=1}^{n} (u_i, v) u_i\).
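The expansion \(v = \sum_{i=1}^{n} (u_i, v) u_i\) can be checked on a small example. The sketch below assumes the standard inner product on \(\mb{C}^2\) (conjugate-linear in the first slot) and one particular orthonormal basis chosen for illustration; the variable names are hypothetical.

```python
import math

def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

s = 1 / math.sqrt(2)
u1 = [s, s * 1j]     # orthonormal basis of C^2 chosen for illustration
u2 = [s, -s * 1j]
basis = (u1, u2)

v = [2 - 1j, 3 + 4j]

# Coefficients are the inner products (u_i, v)
coeffs = [inner(u, v) for u in basis]

# Rebuild v as sum_i (u_i, v) u_i, coordinate by coordinate
recon = [sum(c * u[k] for c, u in zip(coeffs, basis)) for k in range(2)]
```

Note that the coefficient is \((u_i, v)\) and not \((v, u_i)\); with this convention the two differ by complex conjugation, and using the wrong one fails to reproduce \(v\).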
Note on proving that for any subspace, a vector space equals the direct sum of that subspace and its complement
This is Corollary 3.1.4.3 of Surowski (1997, p. 75).
Let \(W \subseteq V\), \(W\) a subspace of \(V\). Let \((w_1, \ldots, w_k)\) be an orthonormal basis of \(W\).
For any vector \(v \in V\), let \(v^\prime = \sum_{j=1}^{k} (w_j, v) w_j \in W\).
Claim: \(v - v^\prime \in W^\perp\) implies \(v \in v^\prime + W^\perp\) (Surowski, 1997, p. 75).
Proof: Let \(x = v - v^\prime \in W^\perp\), so \(v = v^\prime + x\).
\(x \in W^\perp\) (and \(v^\prime \in W\)), so
\[v = v^\prime + x \in v^\prime + W^\perp.\]
\(\square\)
Claim: \(v^\prime + W^\perp \subseteq W + W^\perp\) (Surowski, 1997, p. 75).
Proof: Let \(x \in v^\prime + W^\perp\). Therefore \(\exists\, w \in W^\perp \quad x = v^\prime + w \in W + W^\perp\). Thus \(v^\prime + W^\perp \subseteq W + W^\perp\). \(\square\)
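The decomposition \(v = v^\prime + (v - v^\prime)\) with \(v^\prime \in W\) and \(v - v^\prime \in W^\perp\) can be illustrated numerically. This is a sketch under stated assumptions: the standard inner product on \(\mb{C}^3\) (conjugate-linear in the first slot), and a two-dimensional subspace \(W\) with an orthonormal basis picked for the example.

```python
import math

def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

s = 1 / math.sqrt(2)
w1 = [1, 0, 0]
w2 = [0, s, s * 1j]
basis_W = [w1, w2]   # orthonormal basis of an illustrative subspace W of C^3

v = [2 + 1j, 1 - 3j, 4j]

# v' = sum_j (w_j, v) w_j lies in W
v_prime = [sum(inner(w, v) * w[k] for w in basis_W) for k in range(3)]

# The remainder v - v' should be orthogonal to every basis vector of W
residual = [a - b for a, b in zip(v, v_prime)]
checks = [abs(inner(w, residual)) for w in basis_W]
```

Every entry of `checks` should be zero up to rounding, which is the statement \(v - v^\prime \in W^\perp\) for this example.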
Notes on showing that the orthogonal complement of the orthogonal complement of a subspace is the subspace itself
This fact is Corollary 3.1.4.4 of Surowski (1997, p. 75).
Corollary: If \(W\) is a subspace of \(V\), then \(W^{\perp\perp} = W\).
Proof:
Assume \(\text{dim } V = n\) and \(\text{dim } W = k\).
From the result above, we have \(V = W \oplus W^\perp\). Therefore by Lang, Chapter I, Theorem 4.3 (1987, p. 20), \(\text{dim } V = \text{dim } W + \text{dim } W^\perp\). Therefore \(\text{dim } W^\perp = n - k\).
\(W^\perp = \{v \in V;\, \forall w \in W \quad (w, v) = 0\}\)
\(W^{\perp\perp} := (W^\perp)^\perp = \{v \in V;\, \forall\, z \in W^\perp \quad (z, v) = 0\}\)
We will now show that \(\text{dim } W^{\perp\perp} = k\).
Applying the same direct sum result to the subspace \(W^\perp\) gives \(V = W^\perp \oplus W^{\perp\perp}\). Using the same result from Lang again, we have \(n = (n - k) + \text{dim } W^{\perp\perp}\). Therefore \(\text{dim } W^{\perp\perp} = k\).
We show that \(W \subseteq W^{\perp\perp}\): let \(x \in W\). Begin sub-proof: let \(z \in W^\perp\). Therefore \(\forall y \in W \quad (y, z) = 0\) which implies that for all \(y \in W\) we have \((z, y) = \overline{(y, z)} = \overline{0} = 0\), so in particular, \((z, x) = 0\). End sub-proof.
We just showed \(\forall\, z \in W^\perp \quad (z, x) = 0\), which implies \(x \in W^{\perp\perp}\). Therefore \(W \subseteq W^{\perp\perp}\).
Since \(W \subseteq W^{\perp\perp}\) and \(\text{dim } W = \text{dim } W^{\perp\perp}\), we have \(W = W^{\perp\perp}\).
\(\square\)
Statement and proof of the Riesz Representation Theorem
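Let \(V\) be an inner product space over \(\mb{C}\) with \(\text{dim } V = n\). We consider the function \(\varphi: V \to V^\ast\) defined by \(\varphi(v) = (v, \bullet)\).
We note that this function is anti-linear, namely that for all \(v_1, v_2 \in V\),
\[\varphi(v_1 + v_2) = (v_1 + v_2, \bullet) = (v_1, \bullet) + (v_2, \bullet) = \varphi(v_1) + \varphi(v_2),\]
but for all \(\alpha \in \mb{C}\), \(v \in V\),
\[\varphi(\alpha v) = (\alpha v, \bullet) = \overline{\alpha} (v, \bullet) = \overline{\alpha} \varphi(v).\]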
We claim that the rank-nullity theorem holds for anti-linear functions (its initial statement was for linear functions). We redo that proof here for the case of anti-linear functions:
Theorem (rank-nullity, but for anti-linear functions): Let \(T: V \to W\) be an anti-linear transformation. Then \(\text{rank } T = \text{dim } V - \text{nullity } T\).
Proof (Surowski, 1997, p. 16-17): We first find a subspace \(V_1 \subseteq V\) such that \(V = \text{ker}(T) \oplus V_1\). We restrict \(T\) to \(V_1\), i.e. \(T|_{V_1}: V_1 \to T(V_1)\).
We show that \(T|_{V_1}\) is invertible. Namely, if \(v_1 \in \text{ker}(T|_{V_1})\), then \(T(v_1) = 0\) implies \(v_1 \in V_1 \cap \text{ker } T = \{0\}\), so \(v_1 = 0\). Therefore \(\text{ker } T|_{V_1} = \{0\}\) so \(T|_{V_1}\) is injective.
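Let \(v \in V\). Clearly \(T(v) \in T(V)\). Write \(v = x + v_1\) for suitable \(x \in \text{ker } T\), \(v_1 \in V_1\). Then
\[T(v) = T(x + v_1) = T(x) + T(v_1) = 0 + T(v_1) = T|_{V_1}(v_1),\]
so \(T|_{V_1}\) is surjective onto \(T(V)\). Therefore \(T|_{V_1}\) is an anti-linear bijection from \(V_1\) onto \(T(V)\); such a map sends a basis to a basis, so \(\text{dim } V_1 = \text{dim } T(V)\), which we write as \(V_1 \cong T(V)\).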
Let \(\{x_1, \ldots, x_r\}\) be a basis for \(\text{ker } T\) and \(\{v_1, \ldots, v_m\}\) be a basis for \(V_1\). Then \(V = \text{ker}(T) \oplus V_1\) implies \(\{x_1, \ldots, x_r, v_1, \ldots, v_m\}\) is a basis for \(V\). Therefore
\(\text{dim } V = r + m = \text{dim } \text{ker } T + \text{dim } V_1\). Since \(V_1 \cong T(V)\), we have \(\text{dim } V = \text{nullity } T + \text{rank } T\).
\(\square\)
Theorem: Let \(V\) be an inner product space over \(\mb{C}\) with \(\text{dim } V = n\). Then for every \(f \in V^\ast\) there exists a unique vector \(v \in V\) such that \(f = (v, \bullet)\).
Proof:
Let \(V\) be as in the statement of the theorem, and define \(\varphi: V \to V^\ast\) by \(\varphi(v) = (v, \bullet)\). We find \(\text{ker } \varphi\). Let \(v \in V\) with \(\varphi(v) = (v, \bullet) = 0\) (the zero functional). Applying this functional to \(v\) gives \((v, v) = 0\), which implies \(v = 0\). Thus \(\text{ker } \varphi = \{0\}\), so \(\varphi\) is injective.
Applying the above rank-nullity theorem to anti-linear \(\varphi\) gives \(\text{rank } \varphi = n - 0 = n\). Since \(\text{dim } V^\ast = n\), \(\varphi\) is surjective. Thus \(\varphi\) is a bijection.
We thus have that for every \(f \in V^\ast\) there exists a unique \(v \in V\) with \(f = \varphi(v) = (v, \bullet)\).
\(\square\)
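For a concrete picture of the theorem, the sketch below recovers the representing vector of a linear functional on \(\mb{C}^3\). It assumes the standard inner product (conjugate-linear in the first slot) and the standard basis \(e_1, e_2, e_3\); under those assumptions the representing vector has coordinates \(\overline{f(e_i)}\). The helper `riesz_vector` and the functional `f` are hypothetical names for illustration.

```python
def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def riesz_vector(f, n):
    """Return v in C^n with f(x) = (v, x) for all x, given a linear functional f."""
    standard_basis = [[1 if k == i else 0 for k in range(n)] for i in range(n)]
    # (v, e_i) = conj(v_i), so v_i must equal conj(f(e_i))
    return [complex(f(e)).conjugate() for e in standard_basis]

def f(x):
    """An illustrative linear functional on C^3."""
    return (2 + 1j) * x[0] - 3j * x[1] + x[2]

v = riesz_vector(f, 3)
x = [1 - 1j, 2j, 4]
difference = abs(inner(v, x) - f(x))   # should be zero up to rounding
```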
The adjoint
Let \(T \in \mc{L}(V, W)\). Recall the following definition (Axler, 2024, p. 107; Surowski, 1997, p. 39-41; Jain & Ahuja, 2010, p. 205):
Definition: \(T^{\text{dual}}: W^\ast \to V^\ast\) defined by \(T^{\text{dual}}(f) = f \circ T\) is called the dual map.
(for \(v \in V\), \(T(v) \in W\), since \(f \in W^\ast\), \(f \circ T: V \to \mb{F}\), so indeed \(f \circ T \in V^\ast\)).
Recall that the composition of linear functions is linear, so \(T^{\text{dual}}\) is linear and we can write \(T^{\text{dual}} \in \mc{L}(W^{\ast}, V^{\ast})\).
Let \(\varphi_1: V \to V^\ast\) and \(\varphi_2: W \to W^\ast\) be the bijections used in the Riesz representation theorem.
Let \(T \in \mc{L}(V, W)\). We define the adjoint (Benyattou, 2026, p. 54; Jain & Ahuja, 2010, p. 302) of \(T\), denoted by \(T^\ast: W \to V\), by \(T^\ast = \inv{\varphi_1} \circ T^{\text{dual}} \circ \varphi_2\).
We observe that if \(W = V\), \(\varphi_1 = \varphi_2\) and things are simpler.
Theorem: \(T^\ast\) is linear.
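Proof: Unwinding the definition of \(T^\ast\), for every \(w \in W\) we have \(\varphi_1(T^\ast w) = T^{\text{dual}}(\varphi_2(w)) = \varphi_2(w) \circ T\), that is, \((T^\ast w, v)_V = (w, Tv)_W\) for all \(v \in V\). Let \(w_1, w_2 \in W\). Let \(v \in V\). Observe that
\[(T^\ast(w_1 + w_2), v) = (w_1 + w_2, Tv) = (w_1, Tv) + (w_2, Tv) = (T^\ast w_1, v) + (T^\ast w_2, v) = (T^\ast w_1 + T^\ast w_2, v).\]
Since \(v \in V\) was arbitrary, \(\varphi_1(T^\ast(w_1 + w_2)) = \varphi_1(T^\ast w_1 + T^\ast w_2)\), and since \(\varphi_1\) is injective it follows that for all \(w_1, w_2 \in W\), \(T^{\ast}(w_1 + w_2) = T^{\ast}(w_1) + T^{\ast}(w_2)\).
Now let \(w \in W\) and let \(\alpha \in \mb{F}\). Observe that
\[(T^\ast(\alpha w), v) = (\alpha w, Tv) = \overline{\alpha} (w, Tv) = \overline{\alpha} (T^\ast w, v) = (\alpha T^\ast w, v).\]
\(v \in V\) was arbitrary, so by the same argument, for all \(w \in W\), \(\alpha \in \mb{F}\), \(T^{\ast} (\alpha w) = \alpha T^{\ast} w\). \(\square\)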
In particular, we note that although \(T^{\ast}\) is formed by composing three functions, two of which are not linear, \(T^{\ast}\) is linear.
Next, we observe an intriguing property of this function \(T^\ast\):
Theorem (Benyattou, 2026, p. 54; Surowski, 1997, p. 77): For \(T \in \mc{L}(V, W)\), for all \(w \in W\), \((T^{\ast} w, \bullet)_V = (w, T(\bullet))_W\).
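Proof: Observe that
\[\varphi_1(T^\ast w) = \varphi_1\left(\inv{\varphi_1}(T^{\text{dual}}(\varphi_2(w)))\right) = T^{\text{dual}}(\varphi_2(w)) = \varphi_2(w) \circ T = (w, T(\bullet))_W.\]
Also observe that
\[\varphi_1(T^\ast w) = (T^\ast w, \bullet)_V.\]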
Thus we have for all \(w \in W\), \((T^{\ast}(w), \bullet)_V = (w, T(\bullet))_W\).
\(\square\)
A particular matrix representation of the adjoint
Theorem (Axler, 2024, p. 232): Let \(T \in \mc{L}(V, W)\) (so \(T^\ast \in \mc{L}(W, V)\)). Let \(\mc{A} = (v_1, \ldots, v_n)\) be an orthonormal basis for \(V\) and \(\mc{B} = (w_1, \ldots, w_m)\) be an orthonormal basis for \(W\). Then \({[T]_{\mc{B}\mc{A}}}^\top = \overline{[T^{\ast}]_{\mc{A}\mc{B}}}\).
Proof: Since \(\mc{B}\) is an orthonormal basis of \(W\), we can write for each \(k \in [n]\), \(Tv_k = (w_1, Tv_k) w_1 + \cdots + (w_m, Tv_k) w_m\). Therefore \(([T]_{\mc{B}\mc{A}})_{jk} = (w_j, Tv_k)\).
Similarly, since \(\mc{A}\) is an orthonormal basis of \(V\), we can write for each \(k \in [m]\), \(T^\ast w_k = (v_1, T^\ast w_k) v_1 + \cdots + (v_n, T^\ast w_k) v_n\), so \(([T^{\ast}]_{\mc{A}\mc{B}})_{jk} = (v_j, T^\ast w_k) = \overline{(T^\ast w_k, v_j)} = \overline{(w_k, T v_j)}\).
Note that \(({[T]_{\mc{B}\mc{A}}}^\top)_{jk} = ([T]_{\mc{B}\mc{A}})_{kj} = (w_k, Tv_j) = (\overline{[T^{\ast}]_{\mc{A}\mc{B}}})_{jk}\), so we have \({[T]_{\mc{B}\mc{A}}}^\top = \overline{[T^{\ast}]_{\mc{A}\mc{B}}}\). \(\square\)
Note that if \(\mc{A}\) and \(\mc{B}\) are not orthonormal bases for \(V\) and \(W\) respectively, then the conclusion of the Theorem does not necessarily hold.
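In the standard (orthonormal) bases of \(\mb{C}^n\) and \(\mb{C}^m\), the theorem says the matrix of \(T^\ast\) is the conjugate transpose of the matrix of \(T\). The sketch below, which assumes the standard inner product (conjugate-linear in the first slot) and uses an arbitrary illustrative matrix, checks the defining identity \((T^\ast w, v) = (w, Tv)\) for one pair of vectors. The helper names are hypothetical.

```python
def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def conj_transpose(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    rows, cols = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(rows)] for j in range(cols)]

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

T = [[1 + 1j, 2, 0], [0, 3j, 1 - 1j]]   # illustrative map C^3 -> C^2
T_star = conj_transpose(T)              # candidate adjoint, C^2 -> C^3

v = [1, 1j, 2 - 1j]
w = [2 + 1j, -1]

lhs = inner(apply(T_star, w), v)   # (T* w, v) in C^3
rhs = inner(w, apply(T, v))        # (w, T v) in C^2
```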
Properties of the adjoint
Theorem (Axler, 2024, p. 230): Let \(T \in \mc{L}(V, W)\). Then
- \((T^{\ast})^{\ast} = T\)
- for every finite-dimensional inner product space \(U\) and all \(S \in \mc{L}(W, U)\), \((ST)^\ast = T^\ast S^\ast\)
Proof:
((2) is from Axler, 2024, p. 230.)
(1)
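Let \(v \in V, w \in W\). By the above Theorem, applied first to \(T\) and then to \(T^\ast\),
\[(w, Tv) = (T^\ast w, v) = \overline{(v, T^\ast w)} = \overline{((T^{\ast})^{\ast} v, w)} = (w, (T^{\ast})^{\ast} v),\]
and thus \((w, Tv - (T^{\ast})^{\ast} v) = 0\) for all \(w \in W\). Taking \(w = Tv - (T^{\ast})^{\ast} v\) shows that for all \(v \in V\), \((T^{\ast})^{\ast} v = Tv\), so \((T^{\ast})^{\ast} = T\).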
(2)
Let \(S \in \mc{L}(W, U)\), \(u \in U\), \(v \in V\). Then
\((u, (ST)v) = ((ST)^\ast u, v)\) and also
\[(u, (ST)v) = (u, S(Tv)) = (S^\ast u, Tv) = (T^\ast (S^\ast u), v) = ((T^\ast S^\ast) u, v).\]
Thus for all \(u \in U\), for all \(v \in V\), \(((ST)^\ast u, v) = ((T^{\ast} S^{\ast}) u, v)\), which implies for all \(u \in U\), \((ST)^\ast u = T^{\ast} S^{\ast} u\), which implies \((ST)^\ast = T^{\ast} S^{\ast}\). \(\square\)
Theorem (Axler, 2024, p. 231): Let \(T \in \mc{L}(V, W)\) be a linear transformation between finite dimensional \(V\) and \(W\). Then the following hold:
- \(\text{ker } T^\ast = (\text{Im } T)^\perp\)
- \(\text{Im } T^\ast = (\text{ker } T)^\perp\)
- \(\text{ker } T = (\text{Im } T^\ast)^\perp\)
- \(\text{Im } T = (\text{ker } T^\ast)^\perp\)
Proof (Axler, 2024, p. 231):
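We first consider (1). \(\text{Im }T\) is a subspace of \(W\). By the definition of the orthogonal complement,
\[(\text{Im } T)^\perp = \{w \in W;\, \forall\, v \in V \quad (Tv, w) = 0\}.\]
Now, let \(w \in W\). Then
\[w \in \text{ker } T^\ast \iff T^\ast w = 0 \iff \forall\, v \in V \quad (T^\ast w, v) = 0 \iff \forall\, v \in V \quad (w, Tv) = 0 \iff \forall\, v \in V \quad (Tv, w) = 0 \iff w \in (\text{Im } T)^\perp,\]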
where we used the fact that \(\forall v \in V \quad (w, Tv) = 0 \iff \forall v \in V \quad (Tv, w) = 0\). This holds because if we start with \(\forall v \in V \quad (w, Tv) = 0\), we find that \((Tv, w) = \overline{(w, Tv)} = \overline{0} = 0 = (w, Tv)\), and vice versa.
We have shown (1). Taking the orthogonal complement of both sides, and using \(W^{\perp\perp} = W\) for a subspace \(W\), gives (4). Since \((T^{\ast})^\ast = T\), replacing \(T\) with \(T^{\ast}\) in (1) gives (3), and replacing \(T\) with \(T^{\ast}\) in (4) gives (2). \(\square\)
Unitary operators
Definition (Axler, 2024, p. 258): \(S \in \mc{L}(V, W)\) is called an isometry iff \(\forall\, v \in V \quad \norm{S v} = \norm{v}\).
Result: If \(S\) is an isometry, \(S^\ast S = \text{id}\).
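Proof (Axler, 2024, p. 259): Let \(S\) be an isometry. Let \(v \in V\). We have
\[((\text{id} - S^\ast S) v, v) = (v, v) - (S^\ast S v, v) = \norm{v}^2 - (Sv, Sv) = \norm{v}^2 - \norm{Sv}^2 = 0.\]
Since \(\text{id} - S^\ast S\) is self-adjoint and \(((\text{id} - S^\ast S) v, v) = 0\) for all \(v \in V\), we conclude \(\text{id} - S^\ast S = 0\), so \(S^\ast S = \text{id}\). \(\square\)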
Definition (Axler, 2024, p. 260): \(S \in \mc{L}(V)\) is called unitary iff \(S\) is an invertible isometry.
We see that if \(S\) is unitary, then \(S^\ast = \inv{S}\).
Matrix form
Theorem: If \(S \in \mc{L}(V)\) is unitary and \(\mc{A}\) is an orthonormal basis for \(V\), then \([S]_{\mc{A}}\) has orthonormal columns (and thus \({\overline{[S]_{\mc{A}}}}^\top [S]_{\mc{A}} = I\), thus \({\overline{[S]_{\mc{A}}}}^\top = \inv{[S]_{\mc{A}}}\)).
Proof: Let \(S\) and \(\mc{A} = (v_1, \ldots, v_n)\) be as in the statement of the theorem. We must show that for any \(i, j \in [n]\), \(([S(v_i)]_{\mc{A}}, [S(v_j)]_{\mc{A}}) = \delta_{ij}\).
Let \(([S]_{\mc{A}})_{ij} = s_{ij}\).
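Note that by definition, \(S(v_j) = \sum_{i=1}^n s_{ij} v_i\). Therefore
\[(S(v_j), S(v_k)) = \left(\sum_{i=1}^{n} s_{ij} v_i, \sum_{l=1}^{n} s_{lk} v_l\right) = \sum_{i=1}^{n} \sum_{l=1}^{n} \overline{s_{ij}} s_{lk} (v_i, v_l) = \sum_{i=1}^{n} \overline{s_{ij}} s_{ik}.\]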
Since \(S\) is unitary, \((S(v_j), S(v_k)) = (v_j, v_k) = \delta_{jk}\).
Therefore \(\sum_{i=1}^{n} \overline{s_{ij}} s_{ik} = \delta_{jk}\). Since \([S(v_j)]_{\mc{A}} = (s_{1j}, \ldots, s_{nj})\) and \([S(v_k)]_{\mc{A}} = (s_{1k}, \ldots, s_{nk})\), we have that the columns of \([S]_{\mc{A}}\) are orthonormal.
\(\square\)
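The matrix statement can be checked directly on a small example. The sketch below takes one \(2 \times 2\) matrix with orthonormal columns, chosen for illustration, and forms the product of its conjugate transpose with itself, which should be the identity. The standard inner product on \(\mb{C}^2\) and the helper names are assumptions of the sketch.

```python
import math

def conj_transpose(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    rows, cols = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(rows)] for j in range(cols)]

def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

s = 1 / math.sqrt(2)
S = [[s, s],
     [s * 1j, -s * 1j]]   # columns (1, i)/sqrt(2) and (1, -i)/sqrt(2)

# Entry (j, k) of G is sum_i conj(s_ij) * s_ik, the inner product of columns j and k
G = matmul(conj_transpose(S), S)
```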
A Theorem
Lemma: For orthonormal bases \(\mc{A} = (v_1, \ldots, v_n)\) and \(\mc{B} = (w_1, \ldots, w_n)\) of \(V\), letting \(P = [\text{id}]_{\mc{B}\mc{A}}\), we have that \((P)_{kj} = (w_k, v_j)\).
Proof:
\(P = [[\text{id}(v_1)]_{\mc{B}}, \ldots, [\text{id}(v_n)]_{\mc{B}}] = [(v_1)_{\mc{B}}, \ldots, (v_n)_{\mc{B}}]\).
For any \(v \in V\), by an earlier Theorem we can write \(v = \sum_{i=1}^{n} (w_i, v) w_i\). Apply this to \(\text{id}(v_j) = v_j\). Thus the \(k\)th coordinate of \(v_j\) in basis \(\mc{B}\) is \((w_k, v_j)\).
Thus we have \((P)_{kj} = (w_k, v_j)\). \(\square\)
Lemma: For orthonormal bases \(\mc{A} = (v_1, \ldots, v_n)\) and \(\mc{B} = (w_1, \ldots, w_n)\) of \(V\), \((v_j, v_k) = \sum_{i=1}^{n} \overline{(w_i, v_j)} (w_i, v_k)\).
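Proof: By an earlier Theorem, \(v_j = \sum_{i=1}^{n} (w_i, v_j) w_i\) and \(v_k = \sum_{l=1}^{n} (w_l, v_k) w_l\). Therefore
\[(v_j, v_k) = \left(\sum_{i=1}^{n} (w_i, v_j) w_i, \sum_{l=1}^{n} (w_l, v_k) w_l\right) = \sum_{i=1}^{n} \sum_{l=1}^{n} \overline{(w_i, v_j)} (w_l, v_k) (w_i, w_l) = \sum_{i=1}^{n} \overline{(w_i, v_j)} (w_i, v_k).\]
\(\square\)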
Theorem: For orthonormal bases \(\mc{A} = (v_1, \ldots, v_n)\) and \(\mc{B} = (w_1, \ldots, w_n)\) of \(V\), \([\text{id}]_{\mc{B}\mc{A}}\) is a unitary matrix.
Proof: For simplicity denote \(P = [\text{id}]_{\mc{B}\mc{A}}\).
By our Lemma, we have \((P)_{ij} = (w_i, v_j)\).
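Consider
\[(\overline{P}^\top P)_{jk} = \sum_{i=1}^{n} \overline{(P)_{ij}} (P)_{ik} = \sum_{i=1}^{n} \overline{(w_i, v_j)} (w_i, v_k) = (v_j, v_k) = \delta_{jk},\]
where the third equality is the second Lemma.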
Thus \(\overline{P}^\top P = I\).
\(\square\)
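One consequence of the change-of-basis matrix being unitary is that coordinate vectors keep their Euclidean norm when passing from one orthonormal basis to another. The sketch below builds \(P\) from the formula \((P)_{kj} = (w_k, v_j)\), taking \(\mc{A}\) to be the standard basis of \(\mb{C}^2\) and \(\mc{B}\) an orthonormal basis chosen for illustration; the inner product convention (conjugate-linear in the first slot) and all names are assumptions.

```python
import math

def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

s = 1 / math.sqrt(2)
A = [[1, 0], [0, 1]]               # orthonormal basis (v_1, v_2): the standard basis
B = [[s, s * 1j], [s, -s * 1j]]    # orthonormal basis (w_1, w_2), illustrative choice

# (P)_{kj} = (w_k, v_j)
P = [[inner(w, v) for v in A] for w in B]

x = [2 - 1j, 3 + 4j]               # coordinates of a vector in basis A
x_B = [sum(P[k][j] * x[j] for j in range(2)) for k in range(2)]   # coordinates in basis B

# Rebuild the vector from its B-coordinates, and compare coordinate norms
rebuilt = [sum(x_B[k] * B[k][i] for k in range(2)) for i in range(2)]
norm_A = math.sqrt(sum(abs(c) ** 2 for c in x))
norm_B = math.sqrt(sum(abs(c) ** 2 for c in x_B))
```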
Diagonalizability of transformations in inner product spaces
In the last section, we noted that if \(T \in \mc{L}(V)\) is self-adjoint, then \(T\) is diagonalizable in a basis of eigenvectors \(\mc{A}\) where the vectors of \(\mc{A}\) are orthonormal. There are other \(T \in \mc{L}(V)\) that are diagonalizable in an orthonormal basis of eigenvectors. We discuss this general case now.
Notes on self-adjoint transformations
Definition: \(T \in \mc{L}(V)\) is self-adjoint iff \(T^\ast = T\).
Observe that for self-adjoint \(T\), for all \(v \in V\) we have \((Tv, \bullet) = (v, T(\bullet))\).
Also observe that for self-adjoint \(T\) and an orthonormal basis \(\mc{A}\) of \(V\), by a previous Theorem we have \({[T]_{\mc{A}}}^\top = \overline{[T^{\ast}]_{\mc{A}}} = \overline{[T]_{\mc{A}}}\). This means \([T]_{\mc{A}} = {\overline{[T]_{\mc{A}}}}^\top\), so \(([T]_{\mc{A}})_{ij} = ({\overline{[T]_{\mc{A}}}}^\top)_{ij} = (\overline{[T]_{\mc{A}}})_{ji} = \overline{([T]_{\mc{A}})_{ji}}\).
Result (Surowski, 1997, p. 79-80, Proposition 3.2.4): If \(T \in \mc{L}(V)\) is self-adjoint, then \(T\) is diagonalizable in an orthonormal basis of eigenvectors \(\mc{A}\).
Result: If \(\lambda\) is an eigenvalue of \(T\) and \(T\) is self-adjoint, then \(\lambda \in \mb{R}\).
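The statement that eigenvalues of a self-adjoint transformation are real can be illustrated for a \(2 \times 2\) matrix equal to its own conjugate transpose. The sketch below finds the roots of the characteristic polynomial \(t^2 - \text{tr}(H)\, t + \det(H)\) with the quadratic formula; the matrix `H` and the helper `eigenvalues_2x2` are illustrative choices, and the approach is specific to the \(2 \times 2\) case.

```python
import cmath

def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial t^2 - tr(A) t + det(A) of a 2x2 matrix."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

# H equals its conjugate transpose: real diagonal, off-diagonal entries conjugate to each other
H = [[2, 1 - 1j],
     [1 + 1j, 3]]

lam1, lam2 = eigenvalues_2x2(H)
imag_parts = (lam1.imag, lam2.imag)   # both should vanish up to rounding
```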
Results for projection operators
Let \(W \subseteq V\), \(W\) a subspace. Let \(P = \text{proj}_W: V \to W\).
Claim: \(\text{ker } P = W^\perp\)
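Proof: \(\text{ker } \text{proj}_W = \{v \in V;\, \text{proj}_W(v) = 0\}\).
Since \(V = W \oplus W^\perp\), every \(v \in V\) has a unique decomposition \(v = w + w^\prime\) with \(w \in W\), \(w^\prime \in W^\perp\), and \(\text{proj}_W(v) = w\). Thus
\(v \in W^\perp \iff v = 0 + w^\prime \iff \text{proj}_W(v) = 0 \iff v \in \text{ker } P\).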
\(\therefore\, \text{ker } P = W^\perp\). \(\square\)
Claim: \(\text{im } P = W\).
Proof: \(v \in W \iff v = w + 0 \iff \text{proj}_W(v) = w = v \iff v \in \text{im } P\)
\(\therefore\, W = \text{im } P\). \(\square\)
Claim: \(P|_{W} = I_W\)
Proof: Let \(w \in W\). \(P|_{W}(w) = w\), so \(P|_{W} = I_W\). \(\square\)
Claim: \(P\) is idempotent.
Proof: Let \(v \in V\) and write \(w = \text{proj}_W(v) \in W\). Since \(P|_{W} = I_W\), \(\text{proj}_W(w) = w\), so \(P^2 v = P v\) for all \(v \in V\), that is, \(P^2 = P\). \(\square\)
Claim: \(\text{ker } P \perp \text{im } P\).
Proof: \(\text{ker } P = W^\perp\), and \(\text{im } P = W\). \(\square\)
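The claims in this section can be spot-checked with a one-dimensional subspace of \(\mb{C}^3\). The sketch below assumes the standard inner product (conjugate-linear in the first slot) and computes \(\text{proj}_W(v) = \sum_j (w_j, v) w_j\) from an orthonormal basis of \(W\); the subspace, the vectors, and the function name `proj` are illustrative.

```python
import math

def inner(x, y):
    """Standard inner product on C^n, conjugate-linear in the first slot (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

s = 1 / math.sqrt(2)
basis_W = [[s, s, 0]]   # orthonormal basis of W = span{(1, 1, 0)}

def proj(v):
    """proj_W(v) = sum_j (w_j, v) w_j for the orthonormal basis (w_j) of W."""
    return [sum(inner(w, v) * w[k] for w in basis_W) for k in range(len(v))]

v = [3, 1 + 2j, -1j]
Pv = proj(v)
PPv = proj(Pv)          # idempotence: should equal Pv

z = [1, -1, 5j]         # orthogonal to (1, 1, 0), so z lies in W-perp
Pz = proj(z)            # kernel: should be the zero vector

# ker P is orthogonal to im P: v - Pv lies in ker P, Pv lies in im P
cross = inner([a - b for a, b in zip(v, Pv)], Pv)
```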
References
Benyattou, Khallil Ebrahim. (2026). Linear Alkebra. Self-published.
Bube, Ken and James Burke. (2021). Math 554 Linear Analysis Autumn 2006 Lecture Notes [Lecture notes].
Jain, Pawan K. and Om P. Ahuja. (2010). Functional analysis (second ed.). New Age International (P) Limited.
Lang, Serge. (1987). Linear algebra (Third edition). Springer.
Lang, Serge. (1997). Undergraduate analysis (Second ed.). Springer Science+Business Media, Inc.
Surowski, David. (1997). Advanced Linear Algebra [Lecture notes].
How to cite this article
Wayman, Eric Alan. (2026). Linear algebra: inner product spaces. Eric Alan Wayman's technical notes. https://ericwayman.net/notes/linear-algebra-inner-product-spaces/
@misc{wayman2026linear-algebra-inner-product-spaces,
title={Linear algebra: inner product spaces},
author={Wayman, Eric Alan},
journal={Eric Alan Wayman's technical notes},
url={https://ericwayman.net/notes/linear-algebra-inner-product-spaces/},
year={2026}
}