Linear algebra: inner product spaces

Hilbert spaces and finite dimensional inner product spaces

We begin by defining a Hilbert space, and show that every finite dimensional inner product space is a Hilbert space.

A vector space together with an inner product is called an inner product space (Jain & Ahuja, 2010, p. 235).

Note that an inner product \((\bullet, \bullet)\) induces a norm \(\norm{x} = (x, x)^{1/2}\) (Jain & Ahuja, 2010, p. 236).

A vector space together with a norm is called a normed vector space (Lang, 1997, p. 132).

Definition: A Banach space is a complete normed vector space.

Definition: A Hilbert space is a Banach space over \(\mb{C}\) or \(\mb{R}\) whose norm is induced by an inner product (Jain & Ahuja, 2010, p. 239).

We now provide some details on the definition of Cauchy sequences for such spaces, and completeness.

Any norm induces a metric, namely \(d(x, y) := \norm{x - y}\). We define various properties for normed vector spaces using this metric: when doing so, any normed vector space is a metric space. For example, a normed vector space is complete iff every Cauchy sequence converges, where the definition of Cauchy sequence uses the metric induced by the norm.

Definition: Two norms \(\norm{\bullet}_1\) and \(\norm{\bullet}_2\) on \(V\) are called equivalent norms on \(V\) iff there are constants \(C_1, C_2 > 0\) such that for all \(v \in V\),

$$ \frac{1}{C_1}\norm{v}_2 \leq \norm{v}_1 \leq C_2 \norm{v}_2 $$

(Bube & Burke, 2021, p. 22)

Theorem: If \(V\) is a finite dimensional vector space, then any two norms on \(V\) are equivalent.

Proof: See Bube & Burke (2021, p. 22-23).

Note that if \(V\) is not finite dimensional, we cannot guarantee that any two norms on \(V\) are equivalent (Bube & Burke, 2021, p. 23).
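
As a small numerical illustration of norm equivalence, the following sketch checks the standard bounds \(\norm{v}_2 \leq \norm{v}_1 \leq \sqrt{n}\, \norm{v}_2\) on a few sample vectors in \(\mb{C}^3\), which corresponds to taking \(C_1 = 1\) and \(C_2 = \sqrt{n}\) in the definition above. The helper names `norm1` and `norm2` and the sample vectors are our own choices for illustration; checking samples is of course not a proof.

```python
import math

def norm1(v):
    """1-norm: sum of the absolute values of the coordinates."""
    return sum(abs(c) for c in v)

def norm2(v):
    """Euclidean norm: square root of the sum of squared absolute values."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))

n = 3
samples = [
    [1 + 2j, -3j, 0.5],
    [0, 0, 4],
    [1, 1, 1],
    [2 - 1j, -2 + 1j, 1j],
]

# Check ||v||_2 <= ||v||_1 <= sqrt(n) ||v||_2 (small tolerance for rounding).
bounds_hold = all(
    norm2(v) <= norm1(v) + 1e-12
    and norm1(v) <= math.sqrt(n) * norm2(v) + 1e-12
    for v in samples
)
print(bounds_hold)
```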

Fact: Both \(\mb{C}^n\) and \(\mb{R}^n\) are complete in the Euclidean norm (Bube & Burke, 2021, p. 27).

Proof:

(This is the "standard" proof for \(\mb{R}^n\), which turns out to be the same for \(\mb{C}^n\) since the definition of absolute value for complex numbers works in the inequalities in the same way as absolute value for real numbers.)

Let \(\mb{F}\) be either \(\mb{R}\) or \(\mb{C}\). Denoting \(z = (z_1, \ldots, z_n) \in \mb{F}^n\), define the Euclidean norm on \(\mb{F}^n\) as \(\norm{z}_2 := \sqrt{\sum_{i=1}^{n} \abs{z_i}^2}\).

Let \((z^{(k)})_{k \in \mb{N}}\) be a Cauchy sequence in \(\mb{F}^n\), namely

$$ \forall\, \varepsilon > 0 \quad \exists\, N \in \mb{N} \quad \forall\, k, l \geq N \quad \norm{z^{(k)} - z^{(l)}}_2 < \varepsilon $$

Since for every coordinate \(i \in [n]\), \(\abs{z_i^{(k)} - z_i^{(l)}} \leq \norm{z^{(k)} - z^{(l)}}_2\), we have that every coordinate sequence \((z_i^{(k)})_{k \in \mb{N}}\) is Cauchy. Since both \(\mb{R}\) and \(\mb{C}\) are complete, each \((z_i^{(k)})\) converges to some \(z_i \in \mb{F}\) as \(k \to \infty\). Define \(z = (z_1, \ldots, z_n)\).

Let \(\varepsilon > 0\) be arbitrary. For each coordinate \(i \in [n]\) there exists \(N_i \in \mb{N}\) such that \(\abs{z_i^{(k)} - z_i} < \varepsilon / \sqrt{n}\) for all \(k \geq N_i\) (using \(\varepsilon / \sqrt{n}\) as the "epsilon" variable in the convergence of each coordinate). Taking \(N := \max_i N_i\), for all \(k \geq N\) we have

$$ \norm{z^{(k)} - z}_2 = \sqrt{\sum_{i=1}^{n} \abs{z_i^{(k)} - z_i}^2} < \sqrt{n \cdot \frac{\varepsilon^2}{n}} = \varepsilon $$

so \(\lim_{k \to \infty} \norm{z^{(k)} - z}_2 = 0\) which holds iff \(\lim_{k \to \infty} z^{(k)} = z\).

\(\square\)

Result: If two norms \(\norm{\bullet}_1\) and \(\norm{\bullet}_2\) on \(V\) are equivalent, then \((V, \norm{\bullet}_1)\) is complete iff \((V, \norm{\bullet}_2)\) is complete (Bube & Burke, 2021, p. 27).

Theorem: Every finite dimensional normed vector space over \(\mb{C}\) or \(\mb{R}\) is complete.

Proof: Let \(V\) be a finite dimensional vector space over \(\mb{F}\) where \(\mb{F} = \mb{C}\) or \(\mb{F} = \mb{R}\). Let \(\mc{A} = (v_1, \ldots, v_n)\) be a basis for \((V, \norm{\bullet})\) where \(\norm{\bullet}\) is an arbitrary norm. Consider the coordinate map \((\bullet)_{\mc{A}}\) where \((x)_\mc{A} = (c_1 v_1 + \cdots + c_n v_n)_\mc{A} = (c_1, \ldots, c_n) \in \mb{F}^n\).

On \(V\), define \(\norm{\bullet}_2\) by \(\norm{x}_2 = \norm{(x)_{\mc{A}}}_2\), the latter being the Euclidean norm in \(\mb{F}^n\).

Since \(V\) is finite-dimensional, by norm equivalence there exists \(C > 0\) such that for all \(x \in V\), \(\norm{x}_2 \leq C \norm{x}\).

Let \((a^{(k)})\) be a Cauchy sequence in \((V, \norm{\bullet})\).

By the norm equivalence relationship, for all \(k, j \geq N\) in the definition of Cauchy sequence, we have

$$ \norm{a^{(k)} - a^{(j)}}_2 \leq C \norm{a^{(k)} - a^{(j)}} $$

so \((a^{(k)}) \in (V, \norm{\bullet}_2)\) is Cauchy.

Recall that \(\norm{a^{(k)} - a^{(j)}}_2 = \norm{(a^{(k)} - a^{(j)})_{\mc{A}}}_2\). Therefore defining \(b^{(k)} = (a^{(k)})_{\mc{A}}\), we have that \(b^{(k)} \in (\mb{F}^n, \norm{\bullet}_2)\) is Cauchy.

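\(\mb{F}^n\) is complete. Therefore since \((b^{(k)})\) is Cauchy, it converges to a limit \(L \in \mb{F}^n\).

Let \(a \in V\) be the vector with \((a)_{\mc{A}} = L\). Since the coordinate map is linear, \(\norm{a^{(k)} - a}_2 = \norm{b^{(k)} - L}_2 \to 0\), so \((a^{(k)})\) converges to \(a\) in \((V, \norm{\bullet}_2)\). Therefore \((V, \norm{\bullet}_2)\) is complete. By the norm equivalence result, \((V, \norm{\bullet})\) is complete.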

\(\square\)

We conclude from the above that any finite-dimensional inner product space over \(\mb{C}\) or \(\mb{R}\) is a Hilbert space.

Basic facts regarding quadratics and the complex field

Discriminant of a quadratic

The following two results are used in the proof of the Cauchy-Schwarz Inequality (Surowski, 1997, p. 70):

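Result: For a quadratic of the form \(q(t) = \norm{v}^2 t^2 - ((w, v) + (v, w))\, t + \norm{w}^2\) (for example, \(q(t) = \norm{tv - w}^2\) with \(t \in \mb{R}\)), \(\text{disc}(q(t)) = 4(\text{Re}(w, v))^2 - 4 \norm{w}^2 \norm{v}^2 \leq 0\):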

$$ \begin{align*} \text{disc}(q(t)) & = (-((w, v) + (v, w)))^2 - 4 \norm{w}^2 \norm{v}^2 \\ & = ((w, v) + (v, w))^2 - 4 \norm{w}^2 \norm{v}^2 \\ & = (2 \text{Re}(w, v))^2 - 4 \norm{w}^2 \norm{v}^2 \end{align*} $$

(since \(\overline{x} + x = (a - bi) + (a + bi) = 2a = 2\text{Re}(x)\)).

Result: Recall that for a quadratic polynomial \(q(t)\) with real coefficients and a positive leading coefficient, \(q(t) \geq 0\) for all \(t \in \mb{R}\) \(\iff \text{disc}(q(t)) \leq 0\).
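
The two results above can be checked numerically. The sketch below assumes the quadratic is \(q(t) = \norm{tv - w}^2\) for real \(t\), which expands to \(\norm{v}^2 t^2 - ((w, v) + (v, w))\, t + \norm{w}^2\) and so matches the discriminant computed above. The function `inner` (conjugate-linear in the first argument, as in the rest of these notes) and the sample vectors are our own choices for illustration.

```python
def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(x, y))

v = [1 + 1j, 2, -1j]
w = [3, 1j, 1 - 2j]

a = inner(v, v).real                   # leading coefficient ||v||^2
b = -(inner(w, v) + inner(v, w)).real  # linear coefficient -2 Re(w, v)
c = inner(w, w).real                   # constant term ||w||^2
disc = b * b - 4 * a * c               # 4 Re(w, v)^2 - 4 ||w||^2 ||v||^2

def q(t):
    """Evaluate ||t v - w||^2 directly, for real t."""
    diff = [t * vi - wi for vi, wi in zip(v, w)]
    return inner(diff, diff).real

print(disc)
```

For these vectors, \(\norm{v}^2 = 7\), \(\norm{w}^2 = 15\) and \((w, v) = 5\), so the discriminant is \(100 - 420 = -320 \leq 0\), consistent with \(q(t) \geq 0\) for all real \(t\).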

Complex field-related

The following two facts are used in the proof of the Cauchy-Schwarz Inequality (Surowski, 1997, p. 70):

Fact: For \(\alpha \in \mb{C}\), \(x \in V\), \(\norm{\alpha x} = \abs{\alpha} \norm{x}\).

Proof: Consider

$$ \begin{align*} (\alpha x, \alpha x) & = \alpha (\alpha x, x) = \alpha \overline{(x, \alpha x)} = \alpha \overline{\alpha (x, x)} \\ & = \alpha \overline{\alpha} \overline{(x, x)} \\ & = \alpha \overline{\alpha} (x, x) \\ & = \abs{\alpha}^2 (x, x) \end{align*} $$

where for \(\alpha = a + bi\), \(\abs{\alpha} = \sqrt{a^2 + b^2}\).

Therefore \(\norm{\alpha x} = \sqrt{(\alpha x, \alpha x)} = \abs{\alpha} (x, x)^{1/2} = \abs{\alpha} \norm{x}\). \(\square\)

Fact: \((v, w) + \overline{(v, w)} \leq 2 \abs{(v, w)}\).

Proof: \((v, w) + \overline{(v, w)} = 2\text{Re}(v, w)\), and \(\text{Re}(v, w) \leq \abs{\text{Re}(v, w)} \leq \abs{(v, w)}\). \(\square\)

Facts regarding the orthogonal complement of a subspace

Claim (Surowski, 1997, p. 72): Let \(W\) be a subspace of \(V\). Then \(W^\perp\) is a subspace of \(V\).

Proof: Let \(x, y \in W^\perp\). Let \(w \in W\).

$$ \begin{align*} (x + y, w) & = \overline{(w, x + y)} = \overline{(w, x)} + \overline{(w, y)} \\ & = (x, w) + (y, w) = 0 + 0 \end{align*} $$

so \(x + y \in W^\perp\).

Let \(x \in W^\perp\), \(w \in W\), and \(\alpha \in \mb{C}\).

$$ \begin{align*} (\alpha x, w) & = \overline{\alpha} (x, w) \\ & = \overline{\alpha} 0 = 0 \end{align*} $$

so \(\alpha x \in W^\perp\).

Finally, the zero vector lies in \(W^\perp\): for \(w \in W\),

$$ \begin{align*} (0, w) & = (0 + 0, w) = \overline{(w, 0 + 0)} = \overline{(w, 0)} + \overline{(w, 0)} \\ & = (0, w) + (0, w) \iff 0 = (0, w). \end{align*} $$

\(\square\)

Claim (Surowski, 1997, p. 72): \(W^\perp = \bigcap_{w \in W} \text{ker } (w, \bullet)\)

Proof: Let \(x \in V\). Then \(x \in W^\perp\) is equivalent to each of the following:

$$ \begin{align*} ::\enspace & \forall\, w \in W \quad (w, x) = 0 \\ ::\enspace & \forall\, w \in W \quad x \in \text{ker }(w, \bullet) \\ ::\enspace & x \in \bigcap_{w \in W} \text{ker }(w, \bullet) \end{align*} $$

\(\square\)

Claim (Surowski, 1997, p. 73): \(w \neq 0 \implies \text{ker }(w, \bullet)\) is a hyperplane.

Proof: Fix \(w \neq 0\). Let \(f(x) = (w, x)\). \(f: V \to \mb{C}\).

Since \(\text{dim}(\mb{C}) = 1\), we either have \(\text{dim } \text{Im } f = 1\) or \(\text{dim } \text{Im } f = 0\).

Note that \(f(w) = (w, w) > 0\), so \(f \neq 0\).

Therefore \(\text{rank } f > 0\). Thus \(\text{rank } f = 1\).

Therefore, by the rank-nullity theorem with \(n = \text{dim } V\), \(\text{dim }\text{ker } f = n - 1\).

Therefore \(\{x \in V;\, (w, x) = 0\} = \text{ker } f = \text{ker } (w, \bullet)\) is a hyperplane.

\(\square\)

Showing the orthogonal complement of the span of a vector is a hyperplane

A Lemma

Lemma: For \(u_1 \neq 0\), \(\cap_{w \in \langle u_1 \rangle} \text{ker } (w, \bullet) = \text{ker } (u_1, \bullet)\).

Proof: Let \(w \in \langle u_1 \rangle\). We consider two cases.

Case 1: \(w = 0\). Then \(\text{ker }(w, \bullet) = V\).

Case 2: \(w \neq 0\) \(::\, \exists\, \alpha \in \mb{F} \quad \alpha \neq 0 \,\land\, w = \alpha u_1\).

Then

$$ \begin{align*} \text{ker } (w, \bullet) & = \{v \in V ;\, (\alpha u_1, v) = 0\} \\ & = \{v \in V ;\, \overline{\alpha} (u_1, v) = 0\} \\ & = \{v \in V ;\, (u_1, v) = 0\} \\ & = \text{ker } (u_1, \bullet) \end{align*} $$

where we could eliminate \(\overline{\alpha}\) because \(\overline{\alpha} \neq 0\) and \(\mb{F}\) is a field, so \(\overline{\alpha} (u_1, v) = 0\) holds iff \((u_1, v) = 0\) (recall: a field is an integral domain, where "integral" means commutative and a domain is a ring in which the zero product property holds).

Therefore

$$ \begin{align*} \bigcap_{w \in \langle u_1 \rangle} \text{ker } (w, \bullet) & = V \bigcap \left(\bigcap_{w \in \langle u_1 \rangle;\, w \neq 0} \text{ker } (w, \bullet)\right) \\ & = \bigcap_{w \in \langle u_1 \rangle;\, w \neq 0} \text{ker } (w, \bullet) \\ & = \bigcap_{w \in \langle u_1 \rangle;\, w \neq 0} \text{ker } (u_1, \bullet) \\ & = \text{ker } (u_1, \bullet) \end{align*} $$

\(\square\)

The proof

Claim: If \(u_1 \in V, u_1 \neq 0\), then \(u_1^\perp\) is a hyperplane (Surowski, 1997, p. 73).

Proof: Recall that by definition, \(u_1^\perp = \langle u_1 \rangle^\perp\).

We have

$$ \begin{align*} x \in \langle u_1 \rangle^\perp & ::\enspace \forall\, w \in \langle u_1 \rangle \quad (w, x) = 0 \\ & ::\enspace \forall\, w \in \langle u_1 \rangle \quad x \in \text{ker }(w, \bullet) \\ & ::\enspace x \in \bigcap_{w \in \langle u_1 \rangle} \text{ker } (w, \bullet) \\ & ::\enspace x \in \text{ker }(u_1, \bullet) \end{align*} $$

where the last step uses the Lemma.

We have shown \(\langle u_1 \rangle^\perp = \text{ker }(u_1, \bullet)\).

Above, we showed that for a vector \(w \neq 0\), \(\text{dim }\text{ker }(w, \bullet) = n - 1\).

Thus, \(\langle u_1 \rangle^\perp\) is a hyperplane.

\(\square\)

Showing the span of a vector intersected with the orthogonal complement of that span is the set containing only the zero vector

Claim: For \(u_1 \in V\), \(u_1 \neq 0\), \(\langle u_1 \rangle \cap \langle u_1 \rangle^\perp = \{0\}\) (Surowski, 1997, p. 73).

Proof:

First of all, since both \(\langle u_1 \rangle\) and \(\langle u_1 \rangle^\perp\) are subspaces, the zero vector is in both of them, so we certainly have \(\{0\} \subseteq \langle u_1 \rangle \cap \langle u_1 \rangle^\perp\). We now show the reverse inclusion.

Let \(y \in \langle u_1 \rangle \cap \langle u_1 \rangle^\perp\).

Case 1: \(y = 0\). Then the inclusion is satisfied.

Case 2: \(y \neq 0\). Since \(y \in \langle u_1 \rangle\), \(\exists\, \alpha \in \mb{C} \quad y = \alpha u_1\), and \(\alpha \neq 0\) because \(y \neq 0\).

Since \(y \in \langle u_1 \rangle^\perp\), by an earlier result we have \(y \in \text{ker } (u_1, \bullet)\), so \(0 = (u_1, y) = (u_1, \alpha u_1) = \overline{\alpha} (u_1, u_1)\).

Using the zero product property of fields, since \(\overline{\alpha}(u_1, u_1) = 0 \land \alpha \neq 0\) we conclude \((u_1, u_1) = 0\). Therefore \(u_1 = 0\), a contradiction. Thus Case 2 cannot happen.

We have thus shown that \(y \in \langle u_1 \rangle \cap \langle u_1 \rangle^\perp\) implies \(y = 0\).

Therefore \(\langle u_1 \rangle \cap \langle u_1 \rangle^\perp \subseteq \{0\}\).

Thus the two inclusions show that \(\langle u_1 \rangle \cap \langle u_1 \rangle^\perp = \{0\}\).

\(\square\)

Taking the orthogonal complement of both sides of a set inclusion

Fact (Axler, 2024, p. 211): If \(G\) and \(H\) are subsets of \(V\) and \(G \subseteq H\), then \(H^\perp \subseteq G^\perp\).

Proof (Axler, 2024, p. 212): Let \(G\) and \(H\) be subsets of \(V\) where \(G \subseteq H\). Let \(v \in H^{\perp}\). Then for all \(u \in H\), \((u, v) = 0\). Since \(G \subseteq H\), in particular \((u, v) = 0\) for all \(u \in G\). Thus \(v \in G^{\perp}\). We have thus shown that \(H^{\perp} \subseteq G^{\perp}\). \(\square\)

Notes on the Gram-Schmidt procedure

Starting with a linearly independent list of vectors \((v_1, \ldots, v_n)\), the procedure produces an orthonormal list of vectors \((u_1, \ldots, u_n)\) one at a time, where

\(\langle u_1 \rangle = \langle v_1 \rangle\)

\(\langle u_1, u_2 \rangle = \langle v_1, v_2 \rangle\)

and so on, where for example

$$ u_1 := \frac{v_1}{\norm{v_1}} $$
$$ u_2 := \frac{v_2 - \overline{(v_2, u_1)} u_1}{\norm{v_2 - \overline{(v_2, u_1)} u_1}} $$
$$ u_3 := \frac{v_3 - \overline{(v_3, u_2)} u_2 - \overline{(v_3, u_1)} u_1}{\norm{v_3 - \overline{(v_3, u_2)} u_2 - \overline{(v_3, u_1)} u_1}} $$

and so on.

It is a simple matter to check that \((u_2, u_1) = 0\).

We note:

$$ (\overline{(v_2, u_1)} u_1, u_1) = (u_1, (v_2, u_1) u_1) = (v_2, u_1) $$

Claim: \(\langle u_1, u_2 \rangle = \langle v_1, v_2 \rangle\).

Proof: \(u_2 \in \langle v_1, v_2 \rangle\) by the definition of \(u_2\). \(u_1 \in \langle v_1, v_2 \rangle\) since \(u_1\) is a scalar multiple of \(v_1\). Therefore \(\langle u_1, u_2 \rangle \subseteq \langle v_1, v_2 \rangle\). Since \(u_1, u_2\) are orthonormal, they are linearly independent, so both spans have dimension 2. Since the dimensions of the two are equal, we have \(\langle u_1, u_2 \rangle = \langle v_1, v_2 \rangle\). \(\square\)

Checking the orthogonality of \(u_3\) and \(u_1\) in the step that defines \(u_3\), where \(c > 0\) denotes the reciprocal of the norm in the denominator of \(u_3\):

$$ \begin{align*} (u_3, u_1) & = \overline{c} (v_3 - \overline{(v_3, u_2)} u_2 - \overline{(v_3, u_1)}u_1, u_1) \\ & = \overline{c} \Big[ (v_3, u_1) - (v_3, u_2) \underbrace{(u_2, u_1)}_{=0} - (v_3, u_1)\underbrace{(u_1, u_1)}_{=1} \Big] \\ & = 0 \end{align*} $$
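
The procedure can be sketched in a few lines of code. This is a minimal illustration on \(\mb{C}^3\) with the standard inner product (conjugate-linear in the first argument, as in these notes), not a numerically robust implementation. The names `inner`, `norm`, and `gram_schmidt` and the input vectors are our own. Note that the coefficient \(\overline{(v, u)}\) in the formulas above equals \((u, v)\), which is what the code computes.

```python
import math

def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors in C^n."""
    basis = []
    for v in vectors:
        r = [complex(c) for c in v]
        for u in basis:
            coeff = inner(u, v)  # equals the conjugate of (v, u)
            r = [ri - coeff * ui for ri, ui in zip(r, u)]
        length = norm(r)  # nonzero because the input list is independent
        basis.append([ri / length for ri in r])
    return basis

us = gram_schmidt([[1, 1j, 0], [1, 0, 1], [0, 1, 1]])

# Gram matrix of the output; it should be close to the identity.
gram = [[inner(a, b) for b in us] for a in us]
for row in gram:
    print([round(abs(entry), 10) for entry in row])
```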

Notes on showing that a vector can be decomposed into a linear combination of orthonormal basis vectors, where each coefficient is the inner product of the vector with the corresponding basis vector

This fact is Corollary 3.1.4.2 of Surowski (1997, p. 74).

The proof of Surowski makes use of several results, which we prove here. Throughout, \((u_1, \ldots, u_n)\) denotes an orthonormal basis of \(V\).

Claim: \(\langle u_1 \rangle^\perp \cap \cdots \cap \langle u_n \rangle^\perp = V^\perp\).

Proof:

(\(\subseteq\))

Let \(x \in \langle u_1 \rangle^\perp \cap \cdots \cap \langle u_n \rangle^\perp\). Therefore \(\forall\, i \in [n] \quad (u_i, x) = 0\).

Let \(v \in V\). It follows that \(\exists\, \alpha_1, \ldots, \alpha_n \in \mb{C} \quad v = \sum_{i=1}^{n} \alpha_i u_i\). Therefore

$$ (v, x) = \left(\sum_{i=1}^{n} \alpha_i u_i, x\right) = \sum_{i=1}^{n} \overline{\alpha_i} (u_i, x) = \sum_{i=1}^{n} \overline{\alpha_i} 0 = 0. $$

We have shown that \(\forall v \in V \quad (v, x) = 0\). Therefore, \(x \in V^\perp\).

(\(\supseteq\))

Let \(x \in V^\perp\), so \(\forall\, v \in V \quad (v, x) = 0\). Therefore \(\forall\, i \in [n] \quad (u_i, x) = 0\). Thus \(\forall\, i \in [n] \enspace \forall\, \alpha \in \mb{C} \quad (\alpha u_i, x) = \overline{\alpha}(u_i, x) = 0\), so \(\forall\, i \in [n] \enspace \forall\, y \in \langle u_i \rangle \quad (y, x) = 0\).

Therefore \(\forall\, i \in [n] \quad x \in \langle u_i \rangle^\perp\), so \(x \in \cap_{i=1}^n \langle u_i \rangle^\perp\).

Putting the above two results together, we have \(\cap_{i=1}^n \langle u_i \rangle^\perp = V^\perp\).

\(\square\)

Claim: \(V^\perp = \{0\}\).

Proof:

(\(\subseteq\))

$$ V^\perp = \{v \in V;\, \forall\, y \in V \quad (y, v) = 0\} $$

Let \(x \in V^\perp\). Thus \((x, x) = 0\), so \(x = 0\). Therefore \(V^\perp \subseteq \{0\}\).

(\(\supseteq\))

\(V^\perp\) is a subspace of \(V\), so \(\{0\} \subseteq V^\perp\).

Putting the above two together, we have \(V^\perp = \{0\}\).

\(\square\)

Note on the last step of the proof

In the proof, it is shown that \(v - \sum_{i=1}^{n} (u_i, v) u_i \in \{0\}\). This implies \(v = \sum_{i=1}^{n} (u_i, v) u_i\).
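
The expansion \(v = \sum_{i=1}^{n} (u_i, v) u_i\) can be checked on a small example. The sketch below uses a particular orthonormal basis of \(\mb{C}^2\) and a sample vector, both chosen by us for illustration, with the inner product conjugate-linear in the first argument.

```python
import math

def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(x, y))

s = 1 / math.sqrt(2)
u1 = [s, s * 1j]    # orthonormal basis of C^2 (our choice)
u2 = [s, -s * 1j]
basis = (u1, u2)

v = [2 + 1j, 3]

# Coefficients (u_i, v), then the reconstruction sum_i (u_i, v) u_i.
coeffs = [inner(u, v) for u in basis]
recon = [sum(c * u[i] for c, u in zip(coeffs, basis)) for i in range(2)]
print(recon)
```

Up to rounding, `recon` agrees with `v`. Using \((v, u_i)\) as coefficients instead would give the wrong vector under this convention, since the inner product is conjugate-linear in its first slot.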

Note on proving that for any subspace, a vector space equals the direct sum of that subspace and its orthogonal complement

This is Corollary 3.1.4.3 of Surowski (1997, p. 75).

Let \(W \subseteq V\), \(W\) a subspace of \(V\). Let \((w_1, \ldots, w_k)\) be an orthonormal basis of \(W\).

For any vector \(v \in V\), let \(v^\prime = \sum_{j=1}^{k} (w_j, v) w_j \in W\).

Claim: \(v - v^\prime \in W^\perp\) implies \(v \in v^\prime + W^\perp\) (Surowski, 1997, p. 75).

Proof: Let \(x = v - v^\prime \in W^\perp\). \(x = v - v^\prime :: v = x + v^\prime\).

\(x \in W^\perp\) and \(v^\prime \in W\), so

$$ x + v^\prime \in \{z \in V;\, \exists\, w \in W^\perp \quad z = v^\prime + w\} = v^\prime + W^\perp . $$

\(\square\)

Claim: \(v^\prime + W^\perp \subseteq W + W^\perp\) (Surowski, 1997, p. 75).

Proof: Let \(x \in v^\prime + W^\perp\). Therefore \(\exists\, w \in W^\perp \quad x = v^\prime + w \in W + W^\perp\). Thus \(v^\prime + W^\perp \subseteq W + W^\perp\). \(\square\)

Notes on showing that the orthogonal complement of the orthogonal complement of a subspace is the subspace itself

This fact is Corollary 3.1.4.4 of Surowski (1997, p. 75).

Corollary: If \(W\) is a subspace of \(V\), then \(W^{\perp\perp} = W\).

Proof:

Assume \(\text{dim } V = n\) and \(\text{dim } W = k\).

From an above result, we have \(V = W \oplus W^\perp\). Therefore by Lang, Chapter I, Theorem 4.3 (1987, p. 20), \(\text{dim } V = \text{dim } W + \text{dim } W^\perp\). Therefore \(\text{dim } W^\perp = n - k\).

\(W^\perp = \{v \in V;\, \forall w \in W \quad (w, v) = 0\}\)

\(W^{\perp\perp} := (W^\perp)^\perp = \{v \in V;\, \forall\, z \in W^\perp \quad (z, v) = 0\}\)

We will now show that \(\text{dim } W^{\perp\perp} = k\).

Applying the above result to the subspace \(W^\perp\) in place of \(W\) gives \(V = W^\perp \oplus W^{\perp\perp}\). Using the same result from Lang again, we have \(n = (n - k) + \text{dim } W^{\perp\perp}\). Therefore \(\text{dim } W^{\perp\perp} = k\).

We show that \(W \subseteq W^{\perp\perp}\): let \(x \in W\). Begin sub-proof: let \(z \in W^\perp\). Therefore \(\forall y \in W \quad (y, z) = 0\) which implies that for all \(y \in W\) we have \((z, y) = \overline{(y, z)} = \overline{0} = 0\), so in particular, \((z, x) = 0\). End sub-proof.

We just showed \(\forall\, z \in W^\perp \quad (z, x) = 0\), which implies \(x \in W^{\perp\perp}\). Therefore \(W \subseteq W^{\perp\perp}\).

Since \(W \subseteq W^{\perp\perp}\) and \(\text{dim } W = \text{dim } W^{\perp\perp}\), we have \(W = W^{\perp\perp}\).

\(\square\)

Statement and proof of the Riesz Representation Theorem

Let \(V\) be an inner product space over \(\mb{C}\) with \(\text{dim } V = n\). We consider the function \(\varphi: V \to V^\ast\) defined by \(\varphi(v) = (v, \bullet)\).

We note that this function is anti-linear, namely that for all \(v_1, v_2 \in V\),

$$ \begin{align*} \varphi(v_1 + v_2) & = (v_1 + v_2, \bullet) \\ & = (v_1, \bullet) + (v_2, \bullet) \\ & = \varphi(v_1) + \varphi(v_2) \end{align*} $$

but for all \(\alpha \in \mb{C}\), \(v \in V\),

$$ \begin{align*} \varphi(\alpha v) & = (\alpha v, \bullet) \\ & = \overline{\alpha} (v, \bullet) \\ & = \overline{\alpha} \varphi(v) \end{align*} $$

We claim that the rank-nullity theorem holds for anti-linear functions (its initial statement was for linear functions). We redo that proof here for the case of anti-linear functions:

Theorem (rank-nullity, but for anti-linear functions): Let \(T: V \to W\) be an anti-linear transformation. Then \(\text{rank } T = \text{dim } V - \text{nullity } T\).

Proof (Surowski, 1997, p. 16-17): We first find a subspace \(V_1 \subseteq V\) such that \(V = \text{ker}(T) \oplus V_1\). We restrict \(T\) to \(V_1\), i.e. \(T|_{V_1}: V_1 \to T(V_1)\).

We show that \(T|_{V_1}\) is invertible. Namely, if \(v_1 \in \text{ker}(T|_{V_1})\), then \(T(v_1) = 0\) implies \(v_1 \in V_1 \cap \text{ker } T = \{0\}\), so \(v_1 = 0\). Therefore \(\text{ker } T|_{V_1} = \{0\}\) so \(T|_{V_1}\) is injective.

Let \(v \in V\). Clearly \(T(v) \in T(V)\). Write \(v = x + v_1\) for suitable \(x \in \text{ker } T\), \(v_1 \in V_1\). Then

$$ \begin{align*} T(v) & = T(x + v_1) \\ & = T(x) + T(v_1) \\ & = 0 + T(v_1) \\ & = T(v_1) \end{align*} $$

so \(T|_{V_1}\) is surjective. Therefore \(V_1 \cong T(V)\).

Let \(\{x_1, \ldots, x_r\}\) be a basis for \(\text{ker } T\) and \(\{v_1, \ldots, v_m\}\) be a basis for \(V_1\). Then \(V = \text{ker}(T) \oplus V_1\) implies \(\{x_1, \ldots, x_r, v_1, \ldots, v_m\}\) is a basis for \(V\). Therefore

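\(\text{dim } V = r + m = \text{dim } \text{ker } T + \text{dim } V_1\). Since \(T|_{V_1}: V_1 \to T(V)\) is an anti-linear bijection, it maps the basis \(\{v_1, \ldots, v_m\}\) of \(V_1\) to a basis of \(T(V)\) (note that \(\sum_i c_i T(v_i) = T(\sum_i \overline{c_i} v_i)\)), so \(\text{dim } V_1 = \text{rank } T\) and we have \(\text{dim } V = \text{nullity } T + \text{rank } T\).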

\(\square\)

Theorem: Let \(V\) be an inner product space over \(\mb{C}\) with \(\text{dim } V = n\). Then for every \(f \in V^\ast\) there exists a unique vector \(v \in V\) such that \(f = (v, \bullet)\).

Proof:

Let \(V\) be as in the statement of the theorem, and define \(\varphi: V \to V^\ast\) by \(\varphi(v) = (v, \bullet)\). We find \(\text{ker } \varphi\). Let \(v \in V\) with \(\varphi(v) = (v, \bullet) = 0\) (the zero functional). Applying this functional to \(v\) gives \((v, v) = 0\), which implies \(v = 0\). Thus \(\text{ker } \varphi = \{0\}\), so \(\varphi\) is injective.

Applying the above rank-nullity theorem to the anti-linear \(\varphi\) gives \(\text{rank } \varphi = n - 0 = n = \text{dim } V^\ast\), so \(\varphi\) is surjective. Thus \(\varphi\) is a bijection.

We thus have that for every \(f \in V^\ast\) there exists a unique \(v \in V\) with \(f = \varphi(v) = (v, \bullet)\).

\(\square\)
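
For \(V = \mb{C}^n\) with the standard inner product, the representing vector can be written down explicitly: if \(e_1, \ldots, e_n\) is the standard basis, then \(v = \sum_i \overline{f(e_i)}\, e_i\) satisfies \(f = (v, \bullet)\). The following sketch illustrates this under that assumption. The helper `riesz_vector` and the sample functional `f` are hypothetical names of our own, and `f` is assumed to be linear.

```python
def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(x, y))

def riesz_vector(f, n):
    """Return v in C^n with f(x) = (v, x), assuming f is linear."""
    standard_basis = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    # The i-th coordinate of v is the conjugate of f(e_i).
    return [complex(f(e)).conjugate() for e in standard_basis]

def f(x):
    """A sample linear functional on C^3 (our choice)."""
    return (2 - 1j) * x[0] + 3j * x[1] - x[2]

v = riesz_vector(f, 3)
x = [1 + 1j, -2, 0.5j]
print(v)
print(f(x), inner(v, x))
```

The conjugate in `riesz_vector` reflects the anti-linearity of \(\varphi\): the inner product conjugates its first argument, so the coordinates of \(v\) must be conjugated in advance.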

The adjoint

Let \(T \in \mc{L}(V, W)\). Recall the following definition (Axler, 2024, p. 107; Surowski, 1997, p. 39-41; Jain & Ahuja, 2010, p. 205):

Definition: \(T^{\text{dual}}: W^\ast \to V^\ast\) defined by \(T^{\text{dual}}(f) = f \circ T\) is called the dual map.

(For \(f \in W^\ast\), the composition \(f \circ T: V \to \mb{F}\) is linear because the composition of linear functions is linear, so indeed \(f \circ T \in V^\ast\).)

Moreover, \(T^{\text{dual}}\) itself is linear, since \((a f + b g) \circ T = a (f \circ T) + b (g \circ T)\) for all \(f, g \in W^\ast\) and \(a, b \in \mb{F}\), so we can write \(T^{\text{dual}} \in \mc{L}(W^{\ast}, V^{\ast})\).

Let \(\varphi_1: V \to V^\ast\) and \(\varphi_2: W \to W^\ast\) be the bijections used in the Riesz representation theorem.

Let \(T \in \mc{L}(V, W)\). We define the adjoint (Benyattou, 2026, p. 54; Jain & Ahuja, 2010, p. 302) of \(T\), denoted by \(T^\ast: W \to V\), by \(T^\ast = \inv{\varphi_1} \circ T^{\text{dual}} \circ \varphi_2\).

We observe that if \(W = V\), \(\varphi_1 = \varphi_2\) and things are simpler.

Theorem: \(T^\ast\) is linear.

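Proof: Let \(w_1, w_2 \in W\). Let \(v \in V\). Using the identity \((v, T^{\ast} w) = (Tv, w)\) (the conjugate of the identity in the Theorem that follows), observe that

$$ \begin{align*} (v, T^{\ast}(w_1 + w_2)) & = (Tv, w_1 + w_2) \\ & = (Tv, w_1) + (Tv, w_2) \\ & = (v, T^{\ast} w_1) + (v, T^{\ast} w_2) \end{align*} $$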

Since \(v \in V\) was arbitrary, it follows that for all \(w_1, w_2 \in W\), \(T^{\ast}(w_1 + w_2) = T^{\ast}(w_1) + T^{\ast}(w_2)\).

Now let \(w \in W\) and let \(\alpha \in \mb{F}\). Observe that

$$ \begin{align*} (v, T^{\ast} (\alpha w)) & = (Tv, \alpha w) \\ & = \alpha (Tv, w) \\ & = \alpha (v, T^{\ast} w) \\ & = (v, \alpha T^{\ast} w) \end{align*} $$

\(v \in V\) was arbitrary, so for all \(w \in W\), \(\alpha \in \mb{F}\), \(T^{\ast} (\alpha w) = \alpha T^{\ast} w\). \(\square\)

In particular, we note that although \(T^{\ast}\) is formed by composing three functions, two of which (\(\varphi_2\) and \(\inv{\varphi_1}\)) are anti-linear rather than linear, \(T^{\ast}\) is linear.

Next, we observe an intriguing property of this function \(T^\ast\):

Theorem (Benyattou, 2026, p. 54; Surowski, 1997, p. 77): For \(T \in \mc{L}(V, W)\), for all \(w \in W\), \((T^{\ast} w, \bullet)_V = (w, T(\bullet))_W\).

Proof: Observe that

$$ \begin{align*} T^{\text{dual}}((w, \bullet)_W) & = ((\varphi_1 \circ \inv{\varphi_1}) \circ T^{\text{dual}})((w, \bullet)_W) \\ & = ((\varphi_1 \circ \inv{\varphi_1}) \circ T^{\text{dual}})(\varphi_2(w)) \\ & = (\varphi_1 \circ T^\ast)(w) \\ & = \varphi_1 (T^{\ast}(w)) \\ & = (T^{\ast}(w), \bullet)_V \end{align*} $$

Also observe that

$$ T^{\text{dual}}((w, \bullet)_W) = (w, \bullet)_W \circ T = (w, T(\bullet))_W $$

Thus we have for all \(w \in W\), \((T^{\ast}(w), \bullet)_V = (w, T(\bullet))_W\).

\(\square\)

A particular matrix representation of the adjoint

Theorem (Axler, 2024, p. 232): Let \(T \in \mc{L}(V, W)\) (so \(T^\ast \in \mc{L}(W, V)\)). Let \(\mc{A} = (v_1, \ldots, v_n)\) be an orthonormal basis for \(V\) and \(\mc{B} = (w_1, \ldots, w_m)\) be an orthonormal basis for \(W\). Then \({[T]_{\mc{B}\mc{A}}}^\top = \overline{[T^{\ast}]_{\mc{A}\mc{B}}}\).

Proof: Since \(\mc{B}\) is an orthonormal basis of \(W\), we can write for each \(k \in [n]\), \(Tv_k = (w_1, Tv_k) w_1 + \cdots + (w_m, Tv_k) w_m\). Therefore \(([T]_{\mc{B}\mc{A}})_{jk} = (w_j, Tv_k)\).

Similarly, since \(\mc{A}\) is an orthonormal basis of \(V\), we can write for each \(k \in [m]\), \(T^\ast w_k = (v_1, T^\ast w_k) v_1 + \cdots + (v_n, T^\ast w_k) v_n\), so \(([T^{\ast}]_{\mc{A}\mc{B}})_{jk} = (v_j, T^\ast w_k) = (T v_j, w_k) = \overline{(w_k, Tv_j)}\).

Note that \(({[T]_{\mc{B}\mc{A}}}^\top)_{jk} = ([T]_{\mc{B}\mc{A}})_{kj} = (w_k, Tv_j) = \overline{([T^{\ast}]_{\mc{A}\mc{B}})_{jk}} = (\overline{[T^{\ast}]_{\mc{A}\mc{B}}})_{jk}\), so we have \({[T]_{\mc{B}\mc{A}}}^\top = \overline{[T^{\ast}]_{\mc{A}\mc{B}}}\). \(\square\)

Note that if \(\mc{A}\) and \(\mc{B}\) are not orthonormal bases for \(V\) and \(W\) respectively, then the conclusion of the Theorem does not necessarily hold.
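
In the standard (orthonormal) bases of \(\mb{C}^2\) and \(\mb{C}^3\), the Theorem says that the matrix of \(T^\ast\) is the conjugate transpose of the matrix of \(T\). The sketch below checks the defining identity \((T^\ast w, v)_V = (w, Tv)_W\) for one sample matrix and one pair of vectors. The matrix, the vectors, and the helper names `conj_transpose` and `apply` are our own choices for illustration.

```python
def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(x, y))

def conj_transpose(m):
    """Conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(m), len(m[0])
    return [[complex(m[i][j]).conjugate() for i in range(rows)] for j in range(cols)]

def apply(m, x):
    """Matrix-vector product."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in m]

T = [[1, 2j], [0, 1 - 1j], [3, -1]]  # matrix of T: C^2 -> C^3
T_star = conj_transpose(T)           # candidate matrix of T*: C^3 -> C^2

v = [1 + 1j, -2]
w = [1j, 2, 1 - 1j]

lhs = inner(apply(T_star, w), v)     # (T* w, v) in V
rhs = inner(w, apply(T, v))          # (w, T v) in W
print(lhs, rhs)
```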

Properties of the adjoint

Theorem (Axler, 2024, p. 230): For \(T \in \mc{L}(V, W)\),

  1. \((T^{\ast})^{\ast} = T\)
  2. for every finite-dimensional inner product space \(U\) and all \(S \in \mc{L}(W, U)\), \((ST)^\ast = T^\ast S^\ast\)

Proof:

((2) is from Axler, 2024, p. 230.)

(1)

Let \(v \in V, w \in W\). By the above Theorem,

$$ ((T^{\ast})^{\ast} v, w) = (v, T^{\ast} w) = \overline{(T^{\ast} w, v)} = \overline{(w, Tv)} = (Tv, w) $$

and thus for all \(v \in V\), \((T^{\ast})^{\ast} v = Tv\), so \((T^{\ast})^{\ast} = T\).

(2)

Let \(S: W \to U\), \(u \in U\), \(v \in V.\) Then

\((u, (ST)v) = ((ST)^\ast u, v)\) and also

$$ \begin{align*} (u, (ST)v) & = (u, S(T(v))) \\ & = (S^{\ast} u, Tv) \\ & = (T^{\ast} S^{\ast} u, v) \end{align*} $$

Thus for all \(u \in U\), for all \(v \in V\), \(((ST)^\ast u, v) = ((T^{\ast} S^{\ast}) u, v)\), which implies for all \(u \in U\), \((ST)^\ast u = T^{\ast} S^{\ast} u\), which implies \((ST)^\ast = T^{\ast} S^{\ast}\). \(\square\)

Theorem (Axler, 2024, p. 231): Let \(T \in \mc{L}(V, W)\) be a linear transformation between finite dimensional \(V\) and \(W\). Then the following hold:

  1. \(\text{ker } T^\ast = (\text{Im } T)^\perp\)
  2. \(\text{Im } T^\ast = (\text{ker } T)^\perp\)
  3. \(\text{ker } T = (\text{Im } T^\ast)^\perp\)
  4. \(\text{Im } T = (\text{ker } T^\ast)^\perp\)

Proof (Axler, 2024, p. 231):

We first consider (1). \(\text{Im }T\) is a subspace of \(W\). By the definition of the orthogonal complement,

$$ \begin{align*} (\text{Im } T)^\perp & = \{w \in W ;\, \forall\, z \in \text{Im } T \quad (z, w) = 0\} \\ & = \{w \in W ;\, \forall\, v \in V \quad (Tv, w) = 0\} \end{align*} $$

Now, let \(w \in W\). Then

$$ \begin{align*} w \in \text{ker } T^\ast & \iff T^\ast w = 0 \\ & \iff \forall\, v \in V \quad (T^\ast w, v) = 0 \\ & \iff \forall\, v \in V \quad (w, Tv) = 0 \\ & \iff \forall\, v \in V \quad (Tv, w) = 0 \\ & \iff w \in (\text{Im } T)^{\perp} \end{align*} $$

where we used the fact that \(\forall v \in V \quad (w, Tv) = 0 \iff \forall v \in V \quad (Tv, w) = 0\). This holds because if we start with \(\forall v \in V \quad (w, Tv) = 0\), we find that \((Tv, w) = \overline{(w, Tv)} = \overline{0} = 0 = (w, Tv)\), and vice versa.

We have shown (1). Taking the orthogonal complement of both sides of (1) and using \(W^{\perp\perp} = W\) gives (4). Since \((T^{\ast})^\ast = T\), replacing \(T\) with \(T^{\ast}\) in (1) gives (3), and replacing \(T\) with \(T^{\ast}\) in (4) gives (2). \(\square\)

Unitary operators

Definition (Axler, 2024, p. 258): \(S \in \mc{L}(V, W)\) is called an isometry iff \(\forall\, v \in V \quad \norm{S v} = \norm{v}\).

Result: If \(S\) is an isometry, \(S^\ast S = \text{id}\).

Proof (Axler, 2024, p. 259): Let \(S\) be an isometry. Let \(v \in V\). We have

$$ \begin{align*} ((\text{id} - S^\ast S) v, v) & = (\text{id}(v), v) - (S^\ast S v, v) = (v, v) - (Sv, Sv) \\ & = \norm{v}^2 - \norm{Sv}^2 = 0 \end{align*} $$

Since \(\text{id} - S^\ast S\) is self-adjoint and \(((\text{id} - S^\ast S) v, v) = 0\) for all \(v \in V\), it follows that \(\text{id} - S^\ast S = 0 \enspace :: \enspace S^\ast S = \text{id}\). \(\square\)

Definition (Axler, 2024, p. 260): \(S \in \mc{L}(V)\) is called unitary iff \(S\) is an invertible isometry.

We see that if \(S\) is unitary, then \(S^\ast = \inv{S}\).

Matrix form

Theorem: If \(S \in \mc{L}(V)\) is unitary and \(\mc{A}\) is an orthonormal basis for \(V\), then \([S]_{\mc{A}}\) has orthonormal columns (and thus \({\overline{[S]_{\mc{A}}}}^\top [S]_{\mc{A}} = I\), thus \({\overline{[S]_{\mc{A}}}}^\top = \inv{[S]_{\mc{A}}}\)).

Proof: Let \(S\) and \(\mc{A} = (v_1, \ldots, v_n)\) be as in the statement of the theorem. We must show that for any \(i, j \in [n]\), \(([S(v_i)]_{\mc{A}}, [S(v_j)]_{\mc{A}}) = \delta_{ij}\).

Let \(([S]_{\mc{A}})_{ij} = s_{ij}\).

Note that by definition, \(S(v_j) = \sum_{i=1}^n s_{ij} v_i\). Therefore

$$ \begin{align*} (S(v_j), S(v_k)) & = \left(\sum_{i=1}^{n} s_{ij} v_i, \sum_{l=1}^{n} s_{lk} v_l\right) \\ & = \sum_{i=1}^{n} \sum_{l=1}^{n} \overline{s_{ij}} s_{lk} (v_i, v_l) \\ & = \sum_{i=1}^{n} \overline{s_{ij}} s_{ik} \end{align*} $$

Since \(S\) is unitary, \(S^\ast S = \text{id}\), so \((S(v_j), S(v_k)) = (S^\ast S v_j, v_k) = (v_j, v_k) = \delta_{jk}\).

Therefore \(\sum_{i=1}^{n} \overline{s_{ij}} s_{ik} = \delta_{jk}\). Since \([S(v_j)]_{\mc{A}} = (s_{1j}, \ldots, s_{nj})\) and \([S(v_k)]_{\mc{A}} = (s_{1k}, \ldots, s_{nk})\), we have that the columns of \([S]_{\mc{A}}\) are orthonormal.

\(\square\)

A Theorem

Lemma: For orthonormal bases \(\mc{A} = (v_1, \ldots, v_n)\) and \(\mc{B} = (w_1, \ldots, w_n)\) of \(V\), letting \(P = [\text{id}]_{\mc{B}\mc{A}}\), we have that \((P)_{kj} = (w_k, v_j)\).

Proof:

\(P = [[\text{id}(v_1)]_{\mc{B}}, \ldots, [\text{id}(v_n)]_{\mc{B}}] = [(v_1)_{\mc{B}}, \ldots, (v_n)_{\mc{B}}]\).

For any \(v \in V\), by an earlier Theorem we can write \(v = \sum_{i=1}^{n} (w_i, v) w_i\). Apply this to \(\text{id}(v_j) = v_j\). Thus the \(k\)th coordinate of \(v_j\) in basis \(\mc{B}\) is \((w_k, v_j)\).

Thus we have \((P)_{kj} = (w_k, v_j)\). \(\square\)

Lemma: For orthonormal bases \(\mc{A} = (v_1, \ldots, v_n)\) and \(\mc{B} = (w_1, \ldots, w_n)\) of \(V\), \((v_j, v_k) = \sum_{i=1}^{n} \overline{(w_i, v_j)} (w_i, v_k)\).

Proof:

$$ \begin{align*} (v_j, v_k) & = \left(\sum_{i=1}^{n} (w_i, v_j) w_i, \sum_{m=1}^{n} (w_m, v_k) w_m\right) \\ & = \sum_{i=1}^{n} \sum_{m=1}^{n} \overline{(w_i, v_j)} (w_m, v_k) (w_i, w_m) \\ & = \sum_{i=1}^{n} \overline{(w_i, v_j)} (w_i, v_k) \end{align*} $$

\(\square\)

Theorem: For orthonormal bases \(\mc{A} = (v_1, \ldots, v_n)\) and \(\mc{B} = (w_1, \ldots, w_n)\) of \(V\), \([\text{id}]_{\mc{B}\mc{A}}\) is a unitary matrix.

Proof: For simplicity denote \(P = [\text{id}]_{\mc{B}\mc{A}}\).

By our Lemma, we have \((P)_{ij} = (w_i, v_j)\).

Consider

$$ \begin{align*} (\overline{P}^\top P)_{jk} & = \sum_{l=1}^{n} (\overline{P})_{lj} (P)_{lk} = \sum_{l=1}^{n} \overline{(w_l, v_j)} (w_l, v_k) \\ & = (v_j, v_k) \end{align*} $$

Since \(\mc{A}\) is orthonormal, \((v_j, v_k) = \delta_{jk}\). Thus \(\overline{P}^\top P = I\), so \(P\) is a unitary matrix.

\(\square\)
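
As a concrete check of the Theorem, the sketch below builds \(P\) with entries \((P)_{kj} = (w_k, v_j)\) for two orthonormal bases of \(\mb{C}^2\) and verifies that \(\overline{P}^\top P\) is the identity up to rounding. The bases are our own choices for illustration: \(\mc{A}\) is the standard basis and \(\mc{B}\) consists of \(\frac{1}{\sqrt{2}}(1, i)\) and \(\frac{1}{\sqrt{2}}(1, -i)\).

```python
import math

def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(x, y))

s = 1 / math.sqrt(2)
A = [[1, 0], [0, 1]]            # orthonormal basis (v_1, v_2)
B = [[s, s * 1j], [s, -s * 1j]] # orthonormal basis (w_1, w_2)
n = 2

# Change-of-basis matrix with entries P[k][j] = (w_k, v_j).
P = [[inner(B[k], A[j]) for j in range(n)] for k in range(n)]

# Entries of conj(P)^T P: sum over l of conj(P[l][j]) * P[l][k].
PhP = [
    [sum(P[l][j].conjugate() * P[l][k] for l in range(n)) for k in range(n)]
    for j in range(n)
]
for row in PhP:
    print([round(abs(entry), 10) for entry in row])
```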

Diagonalizability of transformations in inner product spaces

In the last section, we noted that if \(T \in \mc{L}(V)\) is self-adjoint, then \(T\) is diagonalizable in a basis of eigenvectors \(\mc{A}\) where the vectors of \(\mc{A}\) are orthonormal. There are other \(T \in \mc{L}(V)\) that are diagonalizable in an orthonormal basis of eigenvectors. We discuss this general case now.

Notes on self-adjoint transformations

Definition: \(T \in \mc{L}(V)\) is self-adjoint iff \(T^\ast = T\).

Observe that for self-adjoint \(T\), for all \(v \in V\) we have \((Tv, \bullet) = (v, T(\bullet))\).

Also observe that for self-adjoint \(T\) and an orthonormal basis \(\mc{A}\) of \(V\), by a previous Theorem we have \({[T]_{\mc{A}}}^\top = \overline{[T^{\ast}]_{\mc{A}}} = \overline{[T]_{\mc{A}}}\). This means \([T]_{\mc{A}} = {\overline{[T]_{\mc{A}}}}^\top\), so \(([T]_{\mc{A}})_{ij} = ({\overline{[T]_{\mc{A}}}}^\top)_{ij} = (\overline{[T]_{\mc{A}}})_{ji} = \overline{([T]_{\mc{A}})_{ji}}\).

Result (Surowski, 1997, p. 79-80, Proposition 3.2.4): If \(T \in \mc{L}(V)\) is self-adjoint, then \(T\) is diagonalizable in an orthonormal basis of eigenvectors \(\mc{A}\).

Result: If \(\lambda\) is an eigenvalue of \(T\) and \(T\) is self-adjoint, then \(\lambda \in \mb{R}\).
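
The last result can be illustrated on a \(2 \times 2\) matrix that equals its own conjugate transpose, which is the matrix form of self-adjointness in an orthonormal basis noted above. The sketch below computes the eigenvalues from the characteristic polynomial \(\lambda^2 - \text{tr}(H)\lambda + \det(H)\). The matrix `H` and the helper `eigenvalues_2x2` are our own choices for illustration.

```python
import cmath

def eigenvalues_2x2(m):
    """Roots of the characteristic polynomial of a 2x2 matrix."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + root) / 2, (trace - root) / 2)

# H equals its conjugate transpose: real diagonal, conjugate off-diagonal entries.
H = [[2, 1 - 1j], [1 + 1j, 3]]

lam1, lam2 = eigenvalues_2x2(H)
print(lam1, lam2)
```

Here the trace is \(5\) and the determinant is \(6 - \abs{1 - i}^2 = 4\), so the eigenvalues are \(4\) and \(1\), both real.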

Results for projection operators

Let \(W \subseteq V\), \(W\) a subspace. Let \(P = \text{proj}_W: V \to W\).

Claim: \(\text{ker } P = W^\perp\)

Proof: \(\text{ker } \text{proj}_W = \{v \in V;\, \text{proj}_W(v) = 0\}\)

Since \(V = W \oplus W^\perp\),

\(v \in W^\perp \iff v = 0 + w^\prime\) with \(w^\prime \in W^\perp\) \(\iff \text{proj}_W(v) = 0 \iff v \in \text{ker } P\).

\(\therefore\, \text{ker } P = W^\perp\). \(\square\)

Claim: \(\text{im } P = W\).

Proof: \(v \in W \iff v = w + 0 \iff \text{proj}_W(v) = w = v \iff v \in \text{im } P\)

\(\therefore\, W = \text{im } P\). \(\square\)

Claim: \(P|_{W} = I_W\)

Proof: Let \(w \in W\). \(P|_{W}(w) = w\), so \(P|_{W} = I_W\). \(\square\)

Claim: \(P\) is idempotent.

Proof: Let \(v \in V\) and write \(w = \text{proj}_W(v) \in W\). Since \(P|_{W} = I_W\), \(\text{proj}_W(w) = w\), so \(P^2 v = P v\). As \(v\) was arbitrary, \(P^2 = P\). \(\square\)

Claim: \(\text{ker } P \perp \text{im } P\).

Proof: \(\text{ker } P = W^\perp\), and \(\text{im } P = W\). \(\square\)
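
These properties can be checked numerically. The sketch below assumes \(\text{proj}_W(v) = \sum_j (w_j, v) w_j\) for an orthonormal basis \((w_1, \ldots, w_k)\) of \(W\), matching the vector \(v^\prime\) defined earlier in these notes. The subspace of \(\mb{C}^3\), the sample vector, and the names `W_BASIS` and `proj` are our own choices for illustration.

```python
import math

def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(x, y))

s = 1 / math.sqrt(2)
W_BASIS = [[1, 0, 0], [0, s, s * 1j]]  # orthonormal basis of W (our choice)

def proj(v):
    """Orthogonal projection onto W: sum over j of (w_j, v) w_j."""
    out = [0j] * len(v)
    for w in W_BASIS:
        coeff = inner(w, v)
        out = [o + coeff * wi for o, wi in zip(out, w)]
    return out

v = [1 + 1j, 2, 3j]
p = proj(v)                                   # component in W = im P
residual = [vi - pi for vi, pi in zip(v, p)]  # component in W^perp = ker P

print(p)
print(residual)
```

Applying `proj` to `p` returns `p` again (idempotence), applying it to `residual` returns the zero vector, and `inner(residual, p)` vanishes up to rounding, in line with \(\text{ker } P \perp \text{im } P\).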

References

Axler, Sheldon. (2024). Linear algebra done right (Fourth ed.). Springer.

Benyattou, Khallil Ebrahim. (2026). Linear Alkebra. Self-published.

Bube, Ken and James Burke. (2021). Math 554 Linear Analysis Autumn 2006 Lecture Notes [Lecture notes].

Jain, Pawan K. and Om P. Ahuja. (2010). Functional analysis (second ed.). New Age International (P) Limited.

Lang, Serge. (1987). Linear algebra (Third edition). Springer.

Lang, Serge. (1997). Undergraduate analysis (Second ed.). Springer Science+Business Media, Inc.

Surowski, David. (1997). Advanced Linear Algebra [Lecture notes].

How to cite this article

Wayman, Eric Alan. (2026). Linear algebra: inner product spaces. Eric Alan Wayman's technical notes. https://ericwayman.net/notes/linear-algebra-inner-product-spaces/

@misc{wayman2026linear-algebra-inner-product-spaces,
  title={Linear algebra: inner product spaces},
  author={Wayman, Eric Alan},
  journal={Eric Alan Wayman's technical notes},
  url={https://ericwayman.net/notes/linear-algebra-inner-product-spaces/},
  year={2026}
}

© 2025-2026 Eric Alan Wayman. All rights reserved.