Linear algebra: basics

Introduction and notation

This article clarifies certain points of Chapter 1 of Surowski (1997). The focus is on the relationship between linear transformations and matrices, in particular the identity transformation and change of basis transformations, as well as some definitions and theorems regarding dual spaces.

For vectors \(v_1, \ldots, v_n \in V\), denote by \(\langle v_1, \ldots, v_n \rangle\) the span of \(v_1, \ldots, v_n\).

For vector spaces \(V\), \(W\), denote by \(\mc{L}(V, W)\) the vector space of all linear transformations from \(V\) to \(W\). Denote by \(\mc{L}(V)\) the vector space of all linear transformations from \(V\) to \(V\).

For \(T: V \to W\) a linear transformation, if \(T\) is an isomorphism we write \(T: V \stackrel{\cong}{\longrightarrow} W\).

Linear transformations and their matrices

Theorem: Let \(\mc{A} = (v_1, \ldots, v_n)\) be an ordered basis of \(V\) and let \(\mc{B} = (w_1, \ldots, w_m)\) be an ordered basis of \(W\).

Choose scalars \(a_{ij}\). Let \(T \in \mc{L}(V, W)\) where for all \(j \in [n]\), \(T(v_j) = \sum_{i=1}^m a_{ij} w_i\). Then there is no other transformation \(T_2\) such that for all \(j \in [n]\), \(T_2(v_j) = T(v_j)\).

Proof: Let \(T_2 \in \mc{L}(V, W)\) where for all \(j \in [n]\), \(T_2(v_j) = T(v_j)\). Let \(S = T_2 - T\). Observe that for all \(j \in [n]\), \(S(v_j) = T_2(v_j) - T(v_j) = 0\), so for any \(x \in V\), writing \(x = c_1 v_1 + \cdots + c_n v_n\), we have \(S(x) = c_1 S(v_1) + \cdots + c_n S(v_n) = 0\), so \(S\) is the zero transformation. Therefore \(T_2 = T\). \(\square\)

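The above theorem shows that a linear transformation is determined by its values on a basis: at most one \(T \in \mc{L}(V, W)\) takes prescribed values on \(v_1, \ldots, v_n\). Such a \(T\) always exists (extend linearly, setting \(T(\sum_{j=1}^n d_j v_j) = \sum_{j=1}^n d_j T(v_j)\)), so specifying the values on a basis defines one and only one linear transformation from \(V\) to \(W\).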

Definition: Let \(\mc{A} = (v_1, \ldots, v_n)\) be an ordered basis of \(V\) and let \(\mc{B} = (w_1, \ldots, w_m)\) be an ordered basis of \(W\). Define a linear transformation \(T\) by specifying its action on the vectors in \(\mc{A}\) in terms of the vectors in \(\mc{B}\), namely, \(T(v_j) = \sum_{i=1}^m c_{ij} w_i\). Define the matrix \([T]_{\mc{B}\mc{A}}\) by \(([T]_{\mc{B}\mc{A}})_{ij} = c_{ij}\). We do not ascribe any meaning to this matrix at this point; it is just the \(m \times n\) array of the numbers we used to define the transformation \(T\) with respect to ordered basis \(\mc{A}\) of \(V\) and ordered basis \(\mc{B}\) of \(W\).

For \(V\) of dimension \(n\) with ordered basis \(\mc{A} = (v_1, \ldots, v_n)\), define \((\bullet)_\mc{A}: V \to \mb{F}^n\) to be the function that takes \(v = \alpha_1 v_1 + \cdots + \alpha_n v_n\) to its coordinate vector \((\alpha_1, \ldots, \alpha_n)\) (Surowski 1997, p. 20).

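Consider the following diagram (Surowski, 1997, p. 22):

$$ \begin{array}{ccc} V & \xrightarrow{\;T\;} & W \\ \Big\downarrow{\scriptstyle (\bullet)_{\mc{A}}} & & \Big\downarrow{\scriptstyle (\bullet)_{\mc{B}}} \\ \mb{F}^n & \xrightarrow{\;[T]_{\mc{B}\mc{A}}\;} & \mb{F}^m \end{array} $$

We will discuss the above diagram in the following Theorem: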

Theorem: Let \(\mc{A} = (v_1, \ldots, v_n)\) be an ordered basis of \(V\) and let \(\mc{B} = (w_1, \ldots, w_m)\) be an ordered basis of \(W\). Define a linear transformation \(T\) by specifying its action on the vectors in \(\mc{A}\) in terms of the vectors in \(\mc{B}\), namely, \(T(v_j) = \sum_{i=1}^m c_{ij} w_i\). Then we have: \([T]_{\mc{B}\mc{A}} \circ (\bullet)_{\mc{A}} = (\bullet)_{\mc{B}} \circ T\) (or, for all \(x \in V\), \([T]_{\mc{B}\mc{A}} (x)_{\mc{A}} = (T(x))_{\mc{B}}\)) (Surowski, 1997, p. 22).

Before proving the theorem, we will discuss its meaning. The meaning of the conclusion of the theorem is: if we take a vector \(x \in V\), write it as a coordinate vector and then left-multiply it by the mysterious matrix we have defined, we will get the same answer as taking \(x\), applying the transformation \(T\), and writing the result as a coordinate vector in basis \(\mc{B}\).

In particular, this will mean that our definition of \([T]_{\mc{B}\mc{A}}\) is useful in taking vectors written in coordinates w.r.t. \(\mc{A}\), transforming them by \(T\) and getting a vector written in coordinates w.r.t. \(\mc{B}\).

We now prove this Theorem.

Proof:

(from Surowski, 1997, p. 22-23).

Let \(x \in V\), where \(x = \sum_{j=1}^{n} d_j v_j\). Then

$$ \begin{align*} \text{RHS} & = \left((\bullet)_{\mc{B}} \circ T\right)(x) = (\bullet)_{\mc{B}} (T(x)) = (T(x))_{\mc{B}} \\ & = \left(T \left(\sum_{j=1}^{n} d_j v_j\right)\right)_{\mc{B}} = \left(\sum_{j=1}^{n} d_j T(v_j)\right)_{\mc{B}} \\ & = \left(\sum_{j=1}^{n} d_j \left(\sum_{i=1}^{m} c_{ij} w_i\right)\right)_{\mc{B}} = \left(\sum_{i=1}^{m} \left(\sum_{j=1}^{n} d_j c_{ij}\right) w_i\right)_{\mc{B}} \end{align*} $$
$$ \begin{align*} & = \begin{pmatrix}\sum_{j=1}^{n} d_j c_{1j} \\ \vdots \\ \sum_{j=1}^{n} d_j c_{mj}\end{pmatrix} = \begin{pmatrix}c_{11} & \cdots & c_{1n} \\ \vdots & \vdots & \vdots \\ c_{m1} & \cdots & c_{mn}\end{pmatrix} \begin{pmatrix}d_1 \\ \vdots \\ d_n\end{pmatrix} \\ & = [T]_{\mc{B}\mc{A}} (x)_{\mc{A}} \\ & = ([T]_{\mc{B}\mc{A}} \circ (\bullet)_{\mc{A}})(x) \\ & = \text{LHS} \end{align*} $$

Thus we have shown that for all \(x \in V\), \([T]_{\mc{B}\mc{A}} (x)_{\mc{A}} = (T(x))_{\mc{B}}\). \(\square\)

The Theorem confirms that our definition of \([T]_{\mc{B}\mc{A}}\) is a useful one.

We further note that our definition ends up having

$$ \begin{equation*} \begin{pmatrix}c_{11} & \cdots & c_{1n} \\ \vdots & \vdots & \vdots \\ c_{m1} & \cdots & c_{mn}\end{pmatrix} = \begin{pmatrix}(T(v_1))_{\mc{B}} & \cdots & (T(v_n))_{\mc{B}} \end{pmatrix} \end{equation*} $$

i.e. the \(j\)th column is \((T(v_j))_{\mc{B}}\).
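The column description above can be checked on a small example. The following Python sketch is an illustration only and is not taken from Surowski: it assumes \(V\) is the space of real polynomials of degree at most 2 with \(\mc{A} = (1, t, t^2)\), \(W\) is the space of polynomials of degree at most 1 with \(\mc{B} = (1, t)\), and \(T\) is differentiation. The helper names `derivative`, `matrix_of`, and `mat_vec` are hypothetical.

```python
# Polynomials are stored as coefficient lists: [a0, a1, a2] means a0 + a1*t + a2*t^2.
# With the monomial bases A = (1, t, t^2) and B = (1, t), the coordinate vector
# of a polynomial is just its coefficient list.

def derivative(coeffs):
    """T = d/dt, mapping degree <= 2 coefficients to degree <= 1 coefficients."""
    return [k * coeffs[k] for k in range(1, len(coeffs))]

def matrix_of(T, basis_coords, m):
    """Build [T]_BA column by column: column j is (T(v_j))_B."""
    columns = [T(v) for v in basis_coords]
    return [[columns[j][i] for j in range(len(columns))] for i in range(m)]

def mat_vec(M, x):
    """Left-multiply the column vector x by the matrix M."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

basis_A = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # coordinates of 1, t, t^2
T_BA = matrix_of(derivative, basis_A, m=2)    # the 2 x 3 matrix [T]_BA

x_A = [3, 5, 7]                               # x = 3 + 5t + 7t^2
lhs = mat_vec(T_BA, x_A)                      # [T]_BA (x)_A
rhs = derivative(x_A)                         # (T(x))_B
print(T_BA, lhs, rhs)
```

Here `lhs` and `rhs` are both the coordinate vector of \(5 + 14t\), as the Theorem predicts.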

The identity transform

(The content of this section was inspired by pages 22-27 of Surowski (1997).)

Next, consider \(T = \text{id}\), the identity transformation on \(V\), where \(\mc{A}\) and \(\mc{B}\) are now both ordered bases of \(V\) (so \(m = n\)). From the above Theorem, we get \([\text{id}]_{\mc{B}\mc{A}} \circ (\bullet)_{\mc{A}} = (\bullet)_{\mc{B}} \circ \text{id}\).

Writing it out for an arbitrary \(x \in V\), \((\text{id}(x))_{\mc{B}} = [\text{id}]_{\mc{B}\mc{A}} (x)_{\mc{A}}\).

Note that \((\text{id}(x))_{\mc{B}} = (x)_{\mc{B}}\), so we have \((x)_{\mc{B}} = [\text{id}]_{\mc{B}\mc{A}} (x)_{\mc{A}}\).

We thus call \([\text{id}]_{\mc{B}\mc{A}}\) the "change of basis matrix."

Theorem: \([\text{id}]_{\mc{B}\mc{A}} [\text{id}]_{\mc{A}\mc{B}} = I\)

Proof:

Let the scalars \(c_{ij}\) be such that \(v_j = \text{id}(v_j) = \sum_{i=1}^{n} c_{ij} w_i\), so we have \(([\text{id}]_{\mc{B}\mc{A}})_{ij} = c_{ij}\).

Similarly, let the scalars \(d_{ij}\) be such that \(w_j = \text{id}(w_j) = \sum_{i=1}^{n} d_{ij} v_i\), so \(([\text{id}]_{\mc{A}\mc{B}})_{ij} = d_{ij}\).

Note that

$$ \begin{align*} v_j & = \sum_{i=1}^{n} c_{ij} w_i = \sum_{i=1}^{n} c_{ij} \left(\sum_{l=1}^{n} d_{l i} v_{l}\right) \\ & = \sum_{l=1}^{n} \left(\sum_{i=1}^{n} d_{l i} c_{ij}\right)v_l \end{align*} $$

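By the definition of matrix multiplication and the definitions of \([\text{id}]_{\mc{A}\mc{B}}\) and \([\text{id}]_{\mc{B}\mc{A}}\), we have \(([\text{id}]_{\mc{A}\mc{B}} [\text{id}]_{\mc{B}\mc{A}})_{lj} = \sum_{i=1}^{n} d_{l i} c_{ij}\).

Also, since \((v_1, \ldots, v_n)\) is a basis, the representation of \(v_j\) as a linear combination of \(v_1, \ldots, v_n\) is unique, so comparing coefficients in the equation above gives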

$$ \begin{equation*} \sum_{i=1}^{n} d_{l i} c_{ij} = \begin{cases} 1, & l = j \\ 0, & \text{o.w.} \end{cases} \end{equation*} $$

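This implies that \([\text{id}]_{\mc{A}\mc{B}} [\text{id}]_{\mc{B}\mc{A}} = I\). Since both matrices are square, a one-sided inverse is a two-sided inverse, so \([\text{id}]_{\mc{B}\mc{A}} [\text{id}]_{\mc{A}\mc{B}} = I\) as well (alternatively, repeat the argument with the roles of \(\mc{A}\) and \(\mc{B}\) exchanged).

Therefore \([\text{id}]_{\mc{B}\mc{A}} = \inv{[\text{id}]_{\mc{A}\mc{B}}}\). \(\square\)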
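The change of basis matrices can be computed explicitly in a small case. The sketch below is an illustration under assumptions not found in the source: it takes \(V = \mb{R}^2\), stores each ordered basis as a matrix whose columns are the basis vectors in standard coordinates, and uses hypothetical helpers `matmul` and `inv2` with exact rational arithmetic.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    """Inverse of an invertible 2x2 matrix, using exact rational arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Columns are the basis vectors written in standard coordinates of R^2.
A = [[1, 1], [0, 1]]     # v_1 = (1, 0), v_2 = (1, 1)
B = [[1, 1], [1, -1]]    # w_1 = (1, 1), w_2 = (1, -1)

# Column j of [id]_BA is (v_j)_B, i.e. the solution c of B c = v_j.
id_BA = matmul(inv2(B), A)
id_AB = matmul(inv2(A), B)

product = matmul(id_BA, id_AB)   # should be the identity matrix

x = [[4], [2]]                   # a vector in standard coordinates
x_A = matmul(inv2(A), x)         # (x)_A
x_B = matmul(inv2(B), x)         # (x)_B
converted = matmul(id_BA, x_A)   # [id]_BA (x)_A, should equal (x)_B
print(id_BA, product, converted)
```

In this example \(v_1 = \tfrac{1}{2} w_1 + \tfrac{1}{2} w_2\) and \(v_2 = w_1\), which are the two columns of `id_BA`.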

Finding an ordered basis from an ordered basis and a matrix

Theorem: Let \(\mc{A} = (v_1, \ldots, v_n)\) be an ordered basis of \(V\). Let \(p_{ij}\) for \(i \in [n], j \in [n]\) be a set of scalars such that the matrix \(P\) defined by \((P)_{ij} = p_{ij}\) is invertible. For all \(j \in [n]\), let \(w_j = \sum_{i=1}^n p_{ij} v_i\). Then it follows that \(\mc{B} = (w_1, \ldots, w_n)\) is an ordered basis of \(V\).

Proof: We must show that the \(w_j\) are linearly independent and span \(V\). First, since \(P\) is invertible, write \((P^{-1})_{ij} = q_{ij}\). Then, since \(I = P \inv{P}\), we have for arbitrary \(i \in [n]\)

$$ \begin{align*} v_i & = \sum_{k=1}^{n} \delta_{ki} v_k \\ & = \sum_{k=1}^{n} \left(\sum_{l=1}^{n} p_{kl} q_{li}\right) v_k \\ & = \sum_{l=1}^{n} q_{li} \left(\sum_{k=1}^n p_{kl} v_k\right) \\ & = \sum_{l=1}^{n} q_{li} w_l \end{align*} $$

Thus each vector in \(\mc{A}\) is in \(\langle w_1, \ldots, w_n\rangle\).

Since \(\mc{A}\) spans \(V\), it follows that \((w_1, \ldots, w_n)\) spans \(V\). By Proposition 1.2.1 of Surowski, since \((w_1, \ldots, w_n)\) spans \(V\) it must contain a basis of \(V\). Since \(\text{dim } V = n\), every basis of \(V\) has \(n\) elements, so \(\mc{B}\) is already a basis of \(V\). \(\square\)

Note further that if we start with ordered bases \(\mc{A} = (v_1, \ldots, v_n)\) and \(\mc{B} = (w_1, \ldots, w_n)\) of \(V\) and let \(P\) be the matrix of the \(n^2\) scalars \(p_{ij}\) such that for all \(j \in [n]\), \(w_j = \sum_{i=1}^n p_{ij} v_i\), then since \(w_j = \text{id}(w_j)\) we have by our definition of the square bracket subscript notation that \(P = [\text{id}]_{\mc{A}\mc{B}}\). (And, since this \(P\) is actually \([\text{id}]_{\mc{A}\mc{B}}\), we further have that for all \(x \in V\), \((x)_{\mc{A}} = P \enspace (x)_{\mc{B}}\).)

Representing a linear transformation in different to and from bases

(The content of this section was inspired by pages 22-27 of Surowski (1997).)

Theorem: Let \(T \in \mc{L}(V, W)\) and let \(\mc{A} = (v_1, \ldots, v_n)\) and \(\mc{A}^\prime = (v_1^\prime, \ldots, v_n^\prime)\) be ordered bases of \(V\). Similarly let \(\mc{B} = (w_1, \ldots, w_m)\) and \(\mc{B}^\prime = (w_1^\prime, \ldots, w_m^\prime)\) be ordered bases of \(W\). Then we have \([T]_{\mc{B}^\prime \mc{A}^\prime} = [\text{id}]_{\mc{B}^\prime \mc{B}} [T]_{\mc{B} \mc{A}} [\text{id}]_{\mc{A} \mc{A}^\prime}\).

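Proof: For all \(j \in [n]\), let \(c_{ij}\) be the scalars such that \(T(v_j) = \sum_{i=1}^{m} c_{ij} w_i\), so that \(([T]_{\mc{B}\mc{A}})_{ij} = c_{ij}\).

Similarly, for all \(j \in [n]\), let \(c_{ij}^\prime\) be the scalars such that \(T(v_j^\prime) = \sum_{i=1}^{m} c_{ij}^\prime w_i^\prime\), so that \(([T]_{\mc{B}^\prime \mc{A}^\prime})_{ij} = c_{ij}^\prime\).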

Letting \(w_j = \sum_{i=1}^m q_{ij} w_i^\prime\), we have \([\text{id}]_{\mc{B}^\prime \mc{B}}\) as: \(([\text{id}]_{\mc{B}^\prime \mc{B}})_{ij} = q_{ij}\) .

Letting \(v_j^\prime = \sum_{i=1}^n p_{ij} v_i\), we have \([\text{id}]_{\mc{A} \mc{A}^\prime}\) as: \(([\text{id}]_{\mc{A} \mc{A}^\prime})_{ij} = p_{ij}\) .

Denote the matrix resulting from the right-hand side multiplications as \(M\):

$$ \begin{equation*} (M)_{ij} = \sum_{l=1}^m \sum_{k=1}^n q_{il} c_{lk} p_{kj} \end{equation*} $$

Now observe:

$$ \begin{align*} T(v_j^\prime) & = T \left(\sum_{k=1}^{n} p_{kj} v_k\right) \\ & = \sum_{k=1}^{n} p_{kj} T(v_k) = \sum_{k=1}^{n} p_{kj} \sum_{l=1}^{m} c_{lk} w_l \\ & = \sum_{k=1}^{n} p_{kj} \sum_{l=1}^{m} c_{lk} \sum_{i=1}^m q_{il} w_i^\prime \\ & = \sum_{i=1}^m \left(\sum_{l=1}^{m} \sum_{k=1}^{n} q_{il} c_{lk} p_{kj} \right) w_i^\prime \end{align*} $$

This equation says that \(([T]_{\mc{B}^\prime \mc{A}^\prime})_{ij} = (M)_{ij}\) . Thus the Theorem is proved. \(\square\)
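The formula can also be checked numerically. The following sketch is an illustration, not part of Surowski's text: it assumes \(V = W = \mb{R}^2\), a transformation given in standard coordinates by a matrix `S`, and four bases stored as matrices whose columns are the basis vectors. All names (`matmul`, `inv2`, `matrix_of_T`, `change_of_basis`, and the sample matrices) are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    """Inverse of an invertible 2x2 matrix, using exact rational arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Columns of each matrix are basis vectors in standard coordinates.
A  = [[1, 1], [0, 1]]    # v_1  = (1, 0), v_2  = (1, 1)
A2 = [[2, 0], [1, 1]]    # v_1' = (2, 1), v_2' = (0, 1)
B  = [[1, 1], [1, -1]]   # w_1  = (1, 1), w_2  = (1, -1)
B2 = [[1, 0], [0, 3]]    # w_1' = (1, 0), w_2' = (0, 3)
S  = [[0, 1], [2, 3]]    # T in standard coordinates: T(x) = S x

def matrix_of_T(to_basis, from_basis):
    """[T]_{to,from}: column j is the to-basis coordinate vector of T(from_j)."""
    return matmul(inv2(to_basis), matmul(S, from_basis))

def change_of_basis(to_basis, from_basis):
    """[id]_{to,from}: column j is the to-basis coordinate vector of from_j."""
    return matmul(inv2(to_basis), from_basis)

direct = matrix_of_T(B2, A2)                        # [T]_{B'A'} from the definition
via_formula = matmul(change_of_basis(B2, B),        # [id]_{B'B}
                     matmul(matrix_of_T(B, A),      # [T]_{BA}
                            change_of_basis(A, A2)))  # [id]_{AA'}
print(direct, via_formula)
```

Both computations give the same matrix, in line with the Theorem.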

Finding an ordered basis from an ordered basis and a matrix

Let \(T \in \mc{L}(V)\) and let \(\mc{A} = (v_1, \ldots, v_n)\) be an ordered basis of \(V\).

Result: Choose an arbitrary set of numbers \(p_{ij}\) such that the matrix \(P\) defined by \((P)_{ij} = p_{ij}\) is invertible. For all \(j \in [n]\), let \(w_j = \sum_{i=1}^n p_{ij} v_i\). By an above result, we have that \(\mc{B} = (w_1, \ldots, w_n)\) is an ordered basis of \(V\). Also by the above, we have that \(P = [\text{id}]_{\mc{A}\mc{B}}\) and hence \(P^{-1} = [\text{id}]_{\mc{B}\mc{A}}\). Note that then, by the Theorem we just proved (applied with both starting bases equal to \(\mc{A}\) and both primed bases equal to \(\mc{B}\)), we have \(P^{-1} [T]_{\mc{A}\mc{A}} P = [\text{id}]_{\mc{B}\mc{A}} [T]_{\mc{A}\mc{A}} [\text{id}]_{\mc{A}\mc{B}} = [T]_{\mc{B}\mc{B}}\). Thus if we start off with a matrix \([T]_{\mc{A}\mc{A}}\) and we make up an invertible matrix \(P\), we get not only a new ordered basis for \(V\) but also a new matrix \(M = [T]_{\mc{B}\mc{B}}\) that is similar to \([T]_{\mc{A}\mc{A}}\).

Theorem: Let \(T \in \mc{L}(V)\) and let \(\mc{A} = (v_1, \ldots, v_n)\) be an ordered basis of \(V\). Let \(M\) be an arbitrary \(n \times n\) matrix. Assume there exists an invertible \(n \times n\) matrix \(P\) such that \(P^{-1} [T]_{\mc{A}\mc{A}} P = M\). Then (1) there exists an ordered basis \(\mc{B}\) such that \(P^{-1} = [\text{id}]_{\mc{B}\mc{A}}\), and (2) \(M = [T]_{\mc{B}\mc{B}}\).

Proof: Assume there exists such a \(P\), and write \((P)_{ij} = p_{ij}\). For all \(j \in [n]\), let \(w_j = \sum_{i=1}^n p_{ij} v_i\). By the above Result, \(\mc{B} = (w_1, \ldots, w_n)\) is a basis for \(V\), and \(P^{-1} = [\text{id}]_{\mc{B}\mc{A}}\). Using this with the above Theorem we get \(P^{-1} [T]_{\mc{A}\mc{A}} P = [T]_{\mc{B}\mc{B}}\). Since \(P^{-1} [T]_{\mc{A}\mc{A}} P = M\) by assumption, we have \(M = [T]_{\mc{B}\mc{B}}\). \(\square\)

The import of the above Theorem is that if we have one ordered basis and the matrix of a particular linear transformation to and from that basis, and we have another matrix and an invertible \(P\) that gets us there, the other matrix must be a representation of the same transformation in a particular other basis (and we can find that other basis from \(P\)). This is why we care about similar matrices: if we start off with one matrix that we have a known interpretation for and we find another matrix that is similar to it, we have found another representation of our transformation.
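To illustrate the Theorem with concrete numbers (chosen here for illustration; they are not from Surowski), take \(V = \mb{R}^2\), let \(\mc{A} = (v_1, v_2)\) be the standard basis, and suppose

$$ [T]_{\mc{A}\mc{A}} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \quad P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad P^{-1} = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}. $$

The columns of \(P\) give \(w_1 = v_1 + v_2\) and \(w_2 = v_1 - v_2\), and

$$ M = P^{-1} [T]_{\mc{A}\mc{A}} P = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ 3 & -1 \end{pmatrix} = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}. $$

The Theorem says that \(M = [T]_{\mc{B}\mc{B}}\) for \(\mc{B} = (w_1, w_2)\), i.e. \(T(w_1) = 3 w_1\) and \(T(w_2) = w_2\). This can be confirmed directly from the columns of \([T]_{\mc{A}\mc{A}}\):

$$ \begin{align*} T(w_1) & = T(v_1) + T(v_2) = (2 v_1 + v_2) + (v_1 + 2 v_2) = 3 w_1, \\ T(w_2) & = T(v_1) - T(v_2) = (2 v_1 + v_2) - (v_1 + 2 v_2) = w_2. \end{align*} $$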

Dual-related

Annihilators

Definition: Let \(W \subseteq V\), \(W\) a subspace. The annihilator of \(W\) is

$$ \text{Ann}(W) := \{f \in V^\ast;\, f(W) = 0\} \subseteq V^{\ast}. $$

Claim: \(\text{Ann}(W)\) is a subspace of \(V^\ast\).

Proof: Let \(f_1, f_2 \in \text{Ann}(W)\). For all \(w \in W\), \((f_1 + f_2)(w) = f_1(w) + f_2(w) = 0 + 0 = 0\), so \(f_1 + f_2 \in \text{Ann}(W)\).

Let \(f_1 \in \text{Ann}(W)\) and \(\alpha \in \mb{F}\). For all \(w \in W\), \((\alpha f_1)(w) = \alpha \cdot f_1(w) = \alpha \cdot 0 = 0\), so \(\alpha f_1 \in \text{Ann}(W)\).

Also clearly the zero functional is in \(\text{Ann}(W)\). From these three facts, it follows that \(\text{Ann}(W)\) is a subspace of \(V^\ast\). \(\square\)

Claim: \(W_1 \subseteq W_2 \subseteq V \implies \text{Ann}(W_1) \supseteq \text{Ann}(W_2)\).

Proof: Let \(x \in \text{Ann}(W_2)\). Then \(x \in V^\ast\) and \(x(W_2) = 0\). Since \(W_2 \supseteq W_1\), \(x(W_1) = 0\). So indeed \(x \in \text{Ann}(W_1)\). Therefore \(\text{Ann}(W_2) \subseteq \text{Ann}(W_1)\). \(\square\)

Proof of the dimensionalities of annihilators and pre-annihilators

Proposition 1.7.3 (Surowski, 1997, p. 38): Assume that \(\text{dim } V = n < \infty\) and that \(W\) is a \(k\)-dimensional subspace of \(V\). Then, \(\text{dim } \text{Ann}(W) = n - k\). Likewise, if \(L \subseteq V^\ast\) has dimension \(m\), then \(\text{dim } \text{Ann}^{\ast}(L) = n - m\).

Proof (Surowski, 1997, p. 38):

Let \((v_1, \ldots, v_k)\) be an ordered basis of \(W\). Extend it to \((v_1, \ldots, v_k, v_{k+1}, \ldots, v_n)\), an ordered basis of \(V\). Let \((v_1^\ast, \ldots, v_n^\ast)\) be the corresponding dual basis of \(V^\ast\).

Let \(f \in \text{Ann}(W)\). Then \(f = \sum_{i=1}^{n} \alpha_i v_i^\ast\).

Since \(f \in \text{Ann}(W)\), \(\forall\, w \in W \quad f(w) = 0\). Thus for all \(j \in [k]\),

$$ \begin{align*} 0 & = f(v_j) = \left(\sum_{i=1}^{n} \alpha_i v_i^\ast\right)(v_j) \\ & = 0 + \cdots + 0 + \alpha_j \cdot 1 + 0 + \cdots + 0 \\ & = \alpha_j \end{align*} $$

Therefore \(f = \sum_{i=k+1}^{n} \alpha_i v_i^\ast\). Therefore \(\text{Ann}(W) \subseteq \langle v_{k+1}^\ast, \ldots, v_n^\ast \rangle\).

Now we show the reverse inclusion.

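Let \(f \in \langle v_{k+1}^\ast, \ldots, v_n^\ast \rangle\), so there exist scalars \(\alpha_{k+1}, \ldots, \alpha_n\) such that \(f = \sum_{i=k+1}^{n} \alpha_i v_i^\ast\). Let \(w \in W\), so there exist \(b_1, \ldots, b_k\) such that \(w = b_1 v_1 + \cdots + b_k v_k\). Then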

$$ \begin{align*} f(w) & = \left(\sum_{i=k+1}^{n} \alpha_i v_i^\ast\right)(b_1 v_1 + \cdots + b_k v_k) \\ & = (0 + \cdots + 0) + (0 + \cdots + 0) + \cdots + (0 + \cdots + 0) \\ & = 0 \end{align*} $$

Therefore \(f \in \text{Ann}(W)\). Therefore from the above we have \(\langle v_{k+1}^\ast, \ldots, v_n^\ast \rangle = \text{Ann}(W)\).

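Since \(v_{k+1}^\ast, \ldots, v_n^\ast\) are linearly independent (they are part of the dual basis), it follows that \(\text{dim } \text{Ann}(W) = n - k\). \(\square\)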

Generalizing the notation, we have shown: for \(W \subseteq V\), \(W\) a subspace, we have

$$ \text{dim } V = \text{dim } \text{Ann}(W) + \text{dim } W. $$
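As a sanity check of this formula, here is a small example (the particular space and subspace are assumptions made for illustration, not taken from Surowski). Take \(V = \mb{R}^3\) and \(W = \langle (1, 1, 1) \rangle\), so \(n = 3\) and \(k = 1\). Writing \(x_i\) for the \(i\)th coordinate functional, every \(f \in V^\ast\) has the form \(f = a_1 x_1 + a_2 x_2 + a_3 x_3\), and \(f \in \text{Ann}(W)\) iff \(f(1, 1, 1) = a_1 + a_2 + a_3 = 0\). Hence

$$ \text{Ann}(W) = \{a_1 x_1 + a_2 x_2 + a_3 x_3 ;\, a_1 + a_2 + a_3 = 0\} = \langle x_1 - x_2,\; x_2 - x_3 \rangle, $$

where the last equality uses \(a_1 x_1 + a_2 x_2 - (a_1 + a_2) x_3 = a_1 (x_1 - x_2) + (a_1 + a_2)(x_2 - x_3)\). This space has dimension \(2 = n - k\), in agreement with \(\text{dim } V = \text{dim } \text{Ann}(W) + \text{dim } W\).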

Showing the result for pre-annihilators

Let \(L \subseteq V^\ast\), \(L\) a subspace. Define the "pre-annihilator" (Erdman, 2021, p. 40; Surowski, 1997, p. 37) as

$$ \text{Ann}^\ast(L) = \{v \in V ; \forall f \in L \quad f(v) = 0\}. $$

We show that \(\text{Ann}^\ast(L)\) is a subspace of \(V\): let \(a, b \in \mb{F}\) and \(v_1, v_2 \in \text{Ann}^\ast(L)\). Let \(f \in L\). Since \(f(v_1) = 0\), \(f(v_2) = 0\), and \(f\) linear, \(f(a v_1 + b v_2) = a f(v_1) + b f(v_2) = 0 + 0 = 0\). Since this was for an arbitrary \(f \in L\) we have \(a v_1 + b v_2 \in \text{Ann}^\ast(L)\). \(\checkmark\)

Then by Proposition 1.7.3 we have

$$ \text{dim } V^\ast = \text{dim } \text{Ann}^\ast(L) + \text{dim } L . $$

(We could have redone the entire proof starting with an ordered basis of \(L\) and extending it to a basis of \(V^\ast\), then using the dual basis in \(V^{\ast\ast}\) and showing the two inclusions.)

Proving that the annihilator is a bijection whose inverse is the pre-annihilator

We will prove:

Corollary 1.7.3.1 (Surowski, 1997, p. 38): If \(\text{dim } V = n < \infty\), then \(\text{Ann}: \{\text{subspaces of } V\} \to \{\text{subspaces of } V^\ast\}\) is a bijection with inverse \(\text{Ann}^\ast\).

First, we show two necessary results:

Claim 1: Letting \(X = \{\text{subspaces of } V\}\), we have \(\text{Ann}^\ast \circ \text{Ann} = \text{id}_X\) (where \(\text{id}_X\) denotes the identity mapping on \(X\)).

Proof:

Let \(V\) be a vector space with \(\text{dim } V = n\). Let \(W\) be a subspace of \(V\), with \(\text{dim } W = k \le n\).

Consider \(\text{Ann}^\ast(\text{Ann}(W)) = \{v \in V ;\, \forall\, f \in \text{Ann}(W) \quad f(v) = 0\}\).

First, note that since \(\text{Ann}(W)\) is a subspace of \(V^\ast\), applying Proposition 1.7.3 we have

$$ \text{dim } V^\ast = \text{dim } \text{Ann}^\ast(\text{Ann}(W)) + \text{dim } \text{Ann}(W) $$
$$ :: \enspace \text{dim } \text{Ann}^\ast(\text{Ann}(W)) = n - \text{dim } \text{Ann}(W) = n - (n - k) = k $$

Next, we observe that since \(\text{Ann}(W)\) is a vector space, and since we showed above that for any subspace \(L\) of \(V^\ast\), \(\text{Ann}^\ast(L)\) is a subspace of \(V\), we have that \(\text{Ann}^\ast(\text{Ann}(W))\) is a subspace of \(V\).

Let \(w \in W\). By the definition of \(\text{Ann}(W)\), every \(f \in \text{Ann}(W)\) satisfies \(f(w) = 0\), so \(w \in \text{Ann}^\ast(\text{Ann}(W))\).

Therefore \(W \subseteq \text{Ann}^\ast(\text{Ann}(W))\).

[Claim: If \(A \subseteq B\) and \(A\) and \(B\) are subspaces of \(V\), then \(A\) is a subspace of \(B\).

Proof: \(A\) and \(B\) share the same zero vector. \(A\) is closed under scalar multiplication and addition; since \(B\) is a vector space, \(A\) is a subspace of \(B\).]

By the claim, since both \(W\) and \(\text{Ann}^\ast(\text{Ann}(W))\) are subspaces of \(V\) and \(W \subseteq \text{Ann}^\ast(\text{Ann}(W))\), \(W\) is a subspace of \(\text{Ann}^\ast(\text{Ann}(W))\). Since further \(\text{dim } W = \text{dim } \text{Ann}^\ast(\text{Ann}(W))\), by Lang Chapter I, Section 3, Corollary 3.5 (1987, p. 18), \(W = \text{Ann}^\ast(\text{Ann}(W))\).

Therefore \(\text{Ann}^\ast \circ \text{Ann} = \text{id}_X\).

\(\square\)

Claim 2: Letting \(Y = \{\text{subspaces of } V^\ast\}\), we have \(\text{Ann} \circ \text{Ann}^\ast = \text{id}_Y\).

Proof:

Let \(V\) be a vector space with \(\text{dim } V = n\). Let \(L\) be a subspace of \(V^\ast\) with dimension \(m \le n\). By definition,

$$ \text{Ann}(\text{Ann}^\ast(L)) = \{f \in V^\ast ;\, \forall\, v \in \text{Ann}^\ast(L) \quad f(v) = 0\}. $$

Let \(\gamma \in L\). Let \(v \in \text{Ann}^\ast(L)\). Since \(\gamma(v) = 0\), it follows that \(\gamma \in \text{Ann}(\text{Ann}^\ast(L))\), so \(L \subseteq \text{Ann}(\text{Ann}^\ast(L))\).

Above we showed that \(\text{Ann}^\ast(L)\) is a subspace of \(V\). Using an above result, we have

$$ \begin{align*} \text{dim }\text{Ann}(\text{Ann}^\ast(L)) & = \text{dim } V - \text{dim }\text{Ann}^\ast(L) \\ & = n - (n - m) = m. \end{align*} $$

Since \(L \subseteq \text{Ann}(\text{Ann}^\ast(L))\), and both are subspaces of \(V^\ast\), we have that \(L\) is a subspace of \(\text{Ann}(\text{Ann}^\ast(L))\). Since \(\text{dim } \text{Ann}(\text{Ann}^\ast(L)) = \text{dim } L\), we have that \(L = \text{Ann}(\text{Ann}^\ast(L))\).

Therefore \(\text{Ann} \circ \text{Ann}^\ast = \text{id}_Y\).

\(\square\)

Proof of Corollary 1.7.3.1:

Let \(W \subseteq V\), \(W\) a subspace. Let \(\text{dim } W = k\), \(\text{dim } V = n\). Let \(L \subseteq V^\ast\) with \(\text{dim } L = m\).

By Claim 1 and Claim 2, \(\text{Ann}^\ast\) is a two-sided inverse of \(\text{Ann}\), so \(\text{Ann}\) is a bijection and \(\text{Ann}^\ast = \inv{\text{Ann}}\).

\(\square\)

Showing that the kernel of a nonzero member of the dual of a vector space is a hyperplane of the vector space (and that any hyperplane is the kernel of such a functional)

Before proving the theorem, we will recall the definition of hyperplane, and then show two results.

Definition (Surowski, 1997, p. 38): Let \(V\) be a vector space (possibly infinite dimensional). If \(H \subseteq V\) is a subspace of \(V\) such that \(\text{dim}(V / H) = 1\), we call \(H\) a hyperplane of \(V\).

Result 1: Let \(\text{dim } V = n\). For \(f \in V^\ast\), \(f \neq 0\), it follows that \(f\) is surjective.

Proof: Let \(f \in V^\ast\), \(f \neq 0\). Since \(f \neq 0\), \(\exists\, v \in V \enspace \exists\, a \in \mb{F} \quad f(v) = a \neq 0\).

We will show that for all \(y \in \mb{F}\) there exists \(u \in V\) such that \(f(u) = y\). Let \(y \in \mb{F}\). Since \(a \neq 0\), we may let \(c = y a^{-1}\), so that \(ca = y\). Therefore \(f(cv) = cf(v) = ca = y\), so \(f\) is surjective. \(\square\)

Before stating and showing result 2, recall the following definitions:

Definition (Surowski, 1997, p. 27): For \(V\) a vector space and \(W\) a subspace of \(V\), for a \(v \in V\), the coset determined by \(v\) is \(v + W := \{v + w ;\, w \in W\} \subseteq V\).

Recall that the zero vector in \(V / W\) is \(0 + W\) (Surowski, 1997, p. 31).

Definition (Surowski, 1997, p. 30-31): For \(V\) a vector space and \(W\) a subspace of \(V\), the canonical projection \(\pi_W: V \to V / W\) is defined by \(\pi_W(v) = v + W\) for all \(v \in V\).

Result 2: Let \(H \subseteq V\) be a hyperplane. Let \(T: V / H \stackrel{\cong}{\longrightarrow} \mb{F}\). Let \(f = T \circ \pi_H\). It follows that \(\text{ker } f = \text{ker } \pi_H\).

Proof:

\((\subseteq)\)

Let \(x \in \text{ker } f \,::\, T(\pi_H(x)) = 0\). By definition, \(\pi_H(x) = x + H\), so \(T(x + H) = 0\). Since \(T: V / H \stackrel{\cong}{\longrightarrow} \mb{F}\), it follows that \(\text{rank}(T) = 1\).

By the rank-nullity theorem, \(\text{rank}(T) = \text{dim}(V / H) - \text{dim } \text{ker } T\), i.e. \(1 = 1 - \text{dim } \text{ker } T\), so \(\text{dim } \text{ker } T = 0\), so \(\text{ker } T = \{0\}\).

Therefore \(x + H = 0_{V/H}\), which implies \(\pi_H(x) = x + H = 0_{V/H}\). Therefore \(x \in \text{ker } \pi_H\).

\((\supseteq)\)

Let \(x \in \text{ker } \pi_H\). Then \(f(x) = T(\pi_H(x)) = T(0_{V/H}) = 0\) where the final equality holds because \(T\) is a linear transformation. Thus we have shown that \(x \in \text{ker } f\).

\(\square\)

Now we prove the actual theorem. This theorem and its proof are from Surowski (1997, p. 38-39).

Theorem 1(a): Let \(V\) be a vector space with \(\text{dim } V = n\). Let \(f \in V^\ast\) where \(f \neq 0\). Then \(\text{ker } f\) is a hyperplane of \(V\).

Proof: Let \(V\) be a vector space with \(\text{dim } V = n\). Let \(f \in V^\ast\) where \(f \neq 0\). By Result 1, \(f\) is surjective.

Recall the Fundamental Homomorphism Theorem (Surowski, 1997, p. 32): if \(T: V \to W\) is a surjective linear transformation, then \(W \cong V / \text{ker } T\). Therefore \(V / \text{ker } f \cong \mb{F}\), so \(\text{dim}(V / \text{ker } f) = 1\), so \(\text{ker } f\) is a hyperplane. \(\square\)

Theorem 1(b): Let \(H\) be a hyperplane of \(V\). Then there exists an \(f \in V^\ast\), \(f \neq 0\) such that \(\text{ker } f = H\).

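Proof: Since \(\text{dim}(V / H) = 1\), there exists an isomorphism \(T: V / H \stackrel{\cong}{\longrightarrow} \mb{F}\). Let \(f = T \circ \pi_H \in V^\ast\). By Result 2, \(\text{ker } f = \text{ker } \pi_H = H\) (where the last step is because for any \(W\) subspace of \(V\) we have that \(\text{ker } \pi_W = W\), see Surowski Proposition 1.6.3). Finally, \(f \neq 0\), since \(\text{ker } f = H \neq V\) (as \(\text{dim}(V / H) = 1\)). \(\square\)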
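For a concrete instance of Theorem 1 (an illustrative example, not from Surowski), take \(V = \mb{R}^3\) and \(f(x_1, x_2, x_3) = x_1 + x_2 + x_3\), which is nonzero. Then

$$ \text{ker } f = \{(x_1, x_2, x_3) ;\, x_1 + x_2 + x_3 = 0\} = \langle (1, -1, 0),\, (0, 1, -1) \rangle $$

has dimension \(2\), so \(\text{dim}(V / \text{ker } f) = 3 - 2 = 1\) and \(\text{ker } f\) is a hyperplane, as in Theorem 1(a). Conversely, starting from the hyperplane \(H = \langle (1, -1, 0),\, (0, 1, -1) \rangle\), the functional \(f\) above satisfies \(\text{ker } f = H\), as Theorem 1(b) guarantees; any nonzero scalar multiple of \(f\) has the same kernel.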

Proving that every subspace of codimension \(m\) is the intersection of a particular set of \(m\) hyperplanes

We first show two preliminary results.

Result 1: Let \(W \subseteq V\), \(W\) a subspace, \(\text{codim } W = m\), \(\text{dim } V = n\). It follows that \(\text{dim } \text{Ann}(W) = \text{codim } W\).

Proof: Let \(W \subseteq V\), \(W\) a subspace, \(\text{codim } W = m\), \(\text{dim } V = n\).

Recall the definition of codimension: \(\text{codim } W = \text{dim}(V/W)\). Also recall Corollary 1.6.3.1 (Surowski, 1997, p. 31), namely that if \(\text{dim } V < \infty\), then \(\text{dim }(V / W) = \text{dim } V - \text{dim } W\). Now observe that

$$ \begin{align*} \text{dim } \text{Ann}(W) & = n - \text{dim } W = n - (\text{dim } V - \text{dim}(V/W)) \\ & = \text{dim}(V/W) = \text{codim } W = m. \end{align*} $$

\(\square\)

Result 2: Let \((f_1, \ldots, f_m)\) be an ordered basis of \(\text{Ann}(W) \subseteq V^\ast\). It follows that

$$ \begin{align*} & \{v \in V;\, \forall\, f \in \text{Ann}(W) \quad f(v) = 0\} \\ & = \{v \in V;\, f_1(v) = f_2(v) = \cdots = f_m(v) = 0\}. \end{align*} $$

Proof:

Denote the LHS set by \(S_1\) and the RHS set by \(S_2\) for readability.

\((\subseteq)\)

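Let \(v \in S_1\). Since \((f_1, \ldots, f_m)\) is a basis of \(\text{Ann}(W)\), we have \(\text{Ann}(W) = \langle f_1, \ldots, f_m \rangle\), so \(\forall\, f \in \langle f_1, \ldots, f_m \rangle \quad f(v) = 0\). Taking \(f = f_1\) gives \(f_1(v) = 0\), taking \(f = f_2\) gives \(f_2(v) = 0\), and so on, through \(f = f_m\), which gives \(f_m(v) = 0\). Therefore \(v \in S_2\).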

\((\supseteq)\)

Let \(v \in S_2\), so \(f_1(v) = \cdots = f_m(v) = 0\). \(\forall\, f \in \langle f_1, \ldots, f_m\rangle \quad \exists\, a_1, \ldots, a_m \quad f = a_1 f_1 + \cdots + a_m f_m\). Then \(f(v) = a_1 f_1(v) + \cdots + a_m f_m(v) = 0\). Thus \(\forall\, f \in \langle f_1, \ldots, f_m\rangle = \text{Ann}(W) \quad f(v) = 0\). Therefore \(v \in S_1\).

Thus \(S_1 = S_2\). \(\square\)

We now show the proposition.

Proposition 1.7.4.1 (Surowski, 1997, p. 39): If \(W \subseteq V\) has codimension \(m\), then there exist hyperplanes \(H_1, \ldots, H_m\) such that \(W = \cap_{i=1}^m H_i\).

Proof: By Result 1, \(\text{dim }\text{Ann}(W) = \text{codim } W\), so let \((f_1, \ldots, f_m)\) be an ordered basis of \(\text{Ann}(W)\). Observe that

$$ \begin{align*} W & = \text{Ann}^{\ast}(\text{Ann}(W)) \\ & = \{v \in V;\, \forall\, f \in \text{Ann}(W) \quad f(v) = 0\} \\ & = \{v \in V;\, f_1(v) = f_2(v) = \cdots = f_m(v) = 0\} \\ & = \bigcap_{i=1}^m \{v \in V;\, f_i(v) = 0\} \\ & = \bigcap_{i=1}^m \text{Ann}^{\ast}(\{f_i\}) \end{align*} $$

where the last step is because

$$ \{v \in V;\, f_i(v) = 0\} = \{v \in V;\, \forall\, f \in \{f_i\} \quad f(v) = 0\} = \text{Ann}^{\ast}(\{f_i\}) $$

Now observe that \(\text{Ann}^{\ast}(\{f_i\}) = \text{ker } f_i\). Since each \(f_i\) is non-zero (being a member of a basis), each \(\text{ker } f_i\) is a hyperplane by Theorem 1(a). Denoting \(H_i = \text{ker } f_i\), we have shown the desired result.

\(\square\)
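As an illustration of the Proposition (with a subspace chosen here purely as an example), take \(V = \mb{R}^3\) and \(W = \langle (1, 1, 1) \rangle\), so \(\text{codim } W = 3 - 1 = 2\). One ordered basis of \(\text{Ann}(W)\) is \((f_1, f_2)\) with \(f_1(x) = x_1 - x_2\) and \(f_2(x) = x_2 - x_3\), and the corresponding hyperplanes are

$$ \begin{align*} H_1 & = \text{ker } f_1 = \{x \in V;\, x_1 = x_2\}, \\ H_2 & = \text{ker } f_2 = \{x \in V;\, x_2 = x_3\}, \\ H_1 \cap H_2 & = \{x \in V;\, x_1 = x_2 = x_3\} = \langle (1, 1, 1) \rangle = W. \end{align*} $$

A different basis of \(\text{Ann}(W)\) would give a different pair of hyperplanes with the same intersection.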

Note on a typo in Surowski, Chapter 1

On page 39, before the diagram, it should read "We define a linear transformation \(T^\ast: {V^{\prime}}^\ast \to V^\ast\) by the following diagram" (he left off the prime in the domain of \(T^\ast\)).

Results regarding invertibility

A lemma

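Lemma: For any ordered basis \(\mc{A} = (v_1, \ldots, v_n)\) of \(V\) and any \(T \in \mc{L}(V)\), the coordinate vectors \([T(v_1)]_{\mc{A}}, \ldots, [T(v_n)]_{\mc{A}}\) are linearly independent iff \(T(v_1), \ldots, T(v_n)\) are linearly independent. (Here \([x]_{\mc{A}}\) denotes the coordinate vector \((x)_{\mc{A}}\).)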

Proof: Recall that \((\bullet)_{\mc{A}}\) is an invertible linear transformation.

(\(\Longrightarrow\))

Assume that \((T(v_1))_{\mc{A}}, \ldots, (T(v_n))_{\mc{A}}\) are linearly independent. Suppose \(c_1 T(v_1) + \cdots + c_n T(v_n) = 0\). Applying \((\bullet)_{\mc{A}}\) to both sides and using linearity gives

$$ \begin{align*} & (c_1 T(v_1) + \cdots + c_n T(v_n))_{\mc{A}} = (0)_{\mc{A}} \\ :: \quad & c_1 (T(v_1))_{\mc{A}} + \cdots + c_n (T(v_n))_{\mc{A}} = 0 \end{align*} $$

By assumption, \(c_1 = \cdots = c_n = 0\), and thus \(T(v_1), \ldots, T(v_n)\) are linearly independent.

(\(\Longleftarrow\))

Assume that \(T(v_1), \ldots, T(v_n)\) are linearly independent. Suppose

$$ c_1 (T(v_1))_{\mc{A}} + \cdots + c_n (T(v_n))_{\mc{A}} = 0 $$

By linearity of \((\bullet)_{\mc{A}}\), this says

$$ (c_1 T(v_1) + \cdots + c_n T(v_n))_{\mc{A}} = (0)_{\mc{A}} $$

Applying the inverse of the isomorphism, we have

$$ c_1 T(v_1) + \cdots + c_n T(v_n) = 0 $$

By assumption, \(c_1 = \cdots = c_n = 0\), and thus \((T(v_1))_{\mc{A}}, \ldots, (T(v_n))_{\mc{A}}\) are linearly independent. \(\square\)

A Theorem

Theorem: Let \(A \in M_n(\mb{F})\) and let \(V\) be a vector space with \(\text{dim } V = n\). Each choice of ordered basis \(\mc{A}\) of \(V\) determines a \(T \in \mc{L}(V)\) that \(A\) represents, namely the \(T\) with \(A = [T]_{\mc{A}}\) (where \([T]_{\mc{A}}\) abbreviates \([T]_{\mc{A}\mc{A}}\)). If \(A\) is invertible, then this \(T\) is invertible. Conversely, if \(T \in \mc{L}(V)\) is invertible, then for any choice of \(\mc{A}\) we have that \([T]_{\mc{A}}\) is invertible.

Proof:

(\(\Longrightarrow\))

Let \((A)_{ij} = a_{ij}\). Choose \(\mc{A} = (v_1, \ldots, v_n)\). Taking \(A\) to mean \([T]_{\mc{A}}\), we have \(\forall\, j \in [n] \quad T(v_j) = \sum_{i=1}^n a_{ij} v_i\). With this definition, we have

$$ A = \left([T(v_1)]_{\mc{A}}, \ldots, [T(v_n)]_{\mc{A}}\right) $$

The values \(T(v_j)\) define \(T\) (Lang, 1987, p. 56). Since \(A\) is invertible, its columns are linearly independent, so by the Lemma, \(T(v_1), \ldots, T(v_n)\) are linearly independent.

Further, since \(\text{dim } V = n\), any set of \(n\) linearly independent vectors spans \(V\) by Lang, Chapter 1, Theorem 3.4 (1987, p. 18). We now show that \(T\) is surjective and injective.

Surjective: By the above, \(\mc{B} = (T(v_1), \ldots, T(v_n))\) is an ordered basis for \(V\). Let \(w \in V\), so there exist scalars \(d_1, \ldots, d_n\) such that \(w = d_1 T(v_1) + \cdots + d_n T(v_n) = T(d_1 v_1 + \cdots + d_n v_n)\). Thus \(T\) is surjective.

Injective: Let \(x \in \text{ker} T\). Write \(x = \sum_{i=1}^{n} c_i v_i\). Note that \(0 = T(x) = T(\sum_{i=1}^{n} c_i v_i) = \sum_{i=1}^{n} c_i T(v_i)\). Since the \(T(v_i)\) are linearly independent, \(c_1 = \cdots = c_n = 0\), so \(x = 0\). Thus \(\text{ker } T = \{0\}\).

(note we could have used the rank-nullity theorem and shown only one of these, but I wanted to show both directly.)

Thus \(T\) is both surjective and injective, so \(T\) is a bijection.

(\(\Longleftarrow\))

Let \(T \in \mc{L}(V)\) be invertible. Then \(\text{dim } \text{Im } T = n\). Choose a basis \(\mc{A} = (v_1, \ldots, v_n)\) for \(V\). Consider \(T(v_1), \ldots, T(v_n)\). These are linearly independent: if \(\sum_{i=1}^{n} c_i T(v_i) = 0\), then \(T(\sum_{i=1}^{n} c_i v_i) = 0\), so \(\sum_{i=1}^{n} c_i v_i = 0\) because \(T\) is injective, and hence \(c_1 = \cdots = c_n = 0\) because \(\mc{A}\) is a basis. Then by the Lemma, \([T(v_1)]_{\mc{A}}, \ldots, [T(v_n)]_{\mc{A}}\) are linearly independent, so the matrix whose columns are these vectors is invertible. That matrix is \([T]_{\mc{A}}\).

\(\square\)

Projections

This section is based on Jain & Ahuja (2010, p. 32-33) and is informed by Axler's material on orthogonal projections (2024, p. 214-215).

Let \(V\) be a vector space and let \(W\), \(U\) be subspaces of \(V\) such that \(V = W \oplus U\) (i.e. for all \(v \in V\) there exist unique \(w \in W\), \(u \in U\) such that \(v = w + u\)). Define \(P: V \to V\) by \(Pv = w\). The following three facts hold:

We call \(P\) the linear projection on \(W\) along \(U\).

References

Axler, Sheldon. (2024). Linear algebra done right (Fourth edition). Self-published.

Erdman, John M. (2021). Elements of linear algebra. World Scientific Publishing.

Jain, Pawan K. and Om P. Ahuja. (2010). Functional analysis (Second edition). New Age International (P) Limited.

Lang, Serge. (1987). Linear algebra (Third edition). Springer.

Surowski, David. (1997). Advanced Linear Algebra [Lecture notes].

How to cite this article

Wayman, Eric Alan. (2026). Linear algebra: basics. Eric Alan Wayman's technical notes. https://ericwayman.net/notes/linear-algebra-basics/

@misc{wayman2026linear-algebra-basics,
  title={Linear algebra: basics},
  author={Wayman, Eric Alan},
  journal={Eric Alan Wayman's technical notes},
  url={https://ericwayman.net/notes/linear-algebra-basics/},
  year={2026}
}

© 2025-2026 Eric Alan Wayman. All rights reserved.