
Prerequisite Skill Sets for Artificial Intelligence Development. PART I
There is hardly any theory which is more elementary than linear algebra, in spite of the fact that generations of
data scientists and mathematicians have obscured its simplicity by preposterous calculations with matrices.
– Jean Dieudonné, French mathematician
Mathematics requires a small dose, not of genius, but of an imaginative freedom which in a larger dose would be insanity.
– Angus K. Rodgers
This series is about linear algebra, which, as many of you know, is required knowledge for just about any technical
education, yet it is also, I have noticed, generally poorly understood by IT professionals encountering it for the first time.
A data scientist might go through a class and learn how to compute lots of things, like determinants,
matrix multiplication, the cross product (which uses the determinant), or eigenvalues, but might come out
without really understanding why matrix multiplication is defined the way it is, why the cross product has
anything to do with the determinant, or what an eigenvalue really represents.
In this blog I have tried to help those data scientists look at linear algebra from an altogether different
perspective, in terms of analytical geometry or 3-dimensional spatial geometry, and to show how
this can easily be put to use in computer vision, image processing, speech recognition, and robotics.
So let's start with some basics, namely vector spaces.
We begin by generalising the familiar linear algebra over R to an arbitrary field as the domain
from which coefficients are to be taken. Fields are usually treated in a separate course on rings and
arithmetic, but to make these notes self-contained we begin with a definition of
what we mean by a field.
Axioms for Fields
A field is a set F with distinguished elements 0, 1, with a unary operation −, and with two binary operations +
and × satisfying the axioms below (the ‘axioms of arithmetic’).
Conventionally, for the image of (a, b) under the function + : F × F → F we write a + b; for the image of (a, b)
under the function × : F × F → F we write a b; and x − y means x + (−y), x + y z means x + (y z).
The axioms of arithmetic
(1) a + (b + c) = (a + b) + c [+ is associative]
(2) a + b = b + a [+ is commutative]
(3) a + 0 = a
(4) a + (−a) = 0
(5) a (b c) = (a b) c [× is associative]
(6) a b = b a [× is commutative]
(7) a 1 = a
(8) a ≠ 0 ⇒ ∃ x ∈ F : a x = 1
(9) a (b + c) = a b + a c [× distributes over +]
(10) 0 ≠ 1
Note 1: All axioms are understood to have ∀ a, b, . . . ∈ F in front.
Note 2: See my Notes on Rings and Arithmetic for more discussion.
Note 3: Examples are Q, R, C, Z_2 or more generally Z_p, the field of integers modulo p for any prime number p.
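To make the last example concrete, here is a minimal Python sketch (plain Python, no libraries; the function names are my own) of arithmetic in Z_p, spot-checking a few of the axioms for p = 7. The multiplicative inverse required by axiom (8) is computed via Fermat's little theorem.

# A minimal model of the finite field Z_p (integers modulo a prime p).
# All arithmetic is reduced modulo p; the multiplicative inverse required
# by axiom (8) is computed with Fermat's little theorem: a^(p-2) mod p.

p = 7  # any prime

def add(a, b):
    return (a + b) % p

def mul(a, b):
    return (a * b) % p

def neg(a):
    return (-a) % p

def inv(a):
    assert a % p != 0, "0 has no multiplicative inverse"
    return pow(a, p - 2, p)  # Fermat's little theorem

# Spot-check a few of the axioms over every pair of elements.
for a in range(p):
    for b in range(p):
        assert add(a, b) == add(b, a)          # (2) + is commutative
        assert mul(a, b) == mul(b, a)          # (6) × is commutative
        assert add(a, neg(a)) == 0             # (4) additive inverse
        if a != 0:
            assert mul(a, inv(a)) == 1         # (8) multiplicative inverse
print("Z_%d passes the spot-checks" % p)
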
Vector Spaces
Let F be a field. A vector space over F is a set V with distinguished element 0, with a unary operation −,
with a binary operation +, and with a function × : F × V → V satisfying the axioms below.
Conventionally: for the image of (a, b) under the function + : V × V → V
we write a + b; for a + (−b) we write a − b; and for the image of (α, v)
under the function × : F × V → V we write α v .
Vector space axioms
(1) u + (v + w) = (u + v) + w [+ is associative]
(2) u + v = v + u [+ is commutative]
(3) v + 0 = v
(4) v + (−v) = 0
(5) α (β v) = (α β) v
(6) α (u + v) = α u + α v
(7) (α + β) v = α v + β v
(8) 1 v = v
Note: All axioms are understood to have appropriate quantifiers ∀ u, v, . . . ∈ V
and/or ∀ α, β, . . . ∈ F in front.
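Before moving on, here is a small numerical sanity check, assuming the familiar example V = R^3 with NumPy arrays standing in for vectors and floats for scalars. This is illustration, not proof: np.allclose only confirms the identities up to floating-point tolerance for the sampled vectors.

import numpy as np

# Spot-check a few vector space axioms in the familiar case V = R^3.
rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)
alpha, beta = rng.normal(), rng.normal()

assert np.allclose((u + v) + u, u + (v + u))                  # (1) + is associative
assert np.allclose(u + v, v + u)                              # (2) + is commutative
assert np.allclose(v + (-v), np.zeros(3))                     # (4) additive inverse
assert np.allclose(alpha * (beta * v), (alpha * beta) * v)    # (5)
assert np.allclose(alpha * (u + v), alpha * u + alpha * v)    # (6)
assert np.allclose((alpha + beta) * v, alpha * v + beta * v)  # (7)
assert np.allclose(1.0 * v, v)                                # (8)
print("R^3 satisfies the spot-checked axioms")
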
EXAMPLES:
- the polynomial ring F[x] is a vector space over F;
- if K is a field and F is a subfield, then K is a vector space over F.
Exercise 1. Let X be any set, F any field. Define F^X to be the set of all functions X → F, with pointwise operations
(f + g)(x) := f(x) + g(x) and (α f)(x) := α f(x). Check that F^X is a vector space over F.
Exactly as for rings, or for vector spaces over R, one can prove important “trivialities”.
Of course, if we could not prove them then we would add further axioms until we
had captured all properties that we require, or at least expect, of the algebra of vectors.
But the fact is that this set of axioms, feeble though it seems, is enough. For example:
PROPOSITION: Let V be a vector space over the field F. For any v ∈ V and any α ∈ F we have
(i) 0 v = 0;
(ii) α 0 = 0;
(iii) if α v = 0 then α = 0 or v = 0;
(iv) (−1) v = −v.
Proof. For v ∈ V , from Field Axiom (3) and Vector Space Axiom (7) we have
0 v = (0 + 0) v = 0 v + 0 v.
Then adding −(0 v) to both sides of this equation and using Vector Space Axioms (4)
on the left and (1), (4), (3) on the right, we get that 0 = 0 v , as required for (i). The
reader is invited to give a proof of (ii).
For (iii), suppose that α v = 0 and α ≠ 0: our task is to show that v = 0. By Field
Axioms (8) and (6) there exists β ∈ F such that β α = 1. Then
v = 1 v = (β α) v = β (α v) = β 0
by Vector Space Axioms (8) and (5). But β 0 = 0 by (ii), and so v = 0, as required.
Clause (iv), like Clause (ii), is offered as an exercise.
Subspaces
Let V be a vector space over a field F . A subspace of V is a subset U such that
(1) 0 ∈ U; u + v ∈ U whenever u, v ∈ U; and −u ∈ U whenever u ∈ U;
(2) α u ∈ U whenever u ∈ U and α ∈ F.
Note 1: Condition (1) says that U is an additive subgroup of V . Condition (2) is closure under
multiplication by scalars.
EXAMPLES: {0} and V itself are always subspaces of V; the polynomials of degree at most n form a subspace of F[x];
any plane through the origin is a subspace of R^3.
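As a practical aside for data scientists: deciding whether a given vector lies in a subspace spanned by known vectors is a routine computation. The following sketch (assuming NumPy; the helper name in_subspace is my own) tests membership in U = span{u1, u2} of R^4 via a least-squares residual.

import numpy as np

# Numerically test whether a vector w lies in the subspace U = span{u1, u2}
# of R^4, by checking whether the least-squares residual of [u1 u2] c = w
# is (numerically) zero.
u1 = np.array([1.0, 0.0, 2.0, -1.0])
u2 = np.array([0.0, 1.0, 1.0, 3.0])
A = np.column_stack([u1, u2])

def in_subspace(w, A, tol=1e-10):
    c, *_ = np.linalg.lstsq(A, w, rcond=None)
    return np.linalg.norm(A @ c - w) < tol

w_in  = 2.0 * u1 - 3.0 * u2              # a linear combination, so in U
w_out = np.array([1.0, 1.0, 1.0, 1.0])   # not in U
print(in_subspace(w_in, A), in_subspace(w_out, A))   # True False
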
QUOTIENT SPACES
Suppose that U ≤ V, that is, U is a subspace of V, where V is a vector space over a field F. Define the quotient
space V /U as follows:
set := {x + U | x ∈ V } [additive cosets]
0 := U
additive inverse: −(x + U ) := (−x) + U
addition: (x + U ) + (y + U ) := (x + y) + U
scalar multiplication: α (x + U ) := (α x) + U
Note: The notion of quotient space is closely analogous to the quotient
of a group by a normal subgroup or of a ring by an ideal. It will not play a large part
in this series. Nevertheless, it is an important and useful
construct which is well worth becoming familiar with.
Dimension
Although technically new, the following ideas and results translate so simply from
the case of vector spaces over R to vector spaces over an arbitrary field F that I propose
simply to list headers for revision:
(1) spanning sets;
(2) linear dependence and independence;
(3) bases;
(4) dimension;
(5) any linearly independent set may be extended (usually in many ways) to a basis;
(6) intersection U ∩ W of subspaces; sum U + W;
(7) dim (U + W ) = dim U + dim W − dim (U ∩ W ) (see the numerical sketch below).
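Item (7) can be checked numerically. In the sketch below (assuming NumPy; all names are my own), bases are stored as matrix columns, the dimension of a span is computed as a matrix rank, and dim(U ∩ W) is obtained independently from the null space of [B_U | −B_W], which (for independent basis columns, as here) corresponds exactly to U ∩ W.

import numpy as np

# Numerical check of dim(U + W) = dim U + dim W - dim(U ∩ W) for
# subspaces of R^6, with bases stored as the columns of BU and BW.
rng = np.random.default_rng(1)
BU = rng.normal(size=(6, 3))                 # basis of U (3 independent columns)
BW = np.column_stack([BU[:, 0] + BU[:, 1],   # one direction shared with U
                      rng.normal(size=6)])   # plus a generic direction

dim_U   = np.linalg.matrix_rank(BU)
dim_W   = np.linalg.matrix_rank(BW)
dim_sum = np.linalg.matrix_rank(np.column_stack([BU, BW]))   # dim(U + W)

# dim(U ∩ W) computed independently: solutions of BU a = BW b form the
# null space of [BU | -BW], which corresponds to U ∩ W here.
M = np.column_stack([BU, -BW])
dim_int = M.shape[1] - np.linalg.matrix_rank(M)

print(dim_U, dim_W, dim_sum, dim_int)        # 3 2 4 1 (generically)
assert dim_sum == dim_U + dim_W - dim_int
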
LINEAR TRANSFORMATION
Let V and W be vector spaces over the same field F. A map T : V → W is called a linear transformation if
T 0 = 0,
T (−x) = −T x,
T (x + y) = T x + T y,
and T (λ x) = λ (T x)
for all x, y ∈ V and all λ ∈ F .
Note 1: The definition is couched in terms which are intended to emphasise that
what should be required of a linear transformation is
that it preserves all the ingredients, 0, +, − and multiplication by scalars, that go into
the making of a vector space. In fact the conditions are far from independent: T is
linear if and only if T (α x + β y) = α T x + β T y for all x, y ∈ V and all α, β ∈ F .
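In coordinates, every matrix gives a linear transformation, and the single condition of Note 1 is easy to spot-check numerically. A minimal sketch, assuming NumPy and random test vectors:

import numpy as np

# A matrix A gives a linear transformation T : R^3 -> R^2 via T(x) = A x.
# Spot-check the single condition T(αx + βy) = αT(x) + βT(y).
rng = np.random.default_rng(2)
A = rng.normal(size=(2, 3))
T = lambda x: A @ x

x, y = rng.normal(size=3), rng.normal(size=3)
alpha, beta = rng.normal(), rng.normal()
assert np.allclose(T(alpha * x + beta * y), alpha * T(x) + beta * T(y))
assert np.allclose(T(np.zeros(3)), np.zeros(2))   # T0 = 0 follows
print("T is linear on the sampled inputs")
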
Note 2: For a linear transformation T : V → W we define the kernel or null-space by Ker T := {x ∈ V | T x = 0}
and the image by Im T := {T x | x ∈ V}. The nullity of T is dim Ker T and the rank of T is dim Im T.
RANK-NULLITY THEOREM: If T : V → W is a linear transformation and V is finite-dimensional, then
nullity T + rank T = dim V.
Note: The Rank-Nullity Theorem is a version of the First Isomorphism Theorem
for vector spaces, which states that V / Ker T ≅ Im T.
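A numerical illustration of the theorem, assuming NumPy and SciPy are available (scipy.linalg.null_space is used to get a basis of Ker T, so the nullity is computed independently of the rank):

import numpy as np
from scipy.linalg import null_space   # assumption: SciPy is installed

# Rank-nullity check for T : R^5 -> R^3 given by a matrix A of rank 2.
rng = np.random.default_rng(3)
A = rng.normal(size=(3, 2)) @ rng.normal(size=(2, 5))   # rank 2 by construction

rank    = np.linalg.matrix_rank(A)       # dim Im T
nullity = null_space(A).shape[1]         # dim Ker T
dim_V   = A.shape[1]                     # dimension of the domain R^5

print(rank, nullity, dim_V)              # 2 3 5
assert rank + nullity == dim_V
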
DIRECT SUMS AND PROJECTION OPERATORS
The vector space V is said to be the direct sum of its subspaces U and W , and we
write V = U ⊕ W , if V = U + W and U ∩ W = {0}.
Lemma: Let U , W be subspaces of V . Then V = U ⊕ W if and only if for every
v ∈ V there exist unique vectors u ∈ U and w ∈ W such that v = u + w.
Proof. Suppose first that for every v ∈ V there exist unique vectors u ∈ U and
w ∈ W such that v = u + w . Certainly then V = U + W and what we need to prove
is that U ∩ W = {0}. So let x ∈ U ∩ W . Then x = x + 0 with x ∈ U and 0 ∈ W .
Equally, x = 0 + x with 0 ∈ U and x ∈ W . But the expression x = u + w with u ∈ U
and w ∈ W is, by assumption, unique, and it follows that x = 0. Thus U ∩ W = {0}, as required.
Now suppose that V = U ⊕ W . If v ∈ V then since V = U + W there certainly exist vectors u ∈ U
and w ∈ W such that v = u + w . The point at issue therefore is: are u, w uniquely determined by v?
Suppose that u + w = u ′ + w ′ , where u, u ′ ∈ U and w, w ′ ∈ W . Then u − u ′ = w ′ − w .
This vector lies both in U and in W .
By assumption, U ∩ W = {0} and so u − u ′ = w ′ − w = 0. Thus u = u ′ and w = w ′ ,
so the decomposition of a vector v as u + w with u ∈ U and w ∈ W is unique, as required.
Note: What we are discussing is sometimes (but rarely) called the “internal” direct
sum, to distinguish it from the natural construction which starts from two vector spaces U and W over the same
field and builds a new space out of the ordered pairs (u, w), much as for the direct product of groups or of rings.
That construction is (equally rarely) called the “external” direct sum; if V = U ⊕ W in the internal sense,
then V is naturally isomorphic with the external direct sum of U and W.
We come now to projection operators. Suppose that V = U ⊕ W .
Define P : V → V as follows.
For v ∈ V write v = u + w where u ∈ U and w ∈ W and then define
P v := u. Strictly P depends on the ordered pair (U, W ) of summands of V , but to
keep things simple we will not build this dependence into the notation.
OBSERVATIONS:
(1) P is well-defined;
(2) P is linear;
(3) Im P = U , Ker P = W ;
(4) P ² = P , that is, P is idempotent.
Proofs. That P is well-defined is an immediate consequence of the existence and
uniqueness of the decomposition v = u + w with u ∈ U , w ∈ W . Linearity (2) is a routine verification.
For (3) it is clear from the definition that Im P ⊆ U ; but if u ∈ U then u = P u,
and therefore Im P = U . Similarly, it is clear that W ⊆ Ker P ;
but if v ∈ Ker P then, writing v = u + w with u ∈ U and w ∈ W , we have u = P v = 0, so
v = w ∈ W , and therefore Ker P = W .
Finally, if v ∈ V and we write v = u + w with u ∈ U and w ∈ W , then P (P v) = P u = u = P v
(since u = u + 0 is the decomposition of u), so P ² = P , which is Observation (4).
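In coordinates the projection onto U along W is easy to construct: if the columns of B list a basis of U followed by a basis of W, then P = B D B^{-1}, where D is diagonal with 1s over the U-columns and 0s over the W-columns. A small NumPy sketch (the specific subspaces are my own choice) that also verifies Observations (3) and (4):

import numpy as np

# Build the projection P of R^3 onto U along W, where R^3 = U ⊕ W.
BU = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])          # columns span U
BW = np.array([[1.0],
               [1.0],
               [0.0]])               # column spans W
B = np.column_stack([BU, BW])        # invertible, since U ⊕ W = R^3
D = np.diag([1.0, 1.0, 0.0])
P = B @ D @ np.linalg.inv(B)

v = np.array([2.0, -1.0, 4.0])
u = P @ v                            # component of v in U
w = v - u                            # component of v in W
assert np.allclose(P @ P, P)                    # observation (4): P is idempotent
assert np.allclose(P @ BU, BU)                  # P fixes U pointwise
assert np.allclose(P @ BW, np.zeros((3, 1)))    # P kills W
print(u, w)                                     # u + w recovers v
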
Terminology: the operator (linear transformation) P is called the projection of V onto U along W.
Note 2. If P is the projection onto U along W then I − P , where I : V → V is the identity transformation,
is the projection onto W along U .
Note 3. If P is the projection onto U along W then u ∈ U if and only if P u = u. The fact that if u ∈ U then
P u = u is immediate from the definition of P , while if P u = u then obviously u ∈ Im P = U .
Our next aim is to characterise projection operators algebraically. It turns out that Observation (4) above is the key:
TERMINOLOGY : A linear transformation P : V → V is said to be idempotent if P ² = P .
THEOREM : A linear transformation is a projection operator if and only if it is idempotent.
Proof: We have seen already that every projection is idempotent, so the problem is to prove that an idempotent
linear transformation is a projection operator. Suppose that P : V → V is linear and idempotent. Define
U := Im P and W := Ker P . For any v ∈ V we have v = P v + (v − P v), where P v ∈ U and v − P v ∈ W because
P (v − P v) = P v − P ² v = 0; and if x ∈ U ∩ W then x = P y for some y ∈ V while P x = 0, so x = P y = P ² y = P x = 0.
Hence V = U ⊕ W .
To finish the proof we need to convince ourselves that P is the projection onto U along W .
For v ∈ V write v = u + w where u ∈ U and w ∈ W . Since u ∈ U there must exist x ∈ V such that
u = P x. Then
P v = P (u + w) = P u + P w = P ² x + 0 = P x = u ,
and therefore P is indeed the projection onto U along W , as we predicted.
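Here is the converse direction illustrated on a concrete idempotent matrix which is not an orthogonal projection (assuming NumPy; the example matrix is my own):

import numpy as np

# An idempotent matrix that is not an orthogonal projection: P(x, y) = (x + y, 0).
P = np.array([[1.0, 1.0],
              [0.0, 0.0]])
assert np.allclose(P @ P, P)            # P is idempotent

# U := Im P is spanned by (1, 0); W := Ker P is spanned by (1, -1).
u_basis = np.array([1.0, 0.0])
w_basis = np.array([1.0, -1.0])
assert np.allclose(P @ u_basis, u_basis)        # P fixes U pointwise
assert np.allclose(P @ w_basis, 0.0)            # P kills W

# Every v splits as Pv + (v - Pv) with Pv in U and v - Pv in W.
v = np.array([3.0, 2.0])
print(P @ v, v - P @ v)    # [5. 0.]  [-2.  2.]  -- and (-2, 2) = -2 * (1, -1) lies in W
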
The next result is a theorem which turns out to be very useful in many situations.
It is particularly important in applications of linear algebra in Quantum Mechanics.
THEOREM : Suppose that V = U ⊕ W and let P be the projection onto U along W . Let T : V → V be a linear
transformation. Then P T = T P if and only if U and W are both T -invariant, that is, T U ⊆ U and T W ⊆ W .
Proof : Suppose first that P T = T P . If u ∈ U , so that P u = u, then T u = T P u = P T u ∈ Im P = U , so T U ⊆ U .
Similarly, if w ∈ W then P w = 0, so P T w = T P w = 0, that is, T w ∈ Ker P = W ; hence T W ⊆ W .
Now suppose conversely that U and W are T -invariant. Let v ∈ V and write
v = u + w with u ∈ U and w ∈ W . Then
P T v = P T (u + w) = P (T u + T w) = T u ,
since T u ∈ U and T w ∈ W . Also,
T P v = T P (u + w) = T u .
Thus P T v = T P v for all v ∈ V and therefore P T = T P , as asserted.
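A coordinate illustration of the theorem, assuming NumPy: with U = span{e1, e2} and W = span{e3}, a block-diagonal T preserves both subspaces and commutes with P, while a T that mixes the blocks does not.

import numpy as np

# U = span{e1, e2}, W = span{e3}; the projection onto U along W is P = diag(1, 1, 0).
P = np.diag([1.0, 1.0, 0.0])

T_invariant = np.array([[2.0, 1.0, 0.0],
                        [0.0, 3.0, 0.0],
                        [0.0, 0.0, 5.0]])   # TU ⊆ U and TW ⊆ W
T_mixing    = np.array([[2.0, 1.0, 1.0],
                        [0.0, 3.0, 0.0],
                        [1.0, 0.0, 5.0]])   # sends e3 partly into U and e1 partly into W

print(np.allclose(P @ T_invariant, T_invariant @ P))   # True
print(np.allclose(P @ T_mixing, T_mixing @ P))         # False
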
More generally, V is said to be the direct sum of subspaces U_1, U_2, . . . , U_k, written V = U_1 ⊕ U_2 ⊕ · · · ⊕ U_k,
if every v ∈ V can be written uniquely as v = u_1 + u_2 + · · · + u_k with u_i ∈ U_i for each i.
Note 1: If k = 2 then this reduces to exactly the same concept as we have just been
studying.
Note 2: For k ≥ 3 it is not enough that the subspaces intersect pairwise in {0}. For example, three distinct lines
U_1, U_2, U_3 through the origin in R² satisfy
U_1 ∩ U_2 = U_1 ∩ U_3 = U_2 ∩ U_3 = {0}
and yet it is clearly not true that R²
is their direct sum.
A family {P_1, P_2, . . . , P_k} of idempotent linear transformations V → V is called a partition of the identity if
P_i P_j = 0 whenever i ≠ j and P_1 + P_2 + · · · + P_k = I.
EXAMPLE : If P is a projection then {P, I − P } is a partition of the identity.
THEOREM: If {P_1, P_2, . . . , P_k} is a partition of the identity on V , then V = Im P_1 ⊕ Im P_2 ⊕ · · · ⊕ Im P_k.
PROOF: For any v ∈ V we have v = I v = P_1 v + · · · + P_k v with P_i v ∈ Im P_i, so V is the sum of the subspaces Im P_i.
For uniqueness, suppose u_1 + · · · + u_k = 0 with u_i ∈ Im P_i, say u_i = P_i x_i. Applying P_j and using P_j P_i = 0 for
i ≠ j gives 0 = P_j (u_1 + · · · + u_k) = P_j ² x_j = P_j x_j = u_j. Hence every vector of V has a unique expression as a
sum of vectors from the subspaces Im P_i, as required.
LINEAR FUNCTIONALS AND DUAL SPACES
Let V be a vector space over a field F . A linear functional on V is a linear transformation f : V → F . The dual space
V ′ is the set of all linear functionals on V , with pointwise operations: (f + g)(v) := f (v) + g(v) and (α f )(v) := α f (v).
Note: It is a matter of important routine to check that the vector space axioms are
satisfied. It is also important that, when invited to define the dual space of a vector
space, you specify not only the set, but also the operations which make that set into a vector space.
THEOREM : Let V be a finite-dimensional vector space over F with basis v_1, v_2, . . . , v_n. For each i let f_i ∈ V ′ be
the linear functional determined by f_i(v_j) = 1 if i = j and f_i(v_j) = 0 otherwise. Then f_1, f_2, . . . , f_n form a basis
of V ′ (the dual basis), and in particular dim V ′ = dim V .
Proof (sketch): Each f_i is well-defined because a linear map is determined by its values on a basis.
To finish the proof we must show that they form a basis of V ′.
To see that they are independent suppose that α_1 f_1 + · · · + α_n f_n = 0; evaluating at v_j gives α_j = 0 for each j.
To see that they span V ′, note that any f ∈ V ′ agrees with f (v_1) f_1 + · · · + f (v_n) f_n on every basis vector v_j,
and hence f = f (v_1) f_1 + · · · + f (v_n) f_n.
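In coordinates the dual basis is concrete: if the basis vectors v_1, . . . , v_n are the columns of an invertible matrix B, the dual functionals are the rows of B^{-1}. A short NumPy sketch (the particular basis is my own choice):

import numpy as np

# Dual basis in coordinates: for a basis v1, ..., vn of R^n (columns of B),
# the dual functionals f1, ..., fn are the rows of B^{-1}, acting by
# fi(x) = (B^{-1} x)_i, so that fi(vj) = δ_ij.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])        # columns are the basis v1, v2, v3
F = np.linalg.inv(B)                   # row i of F is the dual functional f_i

assert np.allclose(F @ B, np.eye(3))   # fi(vj) = δ_ij

# Any functional f (a row vector) expands as f = Σ f(vi) fi.
f = np.array([[2.0, -1.0, 3.0]])       # f(x) = 2x1 - x2 + 3x3
coeffs = f @ B                         # the values f(v1), f(v2), f(v3)
assert np.allclose(coeffs @ F, f)
print(coeffs)                          # [[2. 1. 2.]]
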
Let V be a vector space over the field F . For a subset X of V the annihilator is defined by
X° := {f ∈ V ′ | f (x) = 0 for all x ∈ X}.
THEOREM
Let V be a finite-dimensional vector space over a field F and let U be
a subspace. Then dim U + dim U° = dim V .
Proof (sketch): Choose a basis u_1, . . . , u_m of U and extend it to a basis u_1, . . . , u_n of V . If f_1, . . . , f_n is the
corresponding dual basis of V ′, then f_{m+1}, . . . , f_n vanish on U , and one checks that they form a basis of U°;
hence dim U° = n − m, which gives the result.
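Identifying functionals on R^n with row vectors, U° corresponds to the row vectors that annihilate a basis matrix of U, i.e. to the null space of B_U transposed. A numerical check of the theorem, assuming NumPy and SciPy:

import numpy as np
from scipy.linalg import null_space   # assumption: SciPy is installed

# Numerical check of dim U + dim U° = dim V for a subspace U of R^5.
rng = np.random.default_rng(4)
BU = rng.normal(size=(5, 2))               # basis of U as columns (dim U = 2)

dim_U   = np.linalg.matrix_rank(BU)
ann     = null_space(BU.T)                 # columns span {f : f^T BU = 0}, i.e. U°
dim_ann = ann.shape[1]

print(dim_U, dim_ann)                      # 2 3
assert dim_U + dim_ann == BU.shape[0]      # dim V = 5
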
To finish this study of dual spaces we examine the second dual, that is, the dual of
the dual. It turns out that if V is a finite-dimensional vector space then the second dual
V ′′ can be naturally identified with V itself.
THEOREM: Let V be a finite-dimensional vector space over F . The map Φ : V → V ′′ defined by (Φ v)(f ) := f (v)
for v ∈ V and f ∈ V ′ is an isomorphism of vector spaces; thus V ′′ may be identified naturally with V .
In the next article I will write about the 3-dimensional interpretation of dual spaces, and about eigenvalues and eigenvectors.
To be continued.