Matrices


In the following subsections we provide a very brief summary of the main types of matrices and matrix expressions. For more details readers should refer to any good introductory text on matrix algebra. Basic matrix algebra is relatively simple, but performing the computations manually is tedious even for small matrices. The matrix format is, however, well suited to computer processing, and in practice almost all of the matrix operations readers are likely to encounter will be carried out, explicitly or implicitly, by pre-built software tools.

Statistical methods make extensive use of matrix algebra, especially in connection with the various forms of regression model, in the computation of variance-covariance matrices, and in more specialized procedures such as Markov Chain analysis. Various forms of random matrices are also encountered, as are matrices whose row entries are positive fractions that sum to 1 (a form known as row standardization). These occur in a number of situations, most notably as stochastic matrices, which are square and row-standardized. Statistical software packages, as opposed to general-purpose mathematical tools, perform matrix algebra as an intrinsic part of their operations, but generally do not expose low-level matrix functions, such as matrix inversion, as separate facilities.

Matrices are rectangular arrangements of numerical data. They provide a very convenient format for summarizing key elements of algebraic expressions, in particular the coefficients in systems of linear equations. The variables in such systems are typically represented by a matrix with a single row or column, usually described as a vector, whose entries are variables rather than numbers. Bold font is generally used to denote matrices and vectors, indicating that they are multi-element objects rather than scalars. The density of a matrix is the proportion of its entries that are non-zero; a sparse matrix is one with a relatively low density, and special data storage and computational procedures exist for handling sparse matrices. This is particularly important when processing large datasets, where matrix sparsity is common.

To clarify the basic operations, let A, B and C be three matrices, each with the same number of rows and columns as one another, and with elements denoted aij, bij and cij. For example:

Then the following definitions and rules apply:

A+B and A-B : the cell entries are the sum (or difference) of the matching cell entries of A and B, i.e. [aij]+[bij]=[aij+bij]

A(BC)=(AB)C : the Associative property

A(B+C)=AB+AC : the Distributive property

AB≠BA : the Commutative property may not hold (in general it will not be valid) and thus the order of such operations is important

Multiplication of a matrix, A, by a fixed scalar, s, written sA, is defined as each element of A being multiplied by s, i.e. sA=[saij]

AA^(-1)=A^(-1)A=I : this rule states that if you pre- or post-multiply a matrix by its inverse (assuming its inverse exists and can be found) then the result is the unit or Identity matrix, I, which is the matrix equivalent of the number 1 in conventional algebra.

A one-column or one-row matrix is called a vector. If x is a column vector of n elements and I is an n×n identity matrix, then Ix=x.
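These rules can be illustrated with a minimal Python/NumPy sketch. The two 2 by 2 matrices below are hypothetical values chosen purely for illustration (they are not the example matrices referred to elsewhere in this section):

import numpy as np

A = np.array([[1., 2.], [3., 4.]])        # hypothetical matrix A
B = np.array([[0., 1.], [5., 2.]])        # hypothetical matrix B
s = 3.0                                   # a scalar

print(A + B)                              # element-wise sum, [aij + bij]
print(A - B)                              # element-wise difference
print(s * A)                              # scalar multiplication, [s*aij]
print(np.allclose(A @ B, B @ A))          # False here: in general AB is not equal to BA
A_inv = np.linalg.inv(A)                  # inverse exists because det(A) = -2, which is non-zero
print(np.allclose(A @ A_inv, np.eye(2)))  # True: a matrix times its inverse gives the identity matrix I
x = np.array([7., 8.])                    # a vector of 2 elements
print(np.allclose(np.eye(2) @ x, x))      # True: Ix = x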

Suppose we have a system of 3 linear equations, of the form:

then this can be represented in matrix form as:

Ax=b or b=Ax

where A is a 3×3 matrix, and x and b are column vectors, typically with A and b known and x unknown:

The multiplication rule for the elements of Ax is that the elements in each row of A are multiplied by the corresponding elements of the column vector x and the results summed, i.e. the ith element of Ax is ai1x1+ai2x2+ai3x3.


The solution to the matrix equation (if one exists) provides values for the elements of x, and is of the form:

x=A^(-1)b, where A^(-1) is the matrix inverse of A. This follows from finding A^(-1) and then pre-multiplying both sides of the expression Ax=b by it, giving A^(-1)Ax=A^(-1)b, i.e. x=A^(-1)b
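As a brief sketch of this solution process in Python/NumPy, using a hypothetical 3 by 3 system (the coefficients below are illustrative only, not those of the system shown above):

import numpy as np

A = np.array([[2., 1., 1.],
              [1., 3., 2.],
              [1., 0., 0.]])            # hypothetical, non-singular coefficient matrix
b = np.array([4., 5., 6.])              # hypothetical right-hand side vector

x = np.linalg.solve(A, b)               # solves Ax = b directly (numerically preferable)
x_inv = np.linalg.inv(A) @ b            # the x = A^(-1)b form described above
print(np.allclose(x, x_inv))            # True, up to rounding error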

Although a small set of such functions is provided in spreadsheet packages such as Excel, these are awkward to use, and it is preferable to employ a more powerful package when conducting analyses that require this type of operation. Statistical toolsets, such as R and S, and general-purpose packages such as MATLAB, Maple and Mathematica, provide a wide range of matrix functions. In most instances a function such as Det(A) or Inv(A) will be provided to directly compute the determinant or inverse of a matrix A. For example, if A is the matrix illustrated above, Det(A)=12 and Inv(A) is:

If computed manually, or using a symbolic mathematics package such as Maple, the exact (fractional) inverse may be obtained:
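The same style of exact result can be obtained in Python using the SymPy symbolic library rather than Maple; a minimal sketch with a hypothetical integer matrix (not the matrix A referred to above):

from sympy import Matrix

A = Matrix([[2, 1, 1],
            [1, 3, 2],
            [1, 0, 0]])    # hypothetical integer matrix

print(A.det())             # exact determinant (an integer here)
print(A.inv())             # exact inverse, with fractional (Rational) entries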

General-purpose statistical packages, such as SPSS and STATA, do provide basic matrix operations such as addition and multiplication by a scalar, but are not designed to offer operations like inversion as distinct functions. Such operations take place within the framework of specific statistical modeling facilities, such as regression modeling. The size of problem that can be addressed is rarely stated, but is a very real issue. Whilst it is entirely possible for most packages to support basic operations on large datasets, more complex operations, such as matrix inversion, may prove to be extremely slow, impossible, or not useful in the context of statistical analysis. The current consensus appears to be that analysis of large datasets (perhaps millions of records and hundreds or thousands of variables) requires very different approaches, particularly focusing on: data mining; identification of outliers; data quality issues; data abstraction and summary tools; advanced visualization; and data subsetting (sampling).

Identity Matrix and Zero Matrix

The identity matrix, I, is a square matrix with diagonal elements 1 and off-diagonal elements 0.

The empty, zero or null matrix is a square matrix with all elements 0.

Matrix powers

A square matrix, A, can be raised to a power by straightforward multiplication: AA=A^2. More generally any integer power, n, may be generated, yielding A^n. When such matrices are stochastic, i.e. their rows sum to 1, results of particular interest to statistical analysis are obtained, as arise for example in Markov Chain analysis. The powers of a stochastic matrix identify probabilistic paths, as we explain below by reference to simple results from applying matrix algebra to planar graphs (graphs that can be drawn in the plane without their lines crossing).
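A minimal Python/NumPy sketch of this idea, using a hypothetical 2-state row-stochastic (transition) matrix P; the entries of P^n give the n-step transition probabilities used in Markov Chain analysis:

import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])                 # hypothetical stochastic matrix: each row sums to 1
P2 = np.linalg.matrix_power(P, 2)          # two-step transition probabilities
P10 = np.linalg.matrix_power(P, 10)        # higher powers approach the long-run (stationary) behavior
print(P2.sum(axis=1))                      # [1. 1.] - rows of a power of a stochastic matrix still sum to 1
print(P10.sum(axis=1))                     # [1. 1.]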

A square matrix can be used to represent origin-destination information, such as the distances or times between a set of location pairs, or the binary connectivity between these pairs. Thus there is a relationship between a square matrix and a network diagram or graph. This observation is significant, because algebraic operations on matrices can equate to meaningful operations on graphs, and vice versa.

In the simplest (non-statistical) case the cell entries could be binary, with a 1 indicating that a direct connection exists between two nodes of the graph and a 0 indicating no direct connection. We can use this idea to represent the structure of a road or telecommunications network, as shown below. In this case each entry in the matrix provides information on the network links (or edges): if two places (or vertices) are linked they are given a value of 1, and a value of 0 indicates that they are not directly connected (in this example places are not regarded as linked to themselves, so the diagonal entries are all set to 0). Rows indicate From and columns indicate To. Because there are no one-way links shown, the connectivity matrix is symmetric: its upper and lower sections either side of the main diagonal (top left to bottom right) are mirror images of each other. We can use this matrix representation to identify the shortest paths through a road network.

Graph and matrix representation of a road network

Simple road network

Binary connectivity matrix


From \ To   1   2   3   4   5   6
    1       0   1   0   0   1   0
    2       1   0   1   0   1   0
    3       0   1   0   1   1   1
    4       0   0   1   0   0   1
    5       1   1   1   0   0   1
    6       0   0   1   1   1   0

Let us denote the binary connectivity matrix by the letter C and calculate C^2=C*C. First we multiply row 1 (all routes from location 1) by column 1 (all routes to location 1) to get cell position c11=0*0+1*1+0*0+0*0+1*1+0*0=2. This gives us the number of ways in which we can leave location 1 and get back to it in 2 steps, which can be confirmed from the diagram (the routes 1→2→1 and 1→5→1). The non-zero elements of the cross-multiplication identify the path elements.

Now we calculate the next cell entry in the same way, row 1 column 2, or c12=0*1+1*0+0*1+0*0+1*1+0*0=1, so there is just 1 way of getting from location 1 to location 2 in two steps: the route 1→5→2. More generally, if we compute the whole of C^2 we get a matrix showing all the possible two-step routes from every location to every other one. Likewise, from C^3 we get all 3-step routes, and we can continue this process until the power matrix contains no cells with 0s. This occurs when every point can be reached from every other point, and the number of steps required is called the diameter of the network (or sometimes the solution time). The results are shown below:

C^1

0   1   0   0   1   0
1   0   1   0   1   0
0   1   0   1   1   1
0   0   1   0   0   1
1   1   1   0   0   1
0   0   1   1   1   0

C^2

2   1   2   0   1   1
1   3   1   1   2   2
2   1   4   1   2   2
0   1   1   2   2   1
1   2   2   2   4   1
1   2   2   1   1   3

C^3

2   5   3   3   6   3
5   4   8   3   7   4
3   8   6   6   9   7
3   3   6   2   3   5
6   7   9   3   6   8
3   4   7   5   8   4

By the time 3-step links have been made all the cells contain non-zero values, so the diameter of the network is 3 and we stop there. Cells that contain a value greater than 0 for the first time (having had a 0 entry in the previous matrix) identify the length (in terms of steps) of the shortest paths from any location to any other. For example, consider the set of routes from location 1 in our original diagram. There are 3 different ways in which the route from location 1 to location 4 can be achieved (1→2→3→4, 1→5→3→4 and 1→5→6→4), and this is indicated by the value 3 in cell (1,4) of C^3, which is non-zero for the first time at this power. If the three matrices are summed, the entries show the number of ways of arriving at a given node in 1, 2 or 3 steps. So for node 3, starting at node 3, the total number of paths is 0+4+6=10 (some of these are the same path traversed in the reverse direction).
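The calculations above can be reproduced with a short Python/NumPy sketch; the matrix C below contains the connectivity values from the table earlier in this section (only the code itself is an illustrative addition):

import numpy as np

C = np.array([[0, 1, 0, 0, 1, 0],
              [1, 0, 1, 0, 1, 0],
              [0, 1, 0, 1, 1, 1],
              [0, 0, 1, 0, 0, 1],
              [1, 1, 1, 0, 0, 1],
              [0, 0, 1, 1, 1, 0]])        # binary connectivity matrix from the example above

C2 = C @ C                                # counts of 2-step routes between each pair of locations
C3 = C2 @ C                               # counts of 3-step routes
print(C2[0, 0], C2[0, 1])                 # 2 1 - as computed cell by cell in the text
print(C3[0, 3])                           # 3   - three 3-step routes from location 1 to location 4
print((C3 > 0).all())                     # True - no zero cells remain, so the network diameter is 3
print(C[2, 2] + C2[2, 2] + C3[2, 2])      # 10  - paths of length 1, 2 or 3 from node 3 back to itself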

Computing higher-order powers of large matrices is not, in practice, a particularly efficient means of identifying the number and location of alternative routes through a network, but it does show that matrix powers can provide useful information about paths and also that the process of powering a matrix can have a meaningful stopping point, in this example at the 3rd power. Again, this has applications to stochastic matrices, assuming they meet various criteria.

Determinant

The determinant is a numeric value obtained from cross-multiplication of all the elements in a matrix, and is only defined for square matrices. Let A be an n by n matrix with elements [aij], and let Mij denote the submatrix of A, known as a minor, formed by eliminating row i and column j from A. An n by n matrix, A, with Det(A)=0 is described as singular, and such a matrix has no inverse. If Det(A) is very close to 0 the matrix is described as ill-conditioned.

The standard notation for the determinant of a matrix is two vertical lines (like the absolute value or magnitude): |A|, or Det(A). More formally:
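Using the minors Mij defined above, the determinant may be written as the Laplace (cofactor) expansion along any row i (expansion down a column is equivalent):

\[
\det(\mathbf{A}) \;=\; |\mathbf{A}| \;=\; \sum_{j=1}^{n} (-1)^{i+j}\, a_{ij}\, \det(\mathbf{M}_{ij})
\]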

Matrix Inverse

The matrix equivalent of division in conventional algebra. For a matrix, A, to be invertible its determinant must be non-zero, and ideally not very close to zero. A matrix that has an inverse is by definition non-singular. A symmetric real-valued matrix is referred to as positive definite if all its eigenvalues are positive, whereas a positive semi-definite matrix allows some eigenvalues to be 0. A matrix, A, that is invertible satisfies the relation AA^(-1)=I.
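A small Python/NumPy sketch of these checks, using a hypothetical symmetric matrix (values illustrative only):

import numpy as np

A = np.array([[4., 1.],
              [1., 3.]])                              # hypothetical symmetric matrix

print(np.linalg.det(A))                               # 11.0 - non-zero, so A is non-singular (invertible)
print((np.linalg.eigvalsh(A) > 0).all())              # True - all eigenvalues positive, so A is positive definite
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))   # True - A times A^(-1) gives the identity matrix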

Matrix Transpose

A matrix operation in which the rows and columns are transposed, i.e. in which elements aij are swapped with aji for all i and j. The inverse of a transposed matrix is the same as the transpose of the matrix inverse. The symbology used is A^T or A'. Note that the following relation holds:

(A^T)^(-1)=(A^(-1))^T
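This relation is easy to verify numerically; a minimal Python/NumPy sketch with a hypothetical invertible matrix:

import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])              # hypothetical invertible matrix (det = 6)
lhs = np.linalg.inv(A.T)              # (A^T)^(-1)
rhs = np.linalg.inv(A).T              # (A^(-1))^T
print(np.allclose(lhs, rhs))          # True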

Symmetric Matrix

A matrix in which element aij = aji for all i and j. For symmetric matrices, which must by definition be square, the matrix transpose is the same as the matrix itself, hence A=A^T.

Trace of a Matrix

The trace of a (square) matrix is defined as the sum of its diagonal elements, aii, and is generally denoted Tr(A).

Eigenvalues and Eigenvectors of Matrices

If A is a real-valued k by k square matrix and x is a non-zero real-valued vector, then a scalar λ that satisfies the equation:

(A-λI)x=0, or the equivalent expression Ax=λx

is known as an eigenvalue of A and x is a corresponding eigenvector of A. There are k eigenvalues of A (counted with multiplicity, and not necessarily all distinct or real), each with a corresponding eigenvector. Provided A has k linearly independent eigenvectors, it can be decomposed into three parts:

A=EDE^(-1) (diagonalization)

where E is a matrix of its eigenvectors and D is a diagonal matrix of its eigenvalues.
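A brief Python/NumPy sketch of this decomposition, with a hypothetical matrix (np.linalg.eig returns the eigenvalues together with a matrix whose columns are the corresponding eigenvectors):

import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                          # hypothetical matrix, eigenvalues 3 and 1
eigenvalues, E = np.linalg.eig(A)                 # E holds the eigenvectors as its columns
D = np.diag(eigenvalues)                          # diagonal matrix of the eigenvalues
print(np.allclose(A, E @ D @ np.linalg.inv(E)))   # True: A = E D E^(-1)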

Partitioned matrices

A matrix A can be partitioned into a series of sub-matrices, which may be square or rectangular. Operations, such as multiplication, may then be applied to the partitioned matrix as if the partitions were elements, provided the partitions are conformable. For example, given two matrices A and B partitioned into conformable blocks, the product AB can be formed block by block, as in the sketch below:
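A hedged Python/NumPy sketch of block-wise (partitioned) multiplication, using hypothetical 4 by 4 matrices split into 2 by 2 blocks, and confirming that the block product agrees with ordinary multiplication:

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 5, (4, 4)).astype(float)      # hypothetical matrix A
B = rng.integers(0, 5, (4, 4)).astype(float)      # hypothetical matrix B

# partition each matrix into four 2 by 2 blocks
A11, A12, A21, A22 = A[:2, :2], A[:2, 2:], A[2:, :2], A[2:, 2:]
B11, B12, B21, B22 = B[:2, :2], B[:2, 2:], B[2:, :2], B[2:, 2:]

# multiply the partitions as if they were elements
blocked = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
                    [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])

print(np.allclose(blocked, A @ B))                # True: block multiplication reproduces AB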

Jacobian

Let y=f(x) be a set of n functions, typically in n variables, of the form yi=fi(x1,x2,...,xn), i=1,...,n;

then the Jacobian matrix is the matrix of first partial derivatives of these functions (if these exist):
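Written out in the usual notation, with Jij denoting the entry in row i and column j:

\[
\mathbf{J} \;=\; \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n}\\[2mm]
\vdots & \ddots & \vdots\\[2mm]
\dfrac{\partial y_n}{\partial x_1} & \cdots & \dfrac{\partial y_n}{\partial x_n}
\end{bmatrix},
\qquad J_{ij} \;=\; \frac{\partial y_i}{\partial x_j}
\]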

The Jacobian provides information on the gradient of a multivariate function, which is useful in gradient search optimization amongst other applications. The determinant of J is sometimes also called the Jacobian.

see further: https://en.wikipedia.org/wiki/Jacobian_matrix

Hessian

Let y=f(x) be a function of n variables, x1,x2,...,xn;

then the Hessian matrix is the matrix of second-order partial derivatives of this function (if these exist):
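Written out in the usual notation, with Hij denoting the entry in row i and column j:

\[
\mathbf{H} \;=\; \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n}\\[2mm]
\vdots & \ddots & \vdots\\[2mm]
\dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix},
\qquad H_{ij} \;=\; \frac{\partial^2 f}{\partial x_i\,\partial x_j}
\]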

The Hessian provides information on the curvature of a multivariate function and hence is useful in determining the location of optima as part of an optimization procedure. As with the Jacobian (see above), the term is also used to refer to the determinant of the matrix.

see further: https://en.wikipedia.org/wiki/Hessian_matrix