Understanding Compression of Convolutional Neural Nets: Part 4
This is the fourth post of a five part series on compressing convolutional neural networks. As indicated in the previous posts of this series, sometimes you face a resource crunch and need to reduce the number of parameters of a trained neural network, via compression, to work within the available resources. In my earlier blog post on this topic, I showed how to reduce the number of connection weights in a fully connected layer of a convolutional neural network (CNN). In this post, I am going to illustrate how to reduce the number of parameters of a trained convolution layer. The way to do this is to use tensor decomposition, which you can view as a technique analogous to the matrix decomposition used for fully connected layer compression. For a detailed introduction to tensors and their decomposition, please refer to my three part series on that topic. Herein, I will provide only a brief introduction to tensors and their decomposition, enough to show how tensor decomposition can be used to reduce the number of parameters in a trained convolution layer.
A Brief Tensor Refresher
A tensor is a multidimensional or N-way array. The array dimensionality, the value of N, specifies the tensor order or the number of tensor modes. We access the elements of a real-valued tensor of order K using K indices as $t_{i_1, i_2, \ldots, i_K}$. Thus, a color image is a tensor of order three, or with three modes. Before proceeding any further, let's look at the following code wherein we use the outer product of three vectors to generate a rank 1 tensor.
import numpy as np

# three vectors whose outer product yields a rank 1 tensor of order three
a1 = np.array([1,2,3])
b1 = np.array([4,5,6])
c1 = np.array([7,8,9])

# outer product of the three vectors, reshaped into a 3 x 3 x 3 tensor
t1 = np.outer(np.outer(a1,b1),c1).reshape(3,3,3)
print(t1)
The above code produces the following tensor of order three:
[[[ 28 32 36]
[ 35 40 45]
[ 42 48 54]]
[[ 56 64 72]
[ 70 80 90]
[ 84 96 108]]
[[ 84 96 108]
[105 120 135]
[126 144 162]]]
Analogous to a color image, this tensor has three planes or slices. Now if we take another set of three vectors and perform the same outer product calculation, we will end up with another rank 1 tensor of order three. We can then add these two tensors to generate yet another tensor of order three. This tensor will be a rank 2 tensor, because it was generated by combining two rank 1 tensors; a short snippet illustrating this follows below. Of course, we need not limit ourselves to outer products of triplets of vectors. We can use outer products of quadruplets of vectors to create rank 1 tensors of order four, and by summing r such rank 1 tensors we can get a tensor of rank r, and so on for higher orders.

Let's now think of the reverse problem. Suppose we have a tensor of order k. Is it then possible to obtain rank 1 tensors whose summation corresponds to the given tensor? Dealing with this problem is what we do in the tensor decomposition method known as CANDECOMP/PARAFAC decomposition, or simply the CP decomposition.
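To make this concrete before moving on, here is a small snippet that continues the earlier example: it builds a second rank 1 tensor from another set of vectors (a2, b2, c2 are arbitrary values chosen purely for illustration) and adds it to t1 to obtain a rank 2 tensor of order three.

a2 = np.array([1,0,2])
b2 = np.array([3,1,4])
c2 = np.array([2,5,1])

# a second rank 1 tensor of order three
t2 = np.outer(np.outer(a2,b2),c2).reshape(3,3,3)

# the sum of two rank 1 tensors is a tensor of (at most) rank 2
t_rank2 = t1 + t2
print(t_rank2.shape)  # (3, 3, 3)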
The CP Decomposition
The CP decomposition factorizes a tensor into a linear combination of rank one tensors. Thus, a tensor $\mathcal{X}$ of order 3 can be decomposed as a sum of R rank one tensors as

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,$$

where $\circ$ represents the outer product. This is illustrated in the figure below.
The CP decomposition is sometimes expressed in the form of factor matrices, where the vectors from the rank one tensor components are combined to form factor matrices. For the decomposition expression shown above, the three factor matrices A, B, and C are formed as shown below:

$$A = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_R], \quad B = [\mathbf{b}_1 \; \mathbf{b}_2 \; \cdots \; \mathbf{b}_R], \quad C = [\mathbf{c}_1 \; \mathbf{c}_2 \; \cdots \; \mathbf{c}_R]$$
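In terms of the toy example above, forming the factor matrices simply amounts to stacking the component vectors as columns (a2, b2, c2 are the illustrative vectors introduced earlier):

# each factor matrix holds the vectors of the rank 1 components as its columns
A = np.column_stack([a1, a2])   # shape (3, 2)
B = np.column_stack([b1, b2])
C = np.column_stack([c1, c2])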
Often, the vectors in the rank one tensors are normalized to unit length. In such cases, the CP decomposition is expressed as

$$\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,$$

where $\lambda_r$ is a scalar accounting for the normalization. With such a decomposition, a tensor element $x_{ijk}$ can be approximated as

$$x_{ijk} \approx \sum_{r=1}^{R} \lambda_r \, a_{ir} b_{jr} c_{kr}.$$
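In numpy, this element-wise approximation can be written as a single einsum over the factor matrices; here lam is assumed to hold the R weights λr (all ones in this case, since the columns above were not normalized):

lam = np.ones(2)
# x[i,j,k] is approximated by the sum over r of lam[r] * A[i,r] * B[j,r] * C[k,r]
t_approx = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)
print(np.allclose(t_approx, t_rank2))  # True, since t_rank2 is exactly rank 2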
The decomposition is computed by a minimization algorithm known as alternating least squares (ALS). The basic gist of this algorithm is to keep all but one of the factor matrices fixed, solve a linear least squares problem for the remaining one, and then iterate the procedure by cycling through which factor matrix is solved for.
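To make the alternating idea concrete, here is a minimal, unoptimized ALS sketch for an order 3 tensor in plain numpy. The function name cp_als, the helper functions, and the random initialization are my own choices for illustration; in practice you would rely on a library implementation such as TensorLy's parafac rather than this toy version.

import numpy as np

def unfold(T, mode):
    # matricize T along the given mode; the remaining indices are flattened in C order
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(U, V):
    # column-wise Kronecker product of two factor matrices
    R = U.shape[1]
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, R)

def cp_als(T, rank, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((T.shape[0], rank))
    B = rng.standard_normal((T.shape[1], rank))
    C = rng.standard_normal((T.shape[2], rank))
    for _ in range(n_iter):
        # keep two factor matrices fixed and solve a least squares problem for the third
        A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

A_hat, B_hat, C_hat = cp_als(t_rank2.astype(float), rank=2)
t_hat = np.einsum('ir,jr,kr->ijk', A_hat, B_hat, C_hat)
print(np.abs(t_hat - t_rank2).max())  # should be close to zero for this rank 2 tensor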
With the above background about tensor decomposition, we are ready to perform convolution layer compression using CP decomposition. But you will have to wait for the next installment of Exploration in ML & AI Newsletter. 😀