Understanding Compression of Convolutional Neural Nets: Part 4
This is the fourth post of a five part series on compressing convolutional neural networks. As indicated in the previous posts of this series, sometimes you face a resource crunch and need to reduce the number of parameters of a trained neural network, via compression, to work within the available resources. In my earlier blog post on this topic, I showed how to reduce the number of connection weights in a fully connected layer of a convolutional neural network (CNN). In this post, I am going to illustrate how to reduce the number of parameters of a trained convolution layer. The way to do this is to use tensor decomposition, which you can view as a technique analogous to the matrix decomposition used for fully connected layer compression. For a detailed introduction to tensors and their decomposition, please refer to my three part series on that topic. Herein, I will provide only a brief introduction to tensors and their decomposition, enough to show how tensor decomposition can be used to reduce the number of parameters in a trained convolution layer.
A Brief Tensor Refresher
A tensor is a multidimensional or N-way array. The array dimensionality, the value of N, specifies the tensor order or the number of tensor modes. We access the elements of a real-valued tensor of order K using K indices as $t_{i_1, i_2, \ldots, i_K}$. Thus, a color image is a tensor of order three, or with three modes. Before proceeding any further, let's look at the following code wherein we use the outer product of three vectors to generate a rank 1 tensor.
import numpy as np

# three vectors whose outer product yields a rank 1 tensor of order three
a1 = np.array([1,2,3])
b1 = np.array([4,5,6])
c1 = np.array([7,8,9])

# outer product of the three vectors, reshaped into a 3 x 3 x 3 tensor
t1 = np.outer(np.outer(a1,b1),c1).reshape(3,3,3)
print(t1)
The above code produces the following tensor of order three:
[[[ 28 32 36]
[ 35 40 45]
[ 42 48 54]]
[[ 56 64 72]
[ 70 80 90]
[ 84 96 108]]
[[ 84 96 108]
[105 120 135]
[126 144 162]]]
Analogous to a color image, this tensor has three planes or slices. Now if we take another set of three vectors and perform the same outer product calculation, we will end up with another rank 1 tensor of order three. We can then add these two tensors to generate yet another tensor of order three. This tensor will be a rank 2 tensor, because it was generated by combining two rank 1 tensors; a short snippet illustrating this follows below. Of course, we need not limit ourselves to outer products of triplets of vectors. We can use outer products of quadruplets of vectors to create rank 1 tensors of order four, and by summing r such rank 1 tensors we can get a tensor of rank r, and so on for higher orders.

Let's now think of the reverse problem. Suppose we have a tensor of order k. Is it then possible to obtain rank 1 tensors whose summation corresponds to the given tensor? Dealing with this problem is what we do in the tensor decomposition method known as CANDECOMP/PARAFAC decomposition, or simply the CP decomposition.
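To make this concrete before moving on, here is a small snippet that continues the earlier example: it builds a second rank 1 tensor from another set of vectors (a2, b2, c2 are arbitrary values chosen purely for illustration) and adds it to t1 to obtain a rank 2 tensor of order three.

a2 = np.array([1,0,2])
b2 = np.array([3,1,4])
c2 = np.array([2,5,1])

# a second rank 1 tensor of order three
t2 = np.outer(np.outer(a2,b2),c2).reshape(3,3,3)

# the sum of two rank 1 tensors is a tensor of (at most) rank 2
t_rank2 = t1 + t2
print(t_rank2.shape)  # (3, 3, 3)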
The CP Decomposition
The CP decomposition factorizes a tensor into a linear combination of rank one tensors. Thus, a tensor $\mathcal{X}$ of order 3 can be decomposed as a sum of R rank one tensors as

$$\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,$$

where $\circ$ represents the outer product. This is illustrated in the figure below.
The CP decomposition is sometimes expressed in the form of factor matrices, where the vectors from the rank one tensor components are combined to form factor matrices. For the decomposition expression shown above, the three factor matrices A, B, and C are formed as shown below:

$$A = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_R], \quad B = [\mathbf{b}_1 \; \mathbf{b}_2 \; \cdots \; \mathbf{b}_R], \quad C = [\mathbf{c}_1 \; \mathbf{c}_2 \; \cdots \; \mathbf{c}_R]$$
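In terms of the toy example above, forming the factor matrices simply amounts to stacking the component vectors as columns (a2, b2, c2 are the illustrative vectors introduced earlier):

# each factor matrix holds the vectors of the rank 1 components as its columns
A = np.column_stack([a1, a2])   # shape (3, 2)
B = np.column_stack([b1, b2])
C = np.column_stack([c1, c2])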
Often, the vectors in the rank one tensors are normalized to unit length. In such cases, the CP decomposition is expressed as

$$\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,$$

where $\lambda_r$ is a scalar accounting for the normalization. With such a decomposition, a tensor element $x_{ijk}$ can be approximated as

$$x_{ijk} \approx \sum_{r=1}^{R} \lambda_r \, a_{ir} b_{jr} c_{kr}.$$
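In numpy, this element-wise approximation can be written as a single einsum over the factor matrices; here lam is assumed to hold the R weights λr (all ones in this case, since the columns above were not normalized):

lam = np.ones(2)
# x[i,j,k] is approximated by the sum over r of lam[r] * A[i,r] * B[j,r] * C[k,r]
t_approx = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)
print(np.allclose(t_approx, t_rank2))  # True, since t_rank2 is exactly rank 2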
The decomposition is computed by a minimization algorithm known as alternating least squares (ALS). The basic gist of this algorithm is to keep all but one of the factor matrices fixed, solve a linear least squares problem for the remaining one, and then iterate the procedure by cycling through which factor matrix is solved for.
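To make the alternating idea concrete, here is a minimal, unoptimized ALS sketch for an order 3 tensor in plain numpy. The function name cp_als, the helper functions, and the random initialization are my own choices for illustration; in practice you would rely on a library implementation such as TensorLy's parafac rather than this toy version.

import numpy as np

def unfold(T, mode):
    # matricize T along the given mode; the remaining indices are flattened in C order
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(U, V):
    # column-wise Kronecker product of two factor matrices
    R = U.shape[1]
    return np.einsum('ir,jr->ijr', U, V).reshape(-1, R)

def cp_als(T, rank, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((T.shape[0], rank))
    B = rng.standard_normal((T.shape[1], rank))
    C = rng.standard_normal((T.shape[2], rank))
    for _ in range(n_iter):
        # keep two factor matrices fixed and solve a least squares problem for the third
        A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

A_hat, B_hat, C_hat = cp_als(t_rank2.astype(float), rank=2)
t_hat = np.einsum('ir,jr,kr->ijk', A_hat, B_hat, C_hat)
print(np.abs(t_hat - t_rank2).max())  # should be close to zero for this rank 2 tensor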
With the above background about tensor decomposition, we are ready to perform convolution layer compression using CP decomposition. But you will have to wait for the next installment of Exploration in ML & AI Newsletter. 😀