Most of the top-performing neural networks for state-of-the-art image recognition suffer from three problems, the first of which is low speed: it deters them from being deployed in real-time applications on embedded devices like the Raspberry Pi, so the current trend is to deploy these models on servers with large graphical processing units (GPUs). Roughly 90% of the computation time is spent in the convolutional layers, while 90% of the model size comes from the weights of the fully connected layers; newer models like Inception and ResNet have removed the FC layers completely. A common class of strategies is therefore to decompose the weight matrices to save on computation. Accelerating shallower models is quite an easy problem, but it is much harder for the deeper networks that can match human accuracy. The method described here achieves a 4X speedup on the convolutional layers with only a 0.3% increase in top-5 error.
Decomposition of Response Map
A simple approach is to decompose each layer individually into computationally efficient components. But, as mentioned before, the approximation error of one layer propagates to the layers that follow it, so this simple trick would let the accumulated error get out of hand. The method described below therefore decomposes the layers in such a way that each layer tries to offset the error accumulated from the layers preceding it. The idea is that the response maps of a CNN are highly redundant and lie in some lower-dimensional subspace, and this is what allows a convolutional layer to be decomposed. Before going further, let us first recap the convolution operation.
Convolutional Layer
Assume a convolution layer with a weight tensor of size d * k * k * c, where k is the spatial size of the convolutional kernel, c is the number of input channels and d is the number of filters. As shown in figure 3, these different sets of weights can be arranged in different rows of a matrix W, with the bias of each filter appended as a last column, so W becomes a matrix in R^(d x (k²c + 1)). The output response at one spatial location is then y = W z, where z is the vectorized k x k x c input patch with a 1 appended for the bias. So y lies in R^d: it is basically a d-dimensional vector, with one entry per filter present in the layer.
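To make the y = Wz view concrete, here is a minimal NumPy sketch. The layer sizes and the input patch are made-up values for illustration, not numbers from the article.

```python
import numpy as np

# Hypothetical layer sizes: d filters, k x k kernels, c input channels.
d, k, c = 64, 3, 16

weights = np.random.randn(d, k, k, c)   # weight tensor of the layer
biases = np.random.randn(d)

# Each filter becomes one row of W; the bias is appended as a last column,
# so W has shape (d, k*k*c + 1).
W = np.concatenate([weights.reshape(d, -1), biases[:, None]], axis=1)

# One vectorized input patch with a trailing 1 that multiplies the bias.
patch = np.random.randn(k, k, c)
z = np.append(patch.reshape(-1), 1.0)    # shape (k*k*c + 1,)

y = W @ z                                # response at this location, shape (d,)
print(y.shape)                           # (64,)
```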
Lower Rank Subspace
Unless you have been living under a rock (that is, you already know how PCA works), you can skip this section. For the sake of convenience, consider points in a 3-dimensional space that actually reside in a 2-dimensional subspace corresponding to a plane: a coordinate system of lower dimension can be obtained which accurately approximates the points for the task in hand. The following analysis is just like PCA. Similarly, if 2-dimensional points hardly vary along one direction, that direction can be discarded, which implies that the two-dimensional space is an overkill for the type of responses we are getting from our layer; if we can somehow find the line along which the points do vary, their position along that line is all we need. First the points are centred by subtracting the mean, where the mean is calculated by taking the mean of all the vectors; this is necessary since a rotation matrix does not take translation into account and rotates the vectors by taking the origin as the center.
Let the line be depicted by a direction vector v; since it is a direction vector, the l2 norm of v is 1. The projection of a (mean-centred) point y_i onto v is v^T y_i, so if the points are stacked as the columns of a matrix Y, the variance of the projections becomes Σ_i (v^T y_i)² = ||v^T Y||². The objective therefore becomes: maximize ||v^T Y||², subject to ||v||² = 1. Solving this constrained problem yields Y Y^T v = λ v, so the solution for v is the eigenvector corresponding to the maximum eigenvalue of the matrix Y Y^T. Once the direction vector v is obtained, any vector/point b can be approximately written as v v^T b, where, as before, v^T b is the projected value along the direction vector (b is a vector/point in n-dimensional space, v is a unit vector, and the projection of b along v is (v^T b) v).
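As a sanity check on this derivation, the following NumPy sketch builds synthetic 2-D points scattered around a line (all data and names are invented for illustration), mean-centres them, takes v as the eigenvector of Y Y^T with the largest eigenvalue, and reconstructs each point from its projection onto v.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D points that mostly vary along one direction (a noisy line).
n = 500
t = rng.normal(size=n)
points = np.stack([3.0 * t + 2.0, 1.5 * t - 1.0], axis=0) + 0.05 * rng.normal(size=(2, n))

# Mean-centre: the projection is defined about the origin.
mean = points.mean(axis=1, keepdims=True)
Y = points - mean                        # columns are the centred points y_i

# v = eigenvector of Y Y^T with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Y @ Y.T)
v = eigvecs[:, np.argmax(eigvals)]       # unit-norm direction vector

# Approximate every point by its projection onto the line: mean + v v^T (y - mean).
approx = mean + np.outer(v, v) @ Y
print("mean reconstruction error:", np.abs(points - approx).mean())
```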
The same methodology extends to the d-dimensional response vectors y of our convolutional layer: instead of a single direction, take the eigenvectors of Y Y^T belonging to the r largest eigenvalues and stack them as the columns of a d x r matrix V. The approximated response is then obtained by ŷ = V V^T (y − ȳ) + ȳ = V V^T W z − V V^T ȳ + ȳ, where ȳ is the mean response. Merging, V^T W becomes a new matrix L of dimension r x m, with m = k²c + 1, so that y ≈ V L z + b, where b = −V V^T ȳ + ȳ. We have now decomposed the layer into two convolutional layers, one with weight tensor L and one with weight tensor V; here b can be taken as the bias term for the second convolutional layer. As is evident, we have bypassed the large W matrix, so the complexity of the computation drops from d(k²c + 1) to r(k²c + 1) + rd, r being smaller than d.
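Below is a small NumPy sketch of the resulting two-layer factorization. The sizes d, k, c, r and the random inputs are invented for illustration, and W is deliberately constructed to be low rank so that the responses really do lie in an r-dimensional subspace, mimicking the redundancy assumption; for a real layer the quality of the approximation depends on how redundant the responses actually are.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, c, r = 64, 3, 16, 12               # illustrative sizes, with r < d
m = k * k * c + 1                        # columns of W (weights + bias)

# Low-rank W mimics the assumed redundancy of the layer's responses.
W = rng.normal(size=(d, r)) @ rng.normal(size=(r, m))

Z = rng.normal(size=(m, 2000))           # many vectorized input patches
Z[-1, :] = 1.0                           # trailing 1 multiplies the bias column

Y = W @ Z                                # responses of the original layer
y_mean = Y.mean(axis=1, keepdims=True)

# V: eigenvectors of the centred response covariance for the r largest eigenvalues.
Yc = Y - y_mean
eigvals, eigvecs = np.linalg.eigh(Yc @ Yc.T)
V = eigvecs[:, -r:]                      # d x r

L = V.T @ W                              # r x m weight matrix of the first new layer
b = y_mean - V @ (V.T @ y_mean)          # bias absorbed into the second new layer

Y_approx = V @ (L @ Z) + b               # y ≈ V L z + b
print("relative error:", np.linalg.norm(Y - Y_approx) / np.linalg.norm(Y))

# Cost per output location: d*(k²c+1) multiplications originally,
# versus r*(k²c+1) + r*d for the two decomposed layers.
print("original:", d * m, "decomposed:", r * m + r * d)
```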