MobileNetV2 architecture
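In this article, we explore the MobileNetV2 architecture in depth. The MobileNetV2 model has 53 convolution layers and 1 AvgPool layer, and needs roughly 300 million multiply-accumulate operations (about 0.3 GMACs) for one 224x224 input. It has two main components: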
- Inverted Residual Block
- Bottleneck Residual Block
There are two types of Convolution layers in MobileNet V2 architecture:
- 1x1 Convolution
- 3x3 Depthwise Convolution
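To see why the network pairs these two convolution types, it helps to count multiply-accumulate operations (MACs). The following is a minimal sketch in plain Python. The function names and the 56x56, 144-channel example shape are illustrative choices, not part of any library.

```python
def standard_conv_macs(out_hw, kernel, c_in, c_out):
    """MACs of a regular kernel x kernel convolution on a square feature map."""
    return out_hw * out_hw * kernel * kernel * c_in * c_out


def depthwise_conv_macs(out_hw, kernel, channels):
    """MACs of a depthwise convolution: one kernel x kernel filter per channel."""
    return out_hw * out_hw * kernel * kernel * channels


def pointwise_conv_macs(out_hw, c_in, c_out):
    """MACs of a 1x1 convolution, which only mixes channels."""
    return standard_conv_macs(out_hw, 1, c_in, c_out)


# Illustrative shape (an assumption): 56x56 output, 144 channels in and out.
full = standard_conv_macs(56, 3, 144, 144)
split = depthwise_conv_macs(56, 3, 144) + pointwise_conv_macs(56, 144, 144)

print(f"standard 3x3 convolution: {full:,} MACs")
print(f"3x3 depthwise + 1x1:      {split:,} MACs")
print(f"saving: {full / split:.1f}x")
```

The saving approaches a factor of 9 (the 3x3 kernel area) as the number of output channels grows, which is why the 3x3 work is done depthwise and the channel mixing is left to the 1x1 layers.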
Both components are built from the same kind of block. Each block has 3 layers:
- 1x1 Convolution with ReLU6
- Depthwise Convolution with ReLU6
- 1x1 Convolution without any non-linearity
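There are Stride 1 Blocks and Stride 2 Blocks. Only the Stride 1 Block ends with a residual Add: the Stride 2 Block halves the height and width, so its input can no longer be added to its output. The internal components of the two blocks are as follows: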
Stride 1 Block:
- Input
- 1x1 Convolution with ReLU6
- Depthwise Convolution with ReLU6
- 1x1 Convolution without any non-linearity
- Add (the block input is added to this output)
Stride 2 Block:
- Input
- 1x1 Convolution with ReLU6
- Depthwise Convolution with stride=2 and ReLU6
- 1x1 Convolution without any non-linearity
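The two block layouts can be traced as shapes with a short sketch. The helper names `relu6` and `trace_block` are hypothetical, and the sketch assumes square feature maps. It also assumes one detail the lists above do not spell out: the Add step only applies when the input and output channel counts match, since otherwise the two tensors cannot be summed.

```python
def relu6(x):
    """ReLU6 clamps an activation to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def trace_block(hw, c_in, c_out, stride, expansion=6):
    """Return (layer, height/width, channels) after each step of one block."""
    hidden = c_in * expansion
    steps = [("input", hw, c_in)]
    # 1x1 convolution widens the channels, followed by ReLU6.
    steps.append(("1x1 conv + ReLU6", hw, hidden))
    # Depthwise convolution filters each channel; stride 2 halves H and W.
    hw_out = -(-hw // stride)  # ceiling division
    steps.append((f"3x3 depthwise (stride {stride}) + ReLU6", hw_out, hidden))
    # 1x1 convolution narrows the channels again, with no activation.
    steps.append(("1x1 conv (linear)", hw_out, c_out))
    # Residual Add needs identical input and output shapes.
    if stride == 1 and c_in == c_out:
        steps.append(("add input", hw_out, c_out))
    return steps


for step in trace_block(56, 24, 24, stride=1):
    print("stride 1:", step)
for step in trace_block(56, 24, 32, stride=2):
    print("stride 2:", step)
```

The example shapes (56x56 with 24 channels) correspond to convolutions 7 to 12 in the convolution table of this article.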
Layers in MobileNetV2
# | Op | Expansion | Repeat |
1 | Convolution | - | 1 |
2 | Bottleneck | 1 | 1 |
3 | Bottleneck | 6 | 2 |
4 | Bottleneck | 6 | 3 |
5 | Bottleneck | 6 | 4 |
6 | Bottleneck | 6 | 3 |
7 | Bottleneck | 6 | 3 |
8 | Bottleneck | 6 | 1 |
9 | Convolution | - | 1 |
10 | AvgPool | - | 1 |
11 | Convolution | - | 1 |
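In this table, Bottleneck refers to the block described above (the Inverted Residual Block or Bottleneck Residual Block), used as either a Stride 1 Block or a Stride 2 Block. Expansion is the factor by which the first 1x1 Convolution widens the channels, and Repeat is the number of times the block is stacked.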
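As a rough check on the layer table, the sketch below expands it into individual convolutions and counts them. The expansion and repeat values come from the table above. The output channels and strides are read off the convolution table in the next section, and the names `STAGES`, `list_convolutions` and `macs` are hypothetical. It assumes that only the first block of a repeated stage uses stride 2, and that the expansion-1 block has no 1x1 expansion layer, which is what the convolution table shows.

```python
# (expansion, output channels, repeat, stride of the first block in the stage)
STAGES = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def list_convolutions(input_hw=224, num_classes=1001):
    """Return (kind, out_hw, kernel, c_in, c_out) for every convolution."""
    convs = []
    hw, c = input_hw // 2, 32
    convs.append(("conv", hw, 3, 3, c))  # first 3x3 convolution, stride 2
    for expansion, c_out, repeat, first_stride in STAGES:
        for i in range(repeat):
            stride = first_stride if i == 0 else 1
            hidden = c * expansion
            if expansion != 1:  # expansion-1 block skips the 1x1 expansion
                convs.append(("conv", hw, 1, c, hidden))
            hw = hw // stride
            convs.append(("depthwise", hw, 3, hidden, hidden))
            convs.append(("conv", hw, 1, hidden, c_out))
            c = c_out
    convs.append(("conv", hw, 1, c, 1280))  # last 1x1 convolution before AvgPool
    convs.append(("conv", 1, 1, 1280, num_classes))  # classifier after AvgPool
    return convs


def macs(conv):
    """Multiply-accumulates of one convolution on a square feature map."""
    kind, out_hw, kernel, c_in, c_out = conv
    per_pixel = kernel * kernel * (c_in if kind == "depthwise" else c_in * c_out)
    return out_hw * out_hw * per_pixel


convs = list_convolutions()
print(len(convs), "convolution layers")
print(f"{sum(macs(c) for c in convs):,} multiply-accumulates")
```

Under these assumptions the count comes to 53 convolution layers, and the total is on the order of 300 million multiply-accumulates for a 224x224 input.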
Convolutions in MobileNetV2
The following table lists the 53 Convolution layers in the MobileNetV2 architecture, along with the input size, kernel size, stride, padding and output size of each:
# Conv | Input H/W | Input C | Kernel H/W | Stride H/W | Padding H/W | Output H/W | Output C |
1 | 224 | 3 | 3 | 2 | 0 | 112 | 32 |
2 | 112 | 32 | 3 | 1 | 1 | 112 | 32 |
3 | 112 | 32 | 1 | 1 | 0 | 112 | 16 |
4 | 112 | 16 | 1 | 1 | 0 | 112 | 96 |
5 | 112 | 96 | 3 | 2 | 0 | 56 | 96 |
6 | 56 | 96 | 1 | 1 | 0 | 56 | 24 |
7 | 56 | 24 | 1 | 1 | 0 | 56 | 144 |
8 | 56 | 144 | 3 | 1 | 1 | 56 | 144 |
9 | 56 | 144 | 1 | 1 | 0 | 56 | 24 |
10 | 56 | 24 | 1 | 1 | 0 | 56 | 144 |
11 | 56 | 144 | 3 | 2 | 0 | 28 | 144 |
12 | 28 | 144 | 1 | 1 | 0 | 28 | 32 |
13 | 28 | 32 | 1 | 1 | 0 | 28 | 192 |
14 | 28 | 192 | 3 | 1 | 1 | 28 | 192 |
15 | 28 | 192 | 1 | 1 | 0 | 28 | 32 |
16 | 28 | 32 | 1 | 1 | 0 | 28 | 192 |
17 | 28 | 192 | 3 | 1 | 1 | 28 | 192 |
18 | 28 | 192 | 1 | 1 | 0 | 28 | 32 |
19 | 28 | 32 | 1 | 1 | 0 | 28 | 192 |
20 | 28 | 192 | 3 | 2 | 0 | 14 | 192 |
21 | 14 | 192 | 1 | 1 | 0 | 14 | 64 |
22 | 14 | 64 | 1 | 1 | 0 | 14 | 384 |
23 | 14 | 384 | 3 | 1 | 1 | 14 | 384 |
24 | 14 | 384 | 1 | 1 | 0 | 14 | 64 |
25 | 14 | 64 | 1 | 1 | 0 | 14 | 384 |
26 | 14 | 384 | 3 | 1 | 1 | 14 | 384 |
27 | 14 | 384 | 1 | 1 | 0 | 14 | 64 |
28 | 14 | 64 | 1 | 1 | 0 | 14 | 384 |
29 | 14 | 384 | 3 | 1 | 1 | 14 | 384 |
30 | 14 | 384 | 1 | 1 | 0 | 14 | 64 |
31 | 14 | 64 | 1 | 1 | 0 | 14 | 384 |
32 | 14 | 384 | 3 | 1 | 1 | 14 | 384 |
33 | 14 | 384 | 1 | 1 | 0 | 14 | 96 |
34 | 14 | 96 | 1 | 1 | 0 | 14 | 576 |
35 | 14 | 576 | 3 | 1 | 1 | 14 | 576 |
36 | 14 | 576 | 1 | 1 | 0 | 14 | 96 |
37 | 14 | 96 | 1 | 1 | 0 | 14 | 576 |
38 | 14 | 576 | 3 | 1 | 1 | 14 | 576 |
39 | 14 | 576 | 1 | 1 | 0 | 14 | 96 |
40 | 14 | 96 | 1 | 1 | 0 | 14 | 576 |
41 | 14 | 576 | 3 | 2 | 0 | 7 | 576 |
42 | 7 | 576 | 1 | 1 | 0 | 7 | 160 |
43 | 7 | 160 | 1 | 1 | 0 | 7 | 960 |
44 | 7 | 960 | 3 | 1 | 1 | 7 | 960 |
45 | 7 | 960 | 1 | 1 | 0 | 7 | 160 |
46 | 7 | 160 | 1 | 1 | 0 | 7 | 960 |
47 | 7 | 960 | 3 | 1 | 1 | 7 | 960 |
48 | 7 | 960 | 1 | 1 | 0 | 7 | 160 |
49 | 7 | 160 | 1 | 1 | 0 | 7 | 960 |
50 | 7 | 960 | 3 | 1 | 1 | 7 | 960 |
51 | 7 | 960 | 1 | 1 | 0 | 7 | 320 |
52 | 7 | 320 | 1 | 1 | 0 | 7 | 1280 |
53 | 1 | 1280 | 1 | 1 | 0 | 1 | 1001 |
The parameters of each Convolution layer in order are:
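- Input Height/Width
- Input Channels
- Kernel Height/Width
- Stride Height/Width
- Padding Height/Width (the value listed appears to be the padding on the top/left edge; the stride-2 3x3 layers show 0 because TensorFlow-style SAME padding places their single padded row and column at the bottom/right)
- Output Height/Width
- Output Channels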
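The padding and output columns can be reproduced with a small helper, assuming the table follows TensorFlow-style SAME padding (an assumption suggested by the 1001 output classes, not something the table states). The function name `same_padding` is hypothetical.

```python
import math


def same_padding(in_size, kernel, stride):
    """Output size and (before, after) padding under SAME padding."""
    out_size = math.ceil(in_size / stride)
    # Total padding needed so that every output position has a full window.
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    before = total // 2       # top/left edge gets the smaller half
    after = total - before    # bottom/right edge gets the remainder
    return out_size, before, after


# (input size, kernel, stride) for a few rows of the convolution table.
examples = {1: (224, 3, 2), 2: (112, 3, 1), 3: (112, 1, 1), 41: (14, 3, 2)}
for row, (in_size, kernel, stride) in examples.items():
    out_size, before, after = same_padding(in_size, kernel, stride)
    print(f"conv {row}: output {out_size}, padding before={before}, after={after}")
```

With this rule a stride-2 3x3 layer gets 0 padding before and 1 after, which is consistent with the 0 shown in the padding column for those rows, while a stride-1 3x3 layer gets 1 on each side.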
With this, you have a complete picture of the architecture of the MobileNetV2 model. Enjoy.