MobileNetV2 architecture

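In this article, we explore the MobileNet V2 architecture in depth. MobileNet V2 has 53 convolution layers and 1 AvgPool layer. For a 224x224 input, it needs about 300 million multiply-accumulate operations (MACs). Its main component goes by two names, both describing the same block:

  • Inverted Residual Block (the residual shortcut connects the thin layers, the reverse of a classic residual block)
  • Bottleneck Residual Block (each block ends in a narrow, linear bottleneck layer)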

Inside the bottleneck blocks, MobileNet V2 uses two types of convolution layers. The network also starts with one standard 3x3 convolution.

  • 1x1 Convolution
  • 3x3 Depthwise Convolution
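Splitting the work between a 3x3 depthwise convolution and 1x1 convolutions is much cheaper than using one standard 3x3 convolution. Counting weights and multiply-accumulate operations (MACs) shows how much cheaper. The sketch below is plain Python. The helper names `conv_cost` and `depthwise_cost` are hypothetical, biases and batch-norm parameters are ignored, and the output is assumed to keep the input's spatial size. The 28x28x192 shape comes from conv 14 in the convolution table of this article.

```python
def conv_cost(h, w, c_in, c_out, k):
    """Weights and MACs of a standard k x k convolution (bias ignored).

    Assumes the output keeps the h x w spatial size (stride 1, 'SAME' padding).
    """
    params = k * k * c_in * c_out
    return params, params * h * w  # each weight is used once per output pixel


def depthwise_cost(h, w, c, k=3):
    """Weights and MACs of a k x k depthwise convolution: one filter per channel."""
    params = k * k * c
    return params, params * h * w


if __name__ == "__main__":
    h = w = 28
    c = 192
    std_params, std_macs = conv_cost(h, w, c, c, 3)  # one standard 3x3 convolution
    dw_params, dw_macs = depthwise_cost(h, w, c)     # 3x3 depthwise ...
    pw_params, pw_macs = conv_cost(h, w, c, c, 1)    # ... followed by a 1x1 convolution
    print("standard 3x3:", std_params, std_macs)                              # 331776 260112384
    print("depthwise 3x3 + 1x1:", dw_params + pw_params, dw_macs + pw_macs)  # 38592 30256128
    print(f"ratio: {std_params / (dw_params + pw_params):.1f}x")             # 8.6x
```

For the same number of input and output channels, the depthwise plus 1x1 pair needs roughly 8.6 times fewer weights and MACs than one standard 3x3 convolution.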

These are the two different components in MobileNet V2 model:

[Figure: conv_mobilenet_v2]

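Each bottleneck block has 3 layers:

  • 1x1 Convolution with Relu6 (expansion)
  • 3x3 Depthwise Convolution with Relu6
  • 1x1 Convolution without any non-linearity (linear projection)

In the first bottleneck block, the expansion factor is 1, so the expansion convolution is skipped. This is why conv 2 and conv 3 in the convolution table form a block with only two layers.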
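The three layers of one block can be written out as a short layer list. This is a minimal sketch, not a framework implementation. `ConvLayer` and `bottleneck_layers` are hypothetical names, and the expansion factor `t` multiplies the input channel count. The example call (16 input channels, t = 6, 24 output channels, stride 2) reproduces conv 4 to conv 6 of the convolution table.

```python
from typing import List, NamedTuple


class ConvLayer(NamedTuple):
    kind: str        # "conv1x1" or "dw3x3"
    c_in: int
    c_out: int
    stride: int
    activation: str  # "relu6" or "linear"


def bottleneck_layers(c_in: int, t: int, c_out: int, stride: int) -> List[ConvLayer]:
    """Layers of one MobileNetV2 bottleneck block with expansion factor t."""
    hidden = c_in * t
    layers = []
    if t != 1:
        # 1x1 expansion convolution with ReLU6 (skipped when t == 1, as in the first block)
        layers.append(ConvLayer("conv1x1", c_in, hidden, 1, "relu6"))
    # 3x3 depthwise convolution with ReLU6; this layer carries the block's stride
    layers.append(ConvLayer("dw3x3", hidden, hidden, stride, "relu6"))
    # 1x1 projection convolution with no activation (the linear bottleneck)
    layers.append(ConvLayer("conv1x1", hidden, c_out, 1, "linear"))
    return layers


if __name__ == "__main__":
    for layer in bottleneck_layers(16, 6, 24, 2):
        print(layer)
```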

There are Stride 1 Blocks and Stride 2 Blocks. The internal components of the two blocks are as follows:

[Figure: stride_block_mobilenet, showing the Stride 1 and Stride 2 blocks]

Stride 1 Block:

  • Input
  • 1x1 Convolution with Relu6
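  • 3x3 Depthwise Convolution with Relu6
  • 1x1 Convolution without any non-linearity
  • Add (the block input is added to the output; this shortcut is only used when the input and output have the same number of channels)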

Stride 2 Block:

  • Input
  • 1x1 Convolution with Relu6
  • 3x3 Depthwise Convolution with stride=2 and Relu6
  • 1x1 Convolution without any non-linearity

A Stride 2 block has no Add, because its output is spatially smaller than its input.
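The choice between the two block types, and whether the residual Add can be used, depends only on the stride and the channel counts. A Stride 1 block only adds its input to its output when the shapes match. In the convolution table, this is not the case for conv 31 to conv 33, which keep 14x14 resolution but go from 64 to 96 channels. The sketch below is a hypothetical helper (`block_output`) and assumes "SAME" padding, so the spatial size is divided by the stride and rounded up.

```python
import math


def block_output(h: int, w: int, c_in: int, c_out: int, stride: int):
    """Output shape of a bottleneck block and whether its residual Add is used.

    Assumes 'SAME' padding, so each spatial size becomes ceil(size / stride).
    """
    h_out = math.ceil(h / stride)
    w_out = math.ceil(w / stride)
    # The Add needs identical input and output shapes: stride 1 and equal channels.
    use_residual = stride == 1 and c_in == c_out
    return (h_out, w_out, c_out), use_residual


if __name__ == "__main__":
    print(block_output(56, 56, 24, 24, 1))    # Stride 1 block with Add
    print(block_output(112, 112, 16, 24, 2))  # Stride 2 block, no Add
    print(block_output(14, 14, 64, 96, 1))    # Stride 1 block, channels change, no Add
```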

Layers in MobileNetV2

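#   Op           Expansion  Output C  Repeat  Stride
1   Convolution  -          32        1       2
2   Bottleneck   1          16        1       1
3   Bottleneck   6          24        2       2
4   Bottleneck   6          32        3       2
5   Bottleneck   6          64        4       2
6   Bottleneck   6          96        3       1
7   Bottleneck   6          160       3       2
8   Bottleneck   6          320       1       1
9   Convolution  -          1280      1       1
10  AvgPool      -          -         1       -
11  Convolution  -          1001      1       1

Row 1 is a standard 3x3 convolution, row 10 is a 7x7 average pool, and rows 9 and 11 are 1x1 convolutions. The Output C and Stride values match the convolution table.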

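Each Bottleneck row is a sequence of "Repeat" bottleneck blocks (the inverted residual block described above). Only the first block in a sequence can be a Stride 2 block; the remaining blocks in the sequence are Stride 1 blocks. Expansion is the factor by which the first 1x1 convolution widens the channels. For example, conv 4 widens 16 channels to 6 x 16 = 96.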
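The layer table can be expanded mechanically into the full list of convolutions, which also gives a way to check the layer count and the compute cost. This is a sketch under stated assumptions. `BOTTLENECKS`, `mobilenet_v2_convs` and `macs` are hypothetical names. Padding is assumed to be "SAME", the average pool is treated as reducing 7x7 to 1x1, and the MAC count covers convolutions only (no batch norm, activations or pooling).

```python
import math

# (expansion t, output channels c, repeats n, first stride s) for rows 2-8 of the table
BOTTLENECKS = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def mobilenet_v2_convs(size=224, num_classes=1001):
    """Return (kind, input_hw, c_in, c_out, kernel, stride) for every convolution."""
    convs = []
    hw, c = size, 3

    def add(kind, c_out, k, s):
        nonlocal hw, c
        convs.append((kind, hw, c, c_out, k, s))
        hw, c = math.ceil(hw / s), c_out  # 'SAME' padding: size shrinks only by stride

    add("conv", 32, 3, 2)                # row 1: standard 3x3 convolution
    for t, c_out, n, s in BOTTLENECKS:
        for i in range(n):
            stride = s if i == 0 else 1  # only the first block of a sequence downsamples
            if t != 1:
                add("conv", c * t, 1, 1)  # 1x1 expansion + ReLU6
            add("dw", c, 3, stride)       # 3x3 depthwise + ReLU6
            add("conv", c_out, 1, 1)      # 1x1 linear projection
    add("conv", 1280, 1, 1)              # row 9
    hw = 1                               # row 10: 7x7 average pooling
    add("conv", num_classes, 1, 1)       # row 11: classifier as a 1x1 convolution
    return convs


def macs(layer):
    """Multiply-accumulates of one layer from the list above."""
    kind, hw_in, c_in, c_out, k, s = layer
    hw_out = math.ceil(hw_in / s)
    per_pixel = k * k * c_in * c_out if kind == "conv" else k * k * c_in
    return per_pixel * hw_out * hw_out


if __name__ == "__main__":
    layers = mobilenet_v2_convs()
    print(len(layers))                   # 53
    print(sum(macs(l) for l in layers))  # 300775552, i.e. about 300.8 million MACs
```

The result agrees with the 53 convolution layers and the roughly 300 million MACs mentioned at the start.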

Convolutions in MobileNetV2

The following table lists all 53 convolution layers of MobileNetV2 for a 224x224x3 input. For each layer, it gives the input size, kernel size, stride, padding and output size:

Conv Input H/W  Input C  Kernel H/W  Stride H/W  Padding H/W  Output H/W  Output C
1    224        3        3           2           0            112         32
2    112        32       3           1           1            112         32
3    112        32       1           1           0            112         16
4    112        16       1           1           0            112         96
5    112        96       3           2           0            56          96
6    56         96       1           1           0            56          24
7    56         24       1           1           0            56          144
8    56         144      3           1           1            56          144
9    56         144      1           1           0            56          24
10   56         24       1           1           0            56          144
11   56         144      3           2           0            28          144
12   28         144      1           1           0            28          32
13   28         32       1           1           0            28          192
14   28         192      3           1           1            28          192
15   28         192      1           1           0            28          32
16   28         32       1           1           0            28          192
17   28         192      3           1           1            28          192
18   28         192      1           1           0            28          32
19   28         32       1           1           0            28          192
20   28         192      3           2           0            14          192
21   14         192      1           1           0            14          64
22   14         64       1           1           0            14          384
23   14         384      3           1           1            14          384
24   14         384      1           1           0            14          64
25   14         64       1           1           0            14          384
26   14         384      3           1           1            14          384
27   14         384      1           1           0            14          64
28   14         64       1           1           0            14          384
29   14         384      3           1           1            14          384
30   14         384      1           1           0            14          64
31   14         64       1           1           0            14          384
32   14         384      3           1           1            14          384
33   14         384      1           1           0            14          96
34   14         96       1           1           0            14          576
35   14         576      3           1           1            14          576
36   14         576      1           1           0            14          96
37   14         96       1           1           0            14          576
38   14         576      3           1           1            14          576
39   14         576      1           1           0            14          96
40   14         96       1           1           0            14          576
41   14         576      3           2           0            7           576
42   7          576      1           1           0            7           160
43   7          160      1           1           0            7           960
44   7          960      3           1           1            7           960
45   7          960      1           1           0            7           160
46   7          160      1           1           0            7           960
47   7          960      3           1           1            7           960
48   7          960      1           1           0            7           160
49   7          160      1           1           0            7           960
50   7          960      3           1           1            7           960
51   7          960      1           1           0            7           320
52   7          320      1           1           0            7           1280
53   1          1280     1           1           0            1           1001

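The parameters of each convolution layer, in column order, are:

  • Input height/width
  • Input channels
  • Kernel height/width
  • Stride height/width
  • Padding height/width (padding before the input, i.e. top/left)
  • Output height/width
  • Output channels

Conv 1 is a standard convolution. Every other 3x3 layer in the table is a depthwise convolution, so its input and output channel counts are equal. The padding values are consistent with TensorFlow-style "SAME" padding. Under that rule, a stride-2, 3x3 layer on an even-sized input pads 0 on the top/left and 1 on the bottom/right. The 1001 output channels of conv 53 correspond to the 1000 ImageNet classes plus one background class.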
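The Padding H/W and Output H/W columns can be checked against a short sketch of the "SAME" padding rule. That the table was produced with TensorFlow-style "SAME" padding is an assumption, inferred from the 0 padding on every stride-2 layer. `same_padding` is a hypothetical helper name, and it returns the output size together with the padding before (top/left) and after (bottom/right) the input.

```python
import math


def same_padding(in_size: int, kernel: int, stride: int):
    """Output size and (before, after) padding under TensorFlow-style 'SAME' padding."""
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    before = total // 2                        # top/left padding, as listed in the table
    return out_size, (before, total - before)  # any odd extra row goes bottom/right


if __name__ == "__main__":
    # (input size, kernel, stride) for conv 1, conv 2, conv 41 and conv 42
    for in_size, kernel, stride in [(224, 3, 2), (112, 3, 1), (14, 3, 2), (7, 1, 1)]:
        print(in_size, kernel, stride, same_padding(in_size, kernel, stride))
```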

With this, you have a complete picture of the MobileNetV2 architecture. Enjoy.