MobileNetV2 architecture

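In this article, we explore the MobileNet V2 architecture in depth. The MobileNet V2 model has 53 convolution layers and 1 AvgPool layer, and needs roughly 300 million multiply-add operations (MAdds) for a 224x224 input. Its main building block, the bottleneck residual block (also called the inverted residual block), combines two ideas:

Inverted residuals: the shortcut connects the thin bottleneck layers, while the wide expanded layers sit inside the block
Linear bottlenecks: the last 1x1 convolution of the block has no non-linearity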
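To make "inverted" concrete, here is a minimal sketch comparing the channel widths inside a classic ResNet-style bottleneck with those inside a MobileNet V2 block. The function names are hypothetical, and the reduction factor of 4 (as in ResNet-50's 256 to 64 to 256 blocks) and the expansion factor of 6 are illustrative defaults.

```python
# Minimal sketch: channel widths through a classic (ResNet-style) bottleneck
# versus a MobileNet V2 inverted residual block. Names are hypothetical.

def classic_bottleneck_widths(c: int, reduction: int = 4) -> list[int]:
    """Wide -> narrow -> narrow -> wide: the shortcut joins the wide ends."""
    narrow = c // reduction
    return [c, narrow, narrow, c]


def inverted_residual_widths(c: int, expansion: int = 6) -> list[int]:
    """Narrow -> wide -> wide -> narrow: the shortcut joins the narrow ends."""
    wide = c * expansion
    return [c, wide, wide, c]


print(classic_bottleneck_widths(256))  # [256, 64, 64, 256]
print(inverted_residual_widths(24))    # [24, 144, 144, 24]
```

The 24 to 144 to 24 pattern is exactly what convolution layers 7 to 9 in the table of convolutions show.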

Apart from the first layer, which is a standard 3x3 convolution, MobileNet V2 uses two types of convolution layers:

1x1 (pointwise) Convolution
3x3 Depthwise Convolution
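Splitting a 3x3 convolution into a depthwise and a pointwise step is what keeps the model cheap. A minimal sketch of the multiply-add counts, using a hypothetical 14x14 layer with 384 input and output channels, shows the cost ratio works out to 1/c_out + 1/k²:

```python
# Minimal sketch: multiply-adds of a standard k x k convolution versus a
# k x k depthwise convolution followed by a 1x1 pointwise convolution.

def standard_conv_madds(hw: int, k: int, c_in: int, c_out: int) -> int:
    # each of the hw*hw*c_out outputs sums over a k x k x c_in window
    return hw * hw * c_out * k * k * c_in


def depthwise_separable_madds(hw: int, k: int, c_in: int, c_out: int) -> int:
    depthwise = hw * hw * c_in * k * k  # one k x k filter per input channel
    pointwise = hw * hw * c_in * c_out  # 1x1 conv mixes the channels
    return depthwise + pointwise


hw, k, c_in, c_out = 14, 3, 384, 384
ratio = depthwise_separable_madds(hw, k, c_in, c_out) / standard_conv_madds(hw, k, c_in, c_out)
print(f"{ratio:.3f}")  # 0.114, i.e. 1/c_out + 1/k**2
```

With k = 3 the separable version costs roughly 8 to 9 times fewer multiply-adds than a standard 3x3 convolution.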

Figure: the components of the MobileNet V2 model

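Each bottleneck block has 3 layers:

1x1 Convolution (expansion) with ReLU6, which multiplies the channel count by the expansion factor
3x3 Depthwise Convolution with ReLU6
1x1 Convolution (projection) without any non-linearity, i.e. a linear bottleneck

The first bottleneck block has an expansion factor of 1, so it skips the expansion layer and contains only 2 convolutions.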
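The layer list above can be sketched as a small function that returns the three convolutions of one block. The `ConvSpec` record and `bottleneck_layers` name are hypothetical; the example call reproduces convolution layers 10 to 12 of the table of convolutions (24 channels in, expansion 6, 32 channels out, stride 2).

```python
from dataclasses import dataclass


@dataclass
class ConvSpec:  # hypothetical record describing one convolution
    kind: str        # "expand", "depthwise" or "project"
    kernel: int
    stride: int
    in_c: int
    out_c: int
    activation: str  # "relu6" or "linear"


def bottleneck_layers(in_c: int, expansion: int, out_c: int, stride: int) -> list[ConvSpec]:
    hidden = in_c * expansion
    layers = []
    if expansion != 1:
        # 1x1 expansion: in_c -> in_c * expansion channels
        layers.append(ConvSpec("expand", 1, 1, in_c, hidden, "relu6"))
    # 3x3 depthwise on the expanded channels; it carries the block's stride
    layers.append(ConvSpec("depthwise", 3, stride, hidden, hidden, "relu6"))
    # 1x1 linear projection back to a thin bottleneck (no non-linearity)
    layers.append(ConvSpec("project", 1, 1, hidden, out_c, "linear"))
    return layers


for layer in bottleneck_layers(24, 6, 32, 2):
    print(layer)
```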

There are Stride 1 Blocks and Stride 2 Blocks. The internal components of the two blocks are as follows:

Stride 1 Block:

Input
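1x1 Convolution with ReLU6
Depthwise Convolution with ReLU6
1x1 Convolution without any non-linearity
Add (the block input is added to the output; this shortcut is used only when the input and output channel counts match)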

Stride 2 Block:

Input
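1x1 Convolution with ReLU6
Depthwise Convolution with stride=2 and ReLU6
1x1 Convolution without any non-linearity

There is no Add in the Stride 2 Block, because its output has half the height and width of its input.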
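The difference between the two blocks comes down to the output size and whether the shortcut Add is possible. A minimal sketch, assuming "same"-style padding for the depthwise layer and the shortcut rule used by common implementations such as torchvision's `InvertedResidual` (stride 1 and equal input/output channels); the function names are hypothetical:

```python
# Minimal sketch of the Stride 1 / Stride 2 distinction for one bottleneck block.

def block_output_hw(in_hw: int, stride: int) -> int:
    # with "same"-style padding the 3x3 depthwise conv gives ceil(in_hw / stride)
    return -(-in_hw // stride)


def uses_residual(stride: int, in_c: int, out_c: int) -> bool:
    # the Add needs identical shapes: stride 1 keeps H/W, equal channels keep C
    return stride == 1 and in_c == out_c


print(block_output_hw(56, 1), uses_residual(1, 24, 24))  # 56 True
print(block_output_hw(56, 2), uses_residual(2, 24, 32))  # 28 False
print(block_output_hw(14, 1), uses_residual(1, 64, 96))  # 14 False
```

The last case is a Stride 1 Block that still has no Add, because it changes the channel count from 64 to 96.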

Layers in MobileNetV2


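#	Op	Expansion (t)	Repeat (n)	Output channels (c)	Stride (s)
1	Convolution 3x3	-	1	32	2
2	Bottleneck	1	1	16	1
3	Bottleneck	6	2	24	2
4	Bottleneck	6	3	32	2
5	Bottleneck	6	4	64	2
6	Bottleneck	6	3	96	1
7	Bottleneck	6	3	160	2
8	Bottleneck	6	1	320	1
9	Convolution 1x1	-	1	1280	1
10	AvgPool 7x7	-	1	-	-
11	Convolution 1x1	-	1	1001	1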

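Each Bottleneck row stands for a sequence of bottleneck residual blocks (inverted residual blocks), repeated the listed number of times. Only the first block of a sequence can use stride 2, which makes it a Stride 2 block; every repeat after it is a Stride 1 block.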
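As a consistency check, the Bottleneck rows can be expanded into individual convolutions. This minimal sketch assumes the output channels and strides from the MobileNetV2 paper's architecture table, a 224x224 input and 1001 output classes; the tuple layout and function names are hypothetical. It counts convolutions, residual Adds and convolution multiply-adds (batch norm, activations, pooling and Adds are ignored).

```python
# (expansion t, output channels c, repeats n, stride s) for the Bottleneck rows
BOTTLENECK_SPEC = [
    (1, 16, 1, 1), (6, 24, 2, 2), (6, 32, 3, 2), (6, 64, 4, 2),
    (6, 96, 3, 1), (6, 160, 3, 2), (6, 320, 1, 1),
]


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def build_convs(input_hw: int = 224, num_classes: int = 1001):
    """Return (convs, residual_adds).

    Each conv is a hypothetical tuple (kernel, stride, out_hw, in_c, out_c, depthwise).
    """
    hw = ceil_div(input_hw, 2)
    convs = [(3, 2, hw, 3, 32, False)]  # first standard 3x3 conv, stride 2
    c_in, residual_adds = 32, 0
    for t, c, n, s in BOTTLENECK_SPEC:
        for i in range(n):
            stride = s if i == 0 else 1  # only the first repeat can downsample
            hidden = c_in * t
            if t != 1:  # the expansion layer is skipped when t == 1
                convs.append((1, 1, hw, c_in, hidden, False))
            hw = ceil_div(hw, stride)
            convs.append((3, stride, hw, hidden, hidden, True))  # depthwise
            convs.append((1, 1, hw, hidden, c, False))  # linear projection
            if stride == 1 and c_in == c:
                residual_adds += 1
            c_in = c
    convs.append((1, 1, hw, c_in, 1280, False))
    # the 7x7 AvgPool reduces H/W to 1 before the final 1x1 classifier conv
    convs.append((1, 1, 1, 1280, num_classes, False))
    return convs, residual_adds


def conv_madds(conv) -> int:
    k, _stride, out_hw, in_c, out_c, depthwise = conv
    # depthwise: k*k per output value; standard: k*k*in_c per output value
    per_output = k * k if depthwise else k * k * in_c
    return out_hw * out_hw * out_c * per_output


convs, residual_adds = build_convs()
total = sum(conv_madds(cv) for cv in convs)
print(len(convs), residual_adds)  # 53 10
print(f"{total / 1e6:.1f}M multiply-adds")  # roughly 300M
```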

Convolutions in MobileNetV2

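The following table lists all 53 convolution layers in MobileNetV2 with their input size, kernel size, stride, padding and output size. Layer 1 is a standard 3x3 convolution, every other 3x3 layer is a depthwise convolution (so its input and output channel counts are equal), and the remaining layers are 1x1 convolutions: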


# Conv	Input H/W	Input C	Kernel H/W	Stride H/W	Padding H/W	Output H/W	Output C
1	224	3	3	2	0	112	32
2	112	32	3	1	1	112	32
3	112	32	1	1	0	112	16
4	112	16	1	1	0	112	96
5	112	96	3	2	0	56	96
6	56	96	1	1	0	56	24
7	56	24	1	1	0	56	144
8	56	144	3	1	1	56	144
9	56	144	1	1	0	56	24
10	56	24	1	1	0	56	144
11	56	144	3	2	0	28	144
12	28	144	1	1	0	28	32
13	28	32	1	1	0	28	192
14	28	192	3	1	1	28	192
15	28	192	1	1	0	28	32
16	28	32	1	1	0	28	192
17	28	192	3	1	1	28	192
18	28	192	1	1	0	28	32
19	28	32	1	1	0	28	192
20	28	192	3	2	0	14	192
21	14	192	1	1	0	14	64
22	14	64	1	1	0	14	384
23	14	384	3	1	1	14	384
24	14	384	1	1	0	14	64
25	14	64	1	1	0	14	384
26	14	384	3	1	1	14	384
27	14	384	1	1	0	14	64
28	14	64	1	1	0	14	384
29	14	384	3	1	1	14	384
30	14	384	1	1	0	14	64
31	14	64	1	1	0	14	384
32	14	384	3	1	1	14	384
33	14	384	1	1	0	14	96
34	14	96	1	1	0	14	576
35	14	576	3	1	1	14	576
36	14	576	1	1	0	14	96
37	14	96	1	1	0	14	576
38	14	576	3	1	1	14	576
39	14	576	1	1	0	14	96
40	14	96	1	1	0	14	576
41	14	576	3	2	0	7	576
42	7	576	1	1	0	7	160
43	7	160	1	1	0	7	960
44	7	960	3	1	1	7	960
45	7	960	1	1	0	7	160
46	7	160	1	1	0	7	960
47	7	960	3	1	1	7	960
48	7	960	1	1	0	7	160
49	7	160	1	1	0	7	960
50	7	960	3	1	1	7	960
51	7	960	1	1	0	7	320
52	7	320	1	1	0	7	1280
53	1	1280	1	1	0	1	1001

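The columns of the table are, in order:

Input height/width
Input channels
Kernel height/width
Stride height/width
Padding height/width (for the stride-2 layers, a padding of 0 together with the listed output sizes matches TensorFlow-style "SAME" padding, which adds one extra row and column only at the bottom and right; symmetric padding of 1 gives the same output sizes)
Output height/width
Output channels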
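The padding note can be checked with the usual output-size formula. This minimal sketch assumes TensorFlow's documented "SAME" rule (output = ceil(input / stride), with any odd leftover padding placed after the input); the function names are hypothetical.

```python
def out_size_symmetric(in_size: int, k: int, stride: int, pad: int) -> int:
    # usual formula when `pad` rows/columns are added on both sides
    return (in_size + 2 * pad - k) // stride + 1


def same_padding(in_size: int, k: int, stride: int) -> tuple[int, int, int]:
    """TensorFlow-style "SAME" padding: returns (out_size, pad_before, pad_after)."""
    out = -(-in_size // stride)  # ceil(in_size / stride)
    total = max((out - 1) * stride + k - in_size, 0)
    before = total // 2  # any odd leftover goes after (bottom/right)
    return out, before, total - before


print(out_size_symmetric(224, 3, 2, 0))  # 111: symmetric padding 0 does not give 112
print(out_size_symmetric(224, 3, 2, 1))  # 112
print(same_padding(224, 3, 2))           # (112, 0, 1)
print(same_padding(56, 3, 1))            # (56, 1, 1)
```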

With this, you have a complete picture of the MobileNetV2 architecture. Enjoy.