矩阵向量乘法GEMV: $O_i = A_{i,k} \circ B_k$
# 将矩阵A的每一行与向量B做内积运算,得到输出向量O
A = [[1,2,3],
[4,5,6]]
B = [0.1,0.2,0.3]
# 输出 O:
# O[0] = 1*0.1 + 2*0.2 + 3*0.3 = 1.4
# O[1] = 4*0.1 + 5*0.2 + 6*0.3 = 3.2
O = [1.4,3.2]
矩阵乘法GEMM: $O_{i,j} = A_{i,k} \circ B_{k,j}$
# 两个矩阵相乘,输出矩阵O的每个元素是A的行与B的列的内积
A = [[1,2],
[3,4]]
B = [[0.1,0.2],
[0.3,0.4]]
# 输出 O:
# O[0,0] = 1*0.1 + 2*0.3 = 0.7
# O[0,1] = 1*0.2 + 2*0.4 = 1.0
# O[1,0] = 3*0.1 + 4*0.3 = 1.5
# O[1,1] = 3*0.2 + 4*0.4 = 2.2
O = [[0.7,1.0],
[1.5,2.2]]
双线性变换Bilinear: $O_{i,j} = A_{i,k} \circ B_{j,k,l} \circ C_{i,l}$
# 涉及三个张量的乘法运算,可以看作是两次矩阵乘法的组合
一维卷积1D convolution: $O_{b,k,j} = I_{b,rc,i+rx} \circ W_{k,rc,rx}$
# 沿着一个维度滑动卷积核进行卷积操作
I = [1,2,3,4,5] # 输入序列(batch=1, channel=1)
W = [0.1,0.2,0.3] # 卷积核(kernel_size=3)
# 输出(stride=1, padding=0):
# O[0] = 1*0.1 + 2*0.2 + 3*0.3 = 1.4
# O[1] = 2*0.1 + 3*0.2 + 4*0.3 = 2.0
# O[2] = 3*0.1 + 4*0.2 + 5*0.3 = 2.6
O = [1.4,2.0,2.6]
一维反卷积Transposed 1D convolution: $O_{b,k,i} = I_{b,rc,i+rx} \circ W_{rc,k,L-rx-1}$
# 通过填充和转置的方式实现上采样
二维卷积2D convolution: $O_{b,k,i,j} = I_{b,rc,i+rx,j+ry} \circ W_{k,rc,rx,ry}$
# 在高度和宽度两个维度上滑动卷积核
I = [[1,2,3], # 输入图像(batch=1,channel=1)
[4,5,6],
[7,8,9]]
W = [[0.1,0.2], # 卷积核(2*2)
[0.3,0.4]]
# 输出(stride=1, padding=0)
# O[0,0] = 1*0.1 + 2*0.2 + 4*0.3 + 5*0.4 = 3.0
# O[0,1] = 2*0.1 + 3*0.2 + 5*0.3 + 6*0.4 = 4.7
# O[1,0] = 4*0.1 + 5*0.2 + 7*0.3 + 8*0.4 = 6.7
# O[1,1] = 5*0.1 + 6*0.2 + 8*0.3 + 9*0.4 = 8.2
O = [[3.0,4.7],
[6.7,8.2]]
二维反卷积Transposed 2D convolution: $O_{b,k,i,j} = I_{b,rc,i+rx,j+ry} \circ W_{rc,k,X-rx-1,Y-ry-1}$
# 二维的上采样
三维卷积3D convolution: $O_{b,k,d,i,j} = I_{b,rc,d+rd,i+rx,j+ry} \circ W_{k,rc,rd,rx,ry}$
# 在深度、高度、宽度三个维度上滑动卷积核
三维反卷积Transposed 3D convolution: $O_{b,k,d,i,j} = I_{b,rc,d+rd,i+rx,j+ry} \circ W_{rc,k,D-rd-1,X-rx-1,Y-ry-1}$
# 三维的上采样操作
分组卷积Group convolution:$O^{g}{b,k,i,j} = I^{g}{b,rc,i+rx,j+ry} \circ W^{g}_{k,rc,rx,ry}$
# 将输入通道分组,每组独立进行卷积操作
I = [ # 输入(batch=1, channel=4) 分成2组
group1: [[1,2,3], [4,5,6]], # 前两个通道
group2: [[7,8,9], [10,11,12]] # 后两个通道
]
W = [ # 每组使用独立的卷积核
group1: [[0.1,0.2], [0.3,0.4]],
group2: [[0.5,0.6], [0.7,0.8]]
]
深度可分离卷积Depthwise convolution: $O_{b,k,i,j} = I_{b,c,i+rx,j+ry} \circ W^{c}_{k,rx,ry}$
# 每个输入通道使用独立的卷积核
I = [ # 输入(batch=1, channel=2)
channel1: [[1,2,3], [4,5,6]],
channel2: [[7,8,9], [10,11,12]]
]
W = [ # 每个通道独立的卷积核
channel1: [[0.1,0.2], [0.3,0.4]],
channel2: [[0.5,0.6], [0.7,0.8]]
]
空洞卷积Dilated convolution: $O_{b,k,i,j} = I_{b,rc,i+rx \times dx, j+ry \times dy} \circ W_{k,rc,rx,ry}$
# 在卷积核元素之间插入空洞
I = [[1,2,3,4], # 输入
[5,6,7,8],
[9,10,11,12],
[13,14,15,16]]
W = [[0.1,0.2], # 2*2卷积核,膨胀率=2
[0.3,0.4]]
# 实际感受野为3*3,中间有空洞
# O[0,0] = 1*0.1 + 3*0.2 + 9*0.3 + 11*0.4
Transposed Convolution(反卷积/上采样)
I = [[1,2], # 输入2*2
[3,4]]
W = [[0.1,0.2], # 2*2卷积核
[0.3,0.4]]
# 输出4*4 (通过在输入间插入0,然后做普通卷积)
# 先变成
# [[1,0,2,0],
# [0,0,0,0],
# [3,0,4,0],
# [0,0,0,0]]
- 分组卷积和深度可分离卷积主要用于模型压缩
- 空洞卷积常用于语义分割任务
- 反卷积则常用于图像生成和超分辨率任务