Cats vs. Dogs

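Using the Cats vs. Dogs dataset from Kaggle, we train a classifier that tells cat images from dog images. The network uses ResNet18 as its backbone, and the transfer is done in fixed-feature mode: the pretrained weights are frozen and only the new classification layer is trained. The training set contains 25,000 images, split evenly between cats and dogs; the test set contains 12,500 images.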

Imports

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, TensorDataset
from torchvision import transforms
from torchvision import datasets
from torchvision import models

Dataset preparation

Sort the images into one folder per class so that ImageFolder can read them.

# Create the class folders first, then move the files by filename prefix
mkdir -p dog cat
mv dog.* dog
mv cat.* cat

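Split the 25,000 training images into a training set of 20,000 and a held-out test set of 5,000, place them in the train and test folders respectively, and load them.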
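The post does not show how the split was made, so here is a minimal sketch of one way to do it. It assumes the images already sit in `cat/` and `dog/` folders as above, and that the held-out set is balanced (2,500 images per class). The function name `split_dataset`, its parameters, and the fixed random seed are all hypothetical choices made for illustration.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_root, dst_root, test_per_class, classes=("cat", "dog"), seed=0):
    """Move images from src_root/<class> into dst_root/train/<class> and dst_root/test/<class>."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    src_root, dst_root = Path(src_root), Path(dst_root)
    counts = {}
    for cls in classes:
        # Sort before shuffling so the result does not depend on directory order
        files = sorted(p for p in (src_root / cls).iterdir() if p.is_file())
        rng.shuffle(files)
        parts = {"test": files[:test_per_class], "train": files[test_per_class:]}
        for part, members in parts.items():
            target = dst_root / part / cls
            target.mkdir(parents=True, exist_ok=True)
            for path in members:
                shutil.move(str(path), str(target / path.name))
            counts[(part, cls)] = len(members)
    return counts
```

Under these assumptions, calling `split_dataset(".", ".", test_per_class=2500)` in the folder that holds `cat/` and `dog/` would leave 10,000 images per class under `train/` and 2,500 per class under `test/`, which matches the 20,000 / 5,000 split described above.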

train_data = datasets.ImageFolder(root='train', transform=transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()]))
test_data = datasets.ImageFolder(root='test', transform=transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()]))
train_loader = DataLoader(dataset=train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=64, shuffle=True)

ResNet18 network

Transfer ResNet18 with its pretrained weights held fixed.

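  1. param.requires_grad = False: iterate over all of the model's parameters and set each one's requires_grad attribute to False. No gradients are computed for them, so they are not updated during training; this freezes the parameters. The reason is that during transfer we generally want the pretrained feature extractor to stay unchanged.
  2. features = net.fc.in_features: read the number of input features of the fully connected layer (the classifier head). We are about to replace this layer, and the new layer's input size must match the number of features the backbone outputs.
  3. net.fc = nn.Linear(features, 2): replace the fully connected layer with a new linear layer whose output dimension is 2, one output per class for this binary problem. A newly created layer has requires_grad=True by default, so it is the only part that gets trained.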
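
# Load ResNet18 with ImageNet-pretrained weights
# (newer torchvision versions prefer the weights= argument over pretrained=True)
net = models.resnet18(pretrained=True)
# Freeze every pretrained parameter
for param in net.parameters():
    param.requires_grad = False
# Replace the classifier head; the new layer is trainable by default
features = net.fc.in_features
net.fc = nn.Linear(features, 2)

# Move the whole model, including the new head, to the GPU if one is available
if torch.cuda.is_available():
    net = net.cuda()

loss_fn = nn.CrossEntropyLoss()
# Only the new head's parameters are handed to the optimizer
opt = torch.optim.SGD(net.fc.parameters(), lr=0.001, momentum=0.9)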
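One quick way to confirm that the freeze worked is to list the parameters that still require gradients. The sketch below is not part of the original post. To stay small and avoid downloading weights, it uses `TinyBackbone`, a hypothetical stand-in that has an `fc` head like ResNet18; `freeze_and_replace_head` is likewise an illustrative helper name.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for ResNet18: a small feature extractor plus an `fc` head."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        return self.fc(torch.flatten(x, 1))


def freeze_and_replace_head(net, num_classes):
    """Freeze all existing parameters, then attach a fresh trainable head."""
    for param in net.parameters():
        param.requires_grad = False
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net


net = freeze_and_replace_head(TinyBackbone(), num_classes=2)
# Only the parameters of the new head should remain trainable
trainable = [name for name, p in net.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']
```

The same list comprehension can be run on the real `net` above; if anything other than `fc.weight` and `fc.bias` shows up, part of the backbone is still being trained.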

Training

for epoch in range(20):
    for i, data in enumerate(train_loader):
        x, y = data
        if torch.cuda.is_available():
            x = x.cuda()
            y = y.cuda()
        pred = net(x)
        loss = loss_fn(pred, y)

        opt.zero_grad()
        loss.backward()
        opt.step()

    # Every second epoch, print the loss of that epoch's last batch
    if epoch % 2 == 0:
        print(epoch, loss.item())

# Saves the entire model object, not just its state_dict
torch.save(net, 'catvsdog_model.pth')
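`torch.save(net, ...)` pickles the whole model object. A common alternative is to save only the `state_dict` and rebuild the architecture when loading. The sketch below illustrates that round trip; it is not what the original post does. `build_model` is a hypothetical stand-in for the ResNet18-based classifier, and the file name is made up for the example.

```python
import os
import tempfile

import torch
import torch.nn as nn


def build_model():
    # Hypothetical stand-in for the ResNet18-based classifier above
    return nn.Sequential(nn.Flatten(), nn.Linear(12, 2))


model = build_model()
path = os.path.join(tempfile.mkdtemp(), "catvsdog_state.pth")

# Save only the parameters and buffers, not the pickled model object
torch.save(model.state_dict(), path)

# To restore, rebuild the same architecture and load the saved tensors into it
restored = build_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()

# Both models should now produce identical outputs for the same input
x = torch.randn(4, 3, 2, 2)
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)
```

The trade-off is that loading a `state_dict` requires the code that defines the architecture, but the saved file does not depend on how the model class was pickled.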

Testing

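def rightness(predictions, labels):
    # Return the number of correct predictions and the batch size
    pred = torch.max(predictions.data, 1)[1]
    rights = pred.eq(labels.data.view_as(pred)).sum()
    return rights, len(labels)

rights = 0
length = 0
net.eval()  # switch layers such as batch norm to evaluation behavior
with torch.no_grad():  # no gradients are needed for evaluation
    for i, data in enumerate(test_loader):
        x, y = data
        if torch.cuda.is_available():
            x = x.cuda()
            y = y.cuda()
        pred = net(x)
        batch_rights, batch_len = rightness(pred, y)
        rights += batch_rights
        length += batch_len

print(rights, length, rights/length)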
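To make the accuracy bookkeeping concrete, here is a small sketch that runs the same argmax-and-compare logic on hand-written logits instead of real model outputs. `count_correct` and the toy batches are hypothetical. Unlike `rightness` above, it converts the count with `.item()`, so the totals are plain Python numbers rather than tensors.

```python
import torch


def count_correct(logits, labels):
    """Return (number of correct predictions, batch size) as plain Python ints."""
    pred = logits.argmax(dim=1)  # index of the larger logit is the predicted class
    return (pred == labels).sum().item(), labels.size(0)


# Two toy batches of (logits, labels); class 0 and class 1 stand for the two labels
batches = [
    (torch.tensor([[2.0, 0.5], [0.1, 1.2], [0.3, 0.9]]), torch.tensor([0, 1, 0])),
    (torch.tensor([[1.5, -1.0], [0.2, 0.4]]), torch.tensor([0, 1])),
]

correct = total = 0
for logits, labels in batches:
    c, n = count_correct(logits, labels)
    correct += c
    total += n

print(correct, total, correct / total)  # 4 5 0.8
```

The first batch has one wrong prediction (the third sample is labeled 0 but predicted 1), so 4 of the 5 samples are correct.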

Loss output:

0 0.2833782434463501
2 0.12724365293979645
4 0.05407297983765602
6 0.027817511931061745
8 0.17078480124473572
10 0.017018912360072136
12 0.0462612621486187
14 0.039736609905958176
16 0.20226672291755676
18 0.08775252848863602

Accuracy:

tensor(4897, device='cuda:0') 5000 tensor(0.9794, device='cuda:0')