YOLO Paper Summary and PyTorch Code Implementation/Analysis 02 ( You Only Look Once: Unified, Real-Time Object Detection )


2019/01/31 - [Programmer Jinyo/Machine Learning] - YOLO Paper Summary and PyTorch Code Implementation/Analysis 01 ( You Only Look Once: Unified, Real-Time Object Detection )

This post continues from the post above.

Now let's get to defining the Darknet class itself.

As mentioned in the previous post, we build custom architectures in PyTorch by subclassing nn.Module. Add the class below to darknet.py.

class Darknet(nn.Module):
    def __init__(self, cfgfile):
        super(Darknet, self).__init__()
        self.blocks = parse_cfg(cfgfile)
        self.net_info, self.module_list = create_modules(self.blocks)

Here we initialized the Darknet class's blocks, net_info, and module_list.

To recall the result of running create_modules last time, net_info looked like this:

{'type': 'net', 'batch': '64', 'subdivisions': '16', 'width': '608', 'height': '608', 'channels': '3', 'momentum': '0.9', 'decay': '0.0005', 'angle': '0', 'saturation': '1.5', 'exposure': '1.5', 'hue': '.1', 'learning_rate': '0.001', 'burn_in': '1000', 'max_batches': '500200', 'policy': 'steps', 'steps': '400000,450000', 'scales': '.1,.1'}

module_list was a ModuleList( ... ) holding the layers.

Building the network's forward pass

Now let's write the forward function.

This function overrides the forward method of nn.Module.

The job of forward is to take the input, compute the output, and return it.

    def forward(self, x, CUDA):
        modules = self.blocks[1:] # blocks[0] is the 'net' block, so skip it
        outputs = {} #We cache the outputs for the route layer

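forward takes three arguments.

They are self, x, and CUDA. self is the usual instance reference every Python method receives, x is the input tensor (it gets reassigned to each layer's output as it flows through the network), and CUDA is a flag: if true the GPU is used, if false it is not.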
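
To make the role of forward a bit more concrete, here is a toy sketch. TinyNet and its two layers are hypothetical and much simpler than Darknet (there is no CUDA argument and no cfg parsing); the only point is that calling the module object ends up running forward.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical two-layer network, used only to illustrate forward()."""

    def __init__(self):
        super().__init__()
        # ModuleList registers each layer so its parameters are tracked.
        self.module_list = nn.ModuleList([
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.LeakyReLU(0.1),
        ])

    def forward(self, x):
        # Run the layers in the order they were appended.
        for layer in self.module_list:
            x = layer(x)
        return x


net = TinyNet()
x = torch.zeros(1, 3, 16, 16)   # B x C x H x W
y = net(x)                      # calling the module invokes forward()
print(y.shape)                  # torch.Size([1, 8, 16, 16])
```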

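Now we walk through self.blocks in order and run every block except the first one, the 'net' block, which only holds network-level information.

The route and shortcut layers need the outputs of earlier layers, so we store each layer's output in outputs. The key is the layer index and the value is the feature map.

The layers were appended to module_list in the same order, so we simply run modules in sequence.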
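
The caching idea can be sketched on its own, roughly like this. The "layers" here are stand-in lambdas, and the line that writes into outputs at the end of each iteration is my assumption about how the loop is completed, since the snippets in this post do not get that far.

```python
import torch

# Stand-ins for real layers: layer k just adds (k + 1) to its input.
layers = [lambda t, k=k: t + (k + 1) for k in range(4)]

outputs = {}                 # layer index -> feature map
x = torch.zeros(1, 2, 4, 4)
for i, layer in enumerate(layers):
    x = layer(x)
    outputs[i] = x           # cache so later layers can look back (assumed step)

# A layer at index 3 that refers to offset -2 reads the output of layer 1.
i, offset = 3, -2
earlier = outputs[i + offset]
print(earlier.mean().item())  # 3.0
```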

        write = 0     # This is explained a bit later
        for i, module in enumerate(modules):
            module_type = (module["type"])

* The write variable will be explained later.

Convolutional and Upsample Layers

If the module is a convolutional or an upsample layer, it has already been built, so we just run a forward pass through it.

            if module_type == "convolutional" or module_type == "upsample":
                x = self.module_list[i](x)

Route Layer / Shortcut Layer

As explained in the previous post, the route layer code has to handle two cases: one layer index or two. When two feature maps need to be joined, we use torch.cat. (The second argument is 1. It says which dimension to concatenate along, and PyTorch conv layers use the B C H W format. We want to concatenate along the channel dimension, so we pass 1. Passing 0 would concatenate along the batch dimension and double the batch size.)

            elif module_type == "route":
                layers = module["layers"]
                layers = [int(a) for a in layers]

                # Turn an absolute layer index into an offset relative to i
                if (layers[0]) > 0:
                    layers[0] = layers[0] - i

                if len(layers) == 1:
                    # One index: pass that layer's feature map through
                    x = outputs[i + (layers[0])]

                else:
                    if (layers[1]) > 0:
                        layers[1] = layers[1] - i

                    map1 = outputs[i + layers[0]]
                    map2 = outputs[i + layers[1]]

                    # Two indices: concatenate along the channel dimension
                    x = torch.cat((map1, map2), 1)

            elif module_type == "shortcut":
                from_ = int(module["from"])
                # Add the previous layer's output to the output of layer i + from_
                x = outputs[i-1] + outputs[i+from_]

For the shortcut layer, we just add the previous layer's output to the output of the layer that from points to, so it is implemented as above.
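
A small sketch of what the two branches do to tensor shapes. The channel counts and the layer indices 86 and 61 are made-up values for illustration, not read from a cfg file.

```python
import torch

map1 = torch.ones(2, 256, 13, 13)    # B x C x H x W
map2 = torch.ones(2, 512, 13, 13)

# Route with two layers: concatenate along the channel dimension (dim 1).
routed = torch.cat((map1, map2), 1)
print(routed.shape)                  # torch.Size([2, 768, 13, 13])

# Concatenating along dim 0 instead grows the batch dimension.
stacked = torch.cat((map1, map1), 0)
print(stacked.shape)                 # torch.Size([4, 256, 13, 13])

# Shortcut: element-wise sum, so both shapes must match exactly.
summed = map1 + map1
print(summed.shape)                  # torch.Size([2, 256, 13, 13])

# Absolute route index -> relative offset, as in the route branch.
i, target = 86, 61                   # hypothetical layer indices
offset = target - i                  # -25
```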

YOLO (Detection Layer)

The output of YOLO is a feature map whose depth (channel dimension) holds the bounding box attributes. The attributes predicted by a cell are laid out back to back. So, for example, to access the 2nd bounding box of the cell at (5,6), we would have to index

map[ 5, 6, (2-1) * (5+C) : 2 * (5+C) ]. As mentioned in the previous post, C here is the number of classes and the 5 covers x, y, w, h, and the confidence. This layout is quite inconvenient for processing the output.
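
The slice above can be tried out on a dummy tensor. This is only a sketch: the H x W x depth layout follows the indexing expression in the text, and C = 80, three boxes per cell, and a 13 x 13 grid are assumed values.

```python
import torch

C = 80                     # number of classes (assumed)
attrs = 5 + C              # x, y, w, h, confidence + class scores
num_boxes = 3              # boxes predicted per cell (assumed)

# Dummy feature map laid out as H x W x depth, filled with running numbers.
fmap = torch.arange(13 * 13 * num_boxes * attrs, dtype=torch.float32)
fmap = fmap.view(13, 13, num_boxes * attrs)

k = 2                      # 2nd bounding box (1-based)
box = fmap[5, 6, (k - 1) * attrs : k * attrs]
print(box.shape)           # torch.Size([85])
```

Every lookup needs this kind of slice arithmetic, which is why a flat "one row per box" layout is easier to work with.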

To get everything into a single tensor instead of separate ones, let's write a function called predict_transform.

Transforming Output

The predict_transform function will live in util.py, and we will use it in Darknet's forward.

Start by adding the imports at the top of util.py.

from __future__ import division

import torch 
import torch.nn as nn
import torch.nn.functional as F 
from torch.autograd import Variable
import numpy as np
import cv2

predict_transform takes 5 parameters: prediction (our output), inp_dim (the input image dimension), anchors, num_classes, and a CUDA flag (whether to use the GPU).

def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):

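predict_transform takes the detection feature map and turns it into a 2D tensor (per image) in which each row is one bounding box.

The rows are ordered cell by cell: all bounding boxes of the first cell come first, then those of the next cell, and so on.

The code for the transformation is as follows.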

    batch_size = prediction.size(0)
    stride =  inp_dim // prediction.size(2)
    grid_size = inp_dim // stride
    bbox_attrs = 5 + num_classes
    num_anchors = len(anchors)
    
    prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
    prediction = prediction.transpose(1,2).contiguous()
    prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)
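
To check the row ordering, the same three reshaping steps can be run on a small random tensor. The sizes below (3 anchors, 2 classes, a 4 x 4 grid) are arbitrary values chosen to keep the example small.

```python
import torch

batch_size, num_anchors, num_classes, grid_size = 1, 3, 2, 4
bbox_attrs = 5 + num_classes

# Fake detection map: B x (anchors * attrs) x G x G
prediction = torch.randn(batch_size, num_anchors * bbox_attrs, grid_size, grid_size)
original = prediction.clone()

prediction = prediction.view(batch_size, bbox_attrs * num_anchors, grid_size * grid_size)
prediction = prediction.transpose(1, 2).contiguous()
prediction = prediction.view(batch_size, grid_size * grid_size * num_anchors, bbox_attrs)
print(prediction.shape)    # torch.Size([1, 48, 7])

# Row index of anchor `a` for the cell at grid row `cy`, column `cx`.
cy, cx, a = 2, 1, 1
row = (cy * grid_size + cx) * num_anchors + a
expected = original[0, a * bbox_attrs:(a + 1) * bbox_attrs, cy, cx]
matches = torch.equal(prediction[0, row], expected)
print(matches)             # True
```

The contiguous() call matters here: after transpose the memory layout no longer matches the shape, and view would refuse to reshape it.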

The anchor dimensions are given in terms of the height and width attributes of the net block.

Those values are in the scale of the input image, so we divide them by the stride to match the smaller output feature map.

    anchors = [(a[0]/stride, a[1]/stride) for a in anchors]
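
As a worked example of the scaling, assume a 608 x 608 input (matching the net block shown earlier) and a 19 x 19 detection map. The three anchor pairs are illustrative values in input-image pixels.

```python
inp_dim = 608                 # network input size (from the net block)
feature_map_size = 19         # assumed size of one detection map
stride = inp_dim // feature_map_size
print(stride)                 # 32

# Anchors in input-image pixels (illustrative values).
anchors = [(116, 90), (156, 198), (373, 326)]

# Same expression as above: express anchors in feature-map cells.
scaled = [(a[0] / stride, a[1] / stride) for a in anchors]
print(scaled[0])              # (3.625, 2.8125)
```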

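Now it's time to transform the output.

The output is transformed as follows.

b_x, b_y, b_w, b_h are the x, y coordinates of the bounding box center and its width and height, respectively.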
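
For reference, the transformation can be written out as the box equations from the YOLOv3 paper, as I understand them. The symbols t_x, t_y, t_w, t_h (raw network outputs), c_x, c_y (top-left offset of the grid cell), and p_w, p_h (anchor width and height) follow the paper's notation and are not defined elsewhere in this post.

```latex
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w \, e^{t_w} \\
b_h &= p_h \, e^{t_h}
\end{aligned}
```

Here \sigma is the sigmoid function, which keeps the predicted center inside its own grid cell.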

The object confidence is passed through a sigmoid as well.

    # Sigmoid the center_x, center_y, and object confidence
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])
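
A tiny sketch of the effect, using made-up raw values for a single box with one class: only columns 0, 1, and 4 are squashed into (0, 1), while width, height, and the class score are left alone at this step.

```python
import torch

# One image, one box, 5 box attributes + 1 class score (made-up raw values).
prediction = torch.tensor([[[0.0, 2.0, 0.3, -0.7, -2.0, 3.0]]])

for idx in (0, 1, 4):   # center x, center y, object confidence
    prediction[:, :, idx] = torch.sigmoid(prediction[:, :, idx])

print(prediction[0, 0])
```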

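* I've run too short on time, so I'm putting off finishing this post for now. I'll keep completing it bit by bit whenever I get time, but just in case, here is a link to continue reading from.

This post covers roughly the first half of the link below, and if you pick up from there the context should carry over smoothly.

Note, however, that the linked article works by loading pretrained weights, so it doesn't get as far as actually training the network. (You have to write that part yourself.)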

https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-3/

So I looked around, and the repository linked below comes with train / test code built in a similar way. It seems like a good reference.

https://github.com/eriklindernoren/PyTorch-YOLOv3

* The next post covers putting the GitHub repository above to use:

2019/02/13 - [Programmer Jinyo/Machine Learning] - Yolo V3 / Labeling car license plates with PyTorch / trying out object detection.

License: Attribution, NonCommercial, NoDerivatives

Sint의 뇌
