[CNN] Implementing XAI

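While working on a project at work, I ran into a situation where I needed help from XAI (Explainable AI).

Various XAI methods exist, such as Grad-CAM and CAM, and I have used existing implementations of them before,

but once our architecture drifted far from the base architecture, applying them became a problem.

The root cause was that I did not clearly understand the XAI algorithms and how they work,

so to understand them properly I will go through the origin, principle, and implementation of each XAI algorithm.

The goal is to find out which regions (ROI) a deep learning model focuses on when solving a classification problem,

and from that to check whether training went well, or why it did not.

I will start with CAM, the earliest of the methods mentioned above.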

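What is CAM?

CAM stands for Class Activation Map and was first proposed in a paper published in 2016.

Reading the paper is the most basic and essential step in understanding the principle, so I will work through the paper.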

CAM Paper Review

Paper Information

Title: Learning Deep Features for Discriminative Localization
Authors: Bolei Zhou and 4 co-authors (MIT)
Year: 2016
Venue: CVPR (Conference on Computer Vision and Pattern Recognition)

* Note: CVPR is a leading international AI conference, held since 1983 and co-hosted by IEEE and the Computer Vision Foundation (CVF). Together with ECCV (European Conference on Computer Vision) and ICCV (International Conference on Computer Vision), it is regarded as one of the three major conferences in computer vision. (Source: http://www.aitimes.com/news/articleView.html?idxno=139176)

Abstract

In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network to have remarkable localization ability despite being trained on image-level labels.

The authors revisit the GAP (Global Average Pooling) layer and focus on why a CNN shows remarkable localization ability (finding where an object is) even though it is trained only on image-level labels.

While this technique was previously proposed as a means for regularizing training, we find that it actually builds a generic localizable deep representation that can be applied to a variety of tasks.

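This technique was originally proposed as a means of regularizing training,
but the authors find that it actually builds a generic, localizable deep representation that can be applied to a variety of tasks.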

Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014, which is remarkably close to the 34.2% top-5 error achieved by a fully supervised CNN approach.

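Despite the apparent simplicity of GAP, the network reaches 37.1% top-5 error for object localization on ILSVRC 2014, close to the 34.2% top-5 error of a fully supervised CNN approach.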

We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them.

The authors show that the network can localize discriminative image regions on a variety of tasks even though it was not trained for those tasks.

Key Points

[Introduction]

Despite having this remarkable ability to localize objects in the convolutional layers, this ability is lost when fully-connected layers are used for classification.

Convolutional layers are good at locating objects,
but this ability is lost once fully-connected layers are used for classification.

In order to achieve this, [13] uses global average pooling which acts as a structural regularizer, preventing overfitting during training. In our experiments, we found that the advantages of this global average pooling layer extend beyond simply acting as a regularizer - In fact, with a little tweaking, the network can retain its remarkable localization ability until the final layer. This tweaking allows identifying easily the discriminative image regions in a single forward-pass for a wide variety of tasks, even those that the network was not originally trained for.

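GAP was used in [13] as a structural regularizer that prevents overfitting,
and it addresses several problems of fully-connected layers: loss of spatial information, a sharp increase in the number of parameters, and overfitting.
The authors find that, with a little tweaking, a network using GAP keeps its localization ability until the final layer.
In other words, GAP can be used in place of the FC layers right before the final output.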

How GAP works [source: https://gaussian37.github.io/dl-concept-global_average_pooling/]
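As a minimal sketch of what GAP does, the snippet below averages each channel of a feature map over its spatial grid and compares how many output-layer weights a flattened FC layer and a GAP head would need. The array values, the 2x2 grid, and `num_classes = 10` are made-up numbers for illustration, not values from the paper.

```python
import numpy as np

def global_average_pooling(feature_maps: np.ndarray) -> np.ndarray:
    """Average each channel over its spatial grid: (K, H, W) -> (K,)."""
    return feature_maps.mean(axis=(1, 2))

# Hypothetical output of the last conv layer: K=3 channels on a 2x2 grid.
feature_maps = np.array([
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [0.0, 8.0]],
    [[2.0, 2.0], [2.0, 2.0]],
])
pooled = global_average_pooling(feature_maps)  # one value per channel

# Weight count of the classification layer (biases ignored).
K, H, W = feature_maps.shape
num_classes = 10                               # assumed for illustration
fc_weights = K * H * W * num_classes           # FC on the flattened map
gap_weights = K * num_classes                  # linear layer after GAP
print(pooled, fc_weights, gap_weights)
```

The pooled vector has one entry per channel, so the layer that follows no longer depends on the spatial size of the feature map.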

[Class Activation Map]

In this section, we describe the procedure for generating class activation maps (CAM) using global average pooling (GAP) in CNNs. A class activation map for a particular category indicates the discriminative image regions used by the CNN to identify that category (e.g., Fig. 3). The procedure for generating these maps is illustrated in Fig. 2.

Explanation of the CAM concept

A CAM (class activation map) for a particular category indicates the discriminative image regions the CNN uses to identify that category.

Given this simple connectivity structure, we can identify the importance of the image regions by projecting back the weights of the output layer on to the convolutional feature maps, a technique we call class activation mapping.

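Because of this simple connectivity, projecting the output-layer weights back onto the convolutional feature maps reveals how important each image region is. The authors call this technique class activation mapping.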

As illustrated in Fig. 2, global average pooling outputs the spatial average of the feature map of each unit at the last convolutional layer. A weighted sum of these values is used to generate the final output.

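GAP outputs the spatial average of the feature map of each unit (channel) in the last convolutional layer.

A weighted sum of these values is used to generate the final output.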

Similarly, we compute a weighted sum of the feature maps of the last convolutional layer to obtain our class activation maps. We describe this more formally below for the case of softmax. The same technique can be applied to regression and other losses.

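A CAM is obtained in the same way, by computing a weighted sum of the feature maps of the last convolutional layer.
The paper formalizes this for the softmax case (multi-class classification) and notes that the same technique applies to regression and other losses.

The equations are hard to transcribe as text, so I plan to organize and upload them later.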
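Until that write-up exists, the definitions from the paper's Class Activation Map section can be sketched in LaTeX as below. Here f_k(x, y) is the activation of unit k of the last convolutional layer at spatial location (x, y), w_k^c is the output-layer weight connecting unit k to class c, and the bias term is ignored. The pooled value F_k is written as a plain sum, which differs from a true average only by a constant factor. This is my own transcription, so it should be checked against the paper before being relied on.

```latex
\begin{align}
F_k &= \sum_{x,y} f_k(x, y) \\
S_c &= \sum_k w_k^c F_k = \sum_{x,y} \sum_k w_k^c f_k(x, y) \\
P_c &= \frac{\exp(S_c)}{\sum_{c'} \exp(S_{c'})} \\
M_c(x, y) &= \sum_k w_k^c f_k(x, y) \\
S_c &= \sum_{x,y} M_c(x, y)
\end{align}
```

The last line is what makes the map interpretable: the class score S_c is exactly the sum of M_c over all spatial locations.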

Mc(x, y) directly indicates the importance of the activation at spatial grid (x, y) leading to the classification of an image to class c.

In short, Mc(x, y) indicates how important the activation at spatial location (x, y) is for classifying the image as class c.
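The relationship between Mc(x, y) and the class score can be checked numerically. The sketch below is not the authors' code: the feature maps and weights are random stand-ins, and `class_activation_map` is a helper name I made up. It computes the weighted sum of feature maps for one class and confirms that summing the map over all locations gives the same score as pooling first and then applying the output-layer weights (pooling is taken as a sum here and the bias is ignored).

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """M_c(x, y) = sum_k w_k^c * f_k(x, y): (K, H, W) and (K,) -> (H, W)."""
    return np.tensordot(class_weights, feature_maps, axes=1)

rng = np.random.default_rng(0)
K, H, W, num_classes = 4, 3, 3, 2              # hypothetical sizes
feature_maps = rng.random((K, H, W))           # stand-in for f_k(x, y)
weights = rng.normal(size=(num_classes, K))    # stand-in for w_k^c

c = 1                                          # class to explain
cam = class_activation_map(feature_maps, weights[c])

# Usual forward path: pool each channel, then apply the output weights.
pooled = feature_maps.sum(axis=(1, 2))
score = weights[c] @ pooled

print(cam.shape, np.isclose(cam.sum(), score))
```

Because the two paths give the same score, each cell of `cam` can be read as that location's contribution to the score of class c.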

One more thing I learned is the functional difference between GAP and GMP.

Global average pooling (GAP) vs global max pooling (GMP)

GAP loss encourages the network to identify the extent of the object as compared to GMP which encourages it to identify just one discriminative part. This is because, when doing the average of a map, the value can be maximized by finding all discriminative parts of an object as all low activations reduce the output of the particular map. On the other hand, for GMP, low scores for all image regions except the most discriminative one do not impact the score as you just perform a max. We verify this experimentally on ILSVRC dataset in Sec. 3: while GMP achieves similar classification performance as GAP, GAP outperforms GMP for localization.

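To summarize the above:

GAP encourages the network to identify the full extent of the object, whereas GMP encourages it to identify just one discriminative part.

This is because an average is maximized by finding all discriminative parts of the object, since every low activation lowers it, while a max ignores the low scores of every region except the most discriminative one.

On the ILSVRC dataset, GMP and GAP achieve similar classification performance,

but GAP outperforms GMP for localization.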
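A toy example makes the difference concrete. The two 3x3 activation maps below are invented for illustration: both have the same peak, but only one responds across the whole object. This is only a sketch of the pooling arithmetic, not a reproduction of the paper's experiment.

```python
import numpy as np

# Two hypothetical activation maps of the same unit with the same peak value.
peak_only = np.array([[0.0, 0.0, 0.0],
                      [0.0, 9.0, 0.0],
                      [0.0, 0.0, 0.0]])   # fires on one discriminative part
full = np.array([[6.0, 7.0, 6.0],
                 [7.0, 9.0, 7.0],
                 [6.0, 7.0, 6.0]])        # fires across the whole object

# Global max pooling sees only the peak, so both maps look identical.
gmp_peak, gmp_full = peak_only.max(), full.max()

# Global average pooling is pulled down by every low activation.
gap_peak, gap_full = peak_only.mean(), full.mean()

print(gmp_peak, gmp_full)
print(gap_peak, gap_full)
```

Under GMP the two maps produce the same pooled value, so nothing rewards covering the whole object. Under GAP the map that covers the object produces the larger value.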

CAM Code Review and Application

To do: try applying the code below.

https://github.com/zhoubolei/CAM
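Before working through that repository, the post-processing step can be sketched on its own. The paper overlays a CAM on the input by upsampling it to the image size. The snippet below is a simplified stand-in and not the repository's code: `cam_to_heatmap` is a hypothetical helper, the min-max normalization is my assumption for display purposes, and nearest-neighbor repetition is used in place of a proper image resize.

```python
import numpy as np

def cam_to_heatmap(cam: np.ndarray, scale: int) -> np.ndarray:
    """Min-max normalize a CAM to [0, 1] and upsample it by an integer factor."""
    cam = cam - cam.min()
    peak = cam.max()
    if peak > 0:                      # avoid dividing by zero on a flat map
        cam = cam / peak
    # Nearest-neighbor upsampling: repeat every cell `scale` times per axis.
    return np.repeat(np.repeat(cam, scale, axis=0), scale, axis=1)

# Hypothetical 2x2 CAM blown up to a 224x224 input size.
cam = np.array([[0.5, 2.0],
                [-1.0, 1.0]])
heatmap = cam_to_heatmap(cam, scale=112)
print(heatmap.shape, heatmap.min(), heatmap.max())
```

In practice a smoother interpolation (bilinear, for example) gives a nicer overlay. The nearest-neighbor version is used here only to keep the sketch dependency-free beyond NumPy.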
