YOLO

Field
Object Detection
Review date
2021/03/20
This post was first written for the Humanscape tech blog and then moved here.
In this post I review YOLO, a tool that is widely used whenever object detection is needed. YOLO has had four versions so far, and here I look at the very first one, known as YOLOv1.
The title of the paper under review is:
"You Only Look Once: Unified, Real-Time Object Detection"
If you would like to read the paper yourself, please refer to the original.

Objective

The paper starts from the observation that existing object detection systems were remarkably slow.
For those wondering what object detection even is, let me start with a picture that is both familiar and interesting.

Object Detection

Object detection is the technique of finding specific objects in an image. In the picture above, a dog, a bicycle, and a car have been found in the image, and the smallest bounding box enclosing each of them has been drawn. Naturally, the same technique applies to video, which is just a sequence of images.
So why were these techniques slow in the past?
According to the paper, the main reason is that existing object detection systems were built by repurposing classifiers.
Why is classification suddenly showing up here?
For those wondering, a brief aside: object detection consists of (1) deciding whether an object is present and (2) deciding what that object is. Step 2 was handled by a separate classification process, one that picks the best match among a prepared set of labels.
Back to the main topic: the paper gives two examples of existing approaches.
The first is DPM (Deformable Parts Model), which uses a sliding-window approach and runs a classifier at many locations and scales across the image.
The second is R-CNN, which uses region proposal methods to generate potential bounding boxes in the image and then runs a classifier on those boxes.
What the two examples have in common, and the problem the paper raises, is that classification is already a learned component in its own right. Running it again and again over windows slid across the image, or over a large number of proposed bounding boxes, is bound to be slow.
YOLO responds to this problem with one idea: could object detection be solved as a single regression problem instead of the two steps described above?
After all, humans do not go through two steps when recognizing objects in an image. In that spirit, that one glance at an image is enough to see the objects in it, the authors named their method YOLO, for "You Only Look Once".

Unified Detection

Unified Detection is the method the paper proposes for solving object detection as a single regression problem.
I was surprised by how simple the procedure is. (Of course, it is only simple to explain; labeling data for it looks like a huge amount of work.) It can be described with exactly four properties.
First, the input image is divided evenly into an S x S grid.
Second, each grid cell has B potential bounding boxes. Here, "has" means that if the center of a rectangular bounding box falls inside a grid cell, that cell is said to have the bounding box. (I will not bother explaining where the center of a rectangle is.) The paper's own term is that the cell is "responsible" for the box.
Third, each potential bounding box is defined by five values: x, y, w, h, and c. x and y give the center of the box, and w and h give its width and height. c is the confidence, expressed as the product of Pr(Object), the probability that the box contains an object, and the IOU (Intersection over Union).
What is IOU?
It is what the figure above shows. Applied to our case, if X is the region where the object actually is and Y is the region of the potential bounding box, IOU is the area of the intersection of X and Y divided by the area of their union. Qualitatively, it measures how similar the two regions are.
$$\Pr(\text{Object}) * \text{IOU}_{pred}^{truth}$$
In short, the confidence c is defined by the expression above.
Fourth, each grid cell has C conditional class probabilities Pr(Class_i | Object): given that an object is present, the probability that it belongs to each class.
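The IOU and the idea of a "responsible" grid cell described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: boxes are given as corner coordinates `(x_min, y_min, x_max, y_max)`, and the function names `iou` and `responsible_cell` as well as the example image size and grid size are hypothetical, not taken from the paper's code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if there is one).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes get an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def responsible_cell(cx, cy, img_w, img_h, s):
    """Return (row, col) of the S x S grid cell containing the box center (cx, cy)."""
    # min(...) keeps a center lying exactly on the right/bottom edge inside the grid.
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


truth = (0.0, 0.0, 2.0, 2.0)   # region X: where the object actually is
pred = (1.0, 1.0, 3.0, 3.0)    # region Y: a potential bounding box
print(iou(truth, pred))        # intersection area 1, union area 7

# Illustrative values: a 448 x 448 image and a 7 x 7 grid.
print(responsible_cell(224, 100, 448, 448, 7))  # (1, 3)
```

With a ground-truth box available, the confidence target for a box is simply this IOU when an object is present and 0 otherwise.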
Now, let's count the values mentioned above.
There is an S x S grid, each grid cell has B potential bounding boxes, each potential bounding box has five values x, y, w, h, c, and each grid cell has C conditional class probabilities.
That gives S * S * (5 * B + C) values in total.
Because these values are learned by comparison with the ground truth, they are exactly the vector the network must output in the end. It is the same idea as a classifier with N labels ending in a softmax layer that outputs a 1 x N vector.
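As a quick check of the count above, here is a small Python sketch. The values S = 7, B = 2, C = 20 are the ones the paper reports for PASCAL VOC. The per-cell layout used in `split_cell` (boxes first, then class probabilities) and both function names are my own assumptions for illustration.

```python
def output_size(s, b, c):
    """Number of values predicted for one image: S * S * (5 * B + C)."""
    return s * s * (5 * b + c)


def split_cell(cell_vector, b, c):
    """Split one cell's flat prediction into B boxes (x, y, w, h, c) and C class probabilities.

    Assumed layout (hypothetical): [box_0 (5 values), ..., box_{B-1} (5 values), class probs (C values)].
    """
    if len(cell_vector) != 5 * b + c:
        raise ValueError("cell vector must have 5 * B + C entries")
    boxes = [tuple(cell_vector[5 * k: 5 * k + 5]) for k in range(b)]
    class_probs = list(cell_vector[5 * b:])
    return boxes, class_probs


S, B, C = 7, 2, 20
print(output_size(S, B, C))  # 7 * 7 * 30 = 1470

boxes, class_probs = split_cell(list(range(5 * B + C)), B, C)
print(len(boxes), len(class_probs))  # 2 20
```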
In the end, the Unified Detection section defines in detail the quantities needed to find bounding boxes from an input image.

Network Design

The paper does not devote much space to the network it uses. That makes sense, since its main focus is on the methodology for solving object detection as a single regression problem and on redefining the loss function for it.
Papers of this kind differ in nature from ones like ResNet, which improve efficiency through meaningful architectural changes. Even so, some of them add a special mechanism to an existing network and highlight it as tailored to the problem at hand, so it was a little disappointing that YOLO mentions nothing of the sort.
The network is based on GoogLeNet and uses 24 convolutional layers and 2 fully connected layers. Instead of GoogLeNet's inception modules, it uses 1x1 reduction layers followed by 3x3 convolutional layers, but the paper does not say much about why.
It is probably fine for you to skim this part as well.

Training

The Training section of the paper is easy to follow but has no real flow. It reads as a list, yet many items come without a detailed rationale. So I will split the items my own way into those worth only a quick look and those worth examining closely.
First, the items worth only a quick look. It is a pity the paper gives no detailed explanation or reasoning for them, but presumably they were adopted because they helped performance.
1. The network is pretrained on the ImageNet 1000-class competition dataset.
2. Following Ren et al., who report that adding convolutional and fully connected layers to a pretrained model improves performance, the authors add such layers on top of the pretrained network.
3. The width and height of each bounding box are normalized. (Presumably to speed up convergence?)
4. The final layer uses a linear activation, and all other layers use the leaky ReLU below.
$$\phi(x) = \begin{cases} x, & \text{if } x > 0 \\ 0.1x, & \text{otherwise} \end{cases}$$
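The leaky rectified linear activation written above is easy to express directly. The sketch below is plain Python for a single scalar, and the function name `leaky_relu` and the `slope` parameter are just illustrative names; a real network would apply the same rule element-wise to tensors.

```python
def leaky_relu(x, slope=0.1):
    """phi(x) = x if x > 0, otherwise slope * x (slope = 0.1 in the formula above)."""
    return x if x > 0 else slope * x


for value in (-2.0, 0.0, 3.0):
    print(value, leaky_relu(value))
```

Unlike a plain ReLU, negative inputs are scaled down rather than set to zero, so they still pass a small gradient.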
Next come the items worth examining closely. These influence the design of the loss function described later or play an important role in the method, so let's look at them in detail.
Before that, we need to understand mAP, a metric commonly used to evaluate object detection. If you already know it, feel free to skip ahead.
To understand mAP you need AP, and to understand AP you need precision and recall. That may sound like a lot to learn, but none of these concepts is difficult.
For a task X whose judgments can be right or wrong, there are four possible outcomes:
1. judged positive, and actually positive
2. judged positive, but actually negative
3. judged negative, but actually positive
4. judged negative, and actually negative
If we call the probability of each case $P_i(X)$, precision is $\frac{P_1(X)}{P_1(X)+P_2(X)}$ and recall is $\frac{P_1(X)}{P_1(X)+P_3(X)}$. Only the denominators differ, because the two measures focus on different things: precision is the fraction of positive judgments that are actually positive, and recall is the fraction of actual positives that were judged positive.
High precision means $P_2(X)$ is small relative to $P_1(X)$, and high recall means $P_3(X)$ is small relative to $P_1(X)$. So when both precision and recall are high, X makes few wrong judgments and is close to perfect. Conversely, to get close to perfect judgment, raise both precision and recall.
We would like both $P_2(X)$ and $P_3(X)$ to be small, but in practice such perfect judgment is hard to come by.
Instead, $P_2(X)$ and $P_3(X)$ tend to trade off against each other: when one goes down, the other goes up,
so a precision-recall curve usually slopes downward, as in the plot above. This is where AP comes in. AP (average precision) is an evaluation metric designed to reward judgments with both high precision and high recall; it is the area under the precision-recall curve.
mAP goes one step further and adds the word "mean": when there are several judgments rather than one, it is the mean of their APs. YOLO makes not only the judgment "detect a car" but also "detect a person", one judgment per label, which is why mAP is used as the metric.
That was a long aside, so let's get back to the main topic.
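To make precision, recall, AP, and mAP concrete, here is a minimal Python sketch. It is an assumption-laden illustration: AP is computed as a simple rectangular area under the precision-recall curve, which is not necessarily the exact interpolation used by the PASCAL VOC evaluation, and all function names and the toy detections are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN), from raw counts."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


def average_precision(scored_detections, num_positives):
    """Area under the precision-recall curve for one class.

    scored_detections: list of (confidence, is_true_positive) pairs.
    num_positives: number of ground-truth objects of this class.
    """
    ranked = sorted(scored_detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_positives
        # Add the rectangle between the previous and the current recall value.
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Toy example: four detections for one class, two ground-truth objects.
detections = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
ap_car = average_precision(detections, num_positives=2)
print(precision_recall(tp=8, fp=2, fn=4))
print(ap_car)
print(mean_average_precision({"car": ap_car, "person": 0.5}))
```

Lowering the confidence cut-off walks down the ranked list: recall can only go up, while precision typically drops, which is the downward-sloping curve described above.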
1. The paper computes the loss with sum-squared error. It was chosen because it is easy to optimize, but the authors found that minimizing it does not align perfectly with the goal of increasing mAP. In particular, it weights the localization error from x, y, w, h equally with the classification error. On top of that, in most images many grid cells contain no object, which pushes the confidence c of those cells toward zero and can overpower the gradient from the cells that do contain objects, making the model unstable. To address this, the paper introduces factors that increase the loss from bounding box coordinates and decrease the loss from confidence predictions for boxes that contain no object.
2. Sum-squared error also weights errors in large and small bounding boxes equally. This is a problem because a deviation of a given size changes the IOU of a large box much less than that of a small box. To address this, the loss uses the square roots of w and h rather than w and h directly, which narrows the gap between how deviations in large and small boxes are penalized.
3. Non-maximum suppression (NMS) is used so that each object ends up with a single bounding box. As the name says, it suppresses bounding boxes whose confidence is not the maximum.
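Non-maximum suppression as described above is commonly implemented greedily: keep the most confident box, drop every remaining box that overlaps it too much, and repeat. The Python sketch below follows that common scheme. The IOU threshold of 0.5, the corner-coordinate box format, and the function names are my assumptions, not values taken from the paper.

```python
def iou(a, b):
    """IOU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS for one class.

    detections: list of (confidence, box) pairs.
    Returns the kept detections, most confident first.
    """
    remaining = sorted(detections, key=lambda d: d[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)          # most confident box still in play
        kept.append(best)
        # Suppress every box that overlaps the kept box too strongly.
        remaining = [d for d in remaining if iou(best[1], d[1]) <= iou_threshold]
    return kept


dets = [
    (0.9, (10, 10, 50, 50)),
    (0.8, (12, 12, 52, 52)),      # nearly the same box as the first one
    (0.7, (100, 100, 140, 140)),  # a different object elsewhere in the image
]
kept = non_max_suppression(dets)
print([conf for conf, _ in kept])  # [0.9, 0.7]
```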
Taking items 1 and 2 above into account, the paper designs the following sum-squared-error loss:
$$
\begin{aligned}
&\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{I}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{I}_{ij}^{obj}\left[(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{I}_{ij}^{obj}(C_i-\hat{C}_i)^2 \\
&+ \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{I}_{ij}^{noobj}(C_i-\hat{C}_i)^2 \\
&+ \sum_{i=0}^{S^2}\mathbb{I}_{i}^{obj}\sum_{c\in classes}(p_i(c)-\hat{p}_i(c))^2
\end{aligned}
$$
The expression looks complicated, but it is actually very simple.
$\mathbb{I}_{ij}^{obj}$ indicates whether the j-th bounding box predictor of the i-th grid cell is responsible for an object, and $\mathbb{I}_{ij}^{noobj}$ marks the predictors for which that is not the case. $\mathbb{I}_{i}^{obj}$ indicates whether an object is present in the i-th grid cell.
With that in mind:
The first line is the loss on the center of the bounding box.
The second line is the loss on the dimensions of the bounding box.
The third line is the confidence loss for bounding box predictors that contain an object.
The fourth line is the confidence loss for bounding box predictors that contain no object.
The last line is the classification loss.
Item 1 shows up as the lambda factors in the first, second, and fourth lines, and item 2 shows up as the square roots of the dimensions in the second line.
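The five lines of the loss can be sketched in plain Python for a single image. This is a rough illustration under several assumptions of mine: the per-cell dictionary layout, the `resp` mask, and the function name `yolo_loss` are hypothetical; a cell is treated as containing an object exactly when one of its predictors is responsible; and the default weights 5 and 0.5 are the values the paper reports for lambda_coord and lambda_noobj.

```python
import math


def yolo_loss(pred, target, resp, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared-error loss for one image.

    pred[i], target[i]: {"boxes": [(x, y, w, h, c), ...], "classes": [p_1, ..., p_C]}
                        for grid cell i (hypothetical layout).
    resp[i][j]: True if predictor j of cell i is responsible for an object (1_ij^obj).
    Widths and heights are assumed to be non-negative so the square roots exist.
    """
    loss = 0.0
    for i in range(len(pred)):
        for j, (p, t) in enumerate(zip(pred[i]["boxes"], target[i]["boxes"])):
            px, py, pw, ph, pc = p
            tx, ty, tw, th, tc = t
            if resp[i][j]:
                # Line 1: box center.
                loss += lambda_coord * ((tx - px) ** 2 + (ty - py) ** 2)
                # Line 2: box dimensions, compared through square roots.
                loss += lambda_coord * ((math.sqrt(tw) - math.sqrt(pw)) ** 2
                                        + (math.sqrt(th) - math.sqrt(ph)) ** 2)
                # Line 3: confidence where an object is present.
                loss += (tc - pc) ** 2
            else:
                # Line 4: confidence where no object is present, down-weighted.
                loss += lambda_noobj * (tc - pc) ** 2
        if any(resp[i]):
            # Line 5: class probabilities, only for cells that contain an object.
            loss += sum((tp - pp) ** 2
                        for pp, tp in zip(pred[i]["classes"], target[i]["classes"]))
    return loss


# Toy example: a 1 x 1 grid with B = 2 predictors and C = 2 classes.
pred = [{"boxes": [(0.5, 0.5, 0.25, 0.25, 0.8), (0.1, 0.1, 0.04, 0.04, 0.2)],
         "classes": [0.9, 0.1]}]
target = [{"boxes": [(0.5, 0.5, 0.25, 0.25, 1.0), (0.0, 0.0, 0.0, 0.0, 0.0)],
           "classes": [1.0, 0.0]}]
resp = [[True, False]]
print(yolo_loss(pred, target, resp))  # 0.04 + 0.02 + 0.02, about 0.08
```

Because the dimensions enter through square roots, the same absolute error in w or h costs more for a small box than for a large one, which is the effect item 2 is after.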
In short, the Training section is mainly about the design of the loss function and the issues handled along the way.

Comparison to Other Detection Systems

The paper compares YOLO with other detection systems.
Before going into the details, a note:
In my view the points the paper wants to make come down to just a few, and comparing against each system one by one makes this section longer than necessary, so you can skip it without much harm to your understanding of YOLO. If you feel the same way, feel free to jump to the next section, Experiments.
Below are the comparisons with each detection system.
1. DPM (Deformable Parts Model)
DPM uses a disjoint pipeline in which static feature extraction, region classification, bounding box prediction, and so on are separate steps. YOLO, by contrast, performs feature extraction, bounding box prediction, non-maximal suppression, and contextual reasoning all at once, which makes it faster and more accurate.
2. R-CNN
The R-CNN pipeline is quite complex. Instead of the sliding windows used in DPM, it uses region proposals. Specifically, selective search finds potential bounding boxes, a convolutional network extracts features, an SVM scores the boxes, a linear model adjusts the bounding boxes, and non-maximal suppression removes duplicate detections. Each of these stages has to be tuned separately, and the resulting system is slow. YOLO, by contrast, optimizes these individual components as a single model and is therefore faster.
3. Other Fast Detectors
Fast R-CNN and Faster R-CNN speed up R-CNN by sharing computation and by using a neural network instead of selective search to propose regions. Even so, they fall short of real-time performance. In the same vein, many studies focus on speeding up the DPM pipeline, but only the 30Hz DPM variant actually reaches real-time speed. YOLO is fast by design, so it performs well in real-time settings too.
4. Deep MultiBox
Unlike R-CNN, MultiBox uses a convolutional neural network instead of selective search. It can handle single object detection, but it cannot perform general object detection on its own. YOLO predicts class probabilities as well, so it can detect multiple objects of multiple classes.
5. OverFeat
Like R-CNN and DPM above, OverFeat is a disjoint system, and on top of that it sees only local information when making a prediction, so it struggles to reflect global context. Another drawback is that it needs significant post-processing to produce coherent detections. YOLO does not classify local patches; it is a single model trained on whole images, so it can fully reflect global context.
6. MultiGrasp
MultiGrasp, the origin of YOLO's grid approach, handles a comparatively simple task: predicting a graspable region in an image containing a single object. YOLO builds on MultiGrasp and solves a more complex problem, finding the bounding boxes and class probabilities of multiple objects of multiple classes in an image.
Phew, that was long, but it can be summarized as follows.
1. YOLO is fast because it merges the two steps of identifying an object and labeling it into one.
2. Rather than labeling objects from local information, YOLO solves the problem with a model trained under its loss function, so it can reflect global context.
3. YOLO can successfully perform multi-label, multi-object detection.
4. YOLO can be used without trouble for object detection in real-time media such as video platforms.
To condense it further into one sentence:
Unlike existing methods that repurpose classifiers, YOLO performs object detection in a single pass while reflecting global context, which makes it an object detection technique that is fast in general and usable at high frame rates on video.

Experiments

The paper runs experiments to evaluate YOLO.
The table above compares real-time detectors and less-than-real-time detectors on the PASCAL VOC 2007 dataset.
First, the less-than-real-time group. The Faster R-CNN family has the best mAP, but its FPS is too low for it to serve as a real-time detector. Apart from that family, the YOLO family has the highest mAP.
Next, the real-time detectors. Both the highest mAP and the highest FPS belong to the YOLO family.
Likewise, the breakdown shown above analyzes the errors of Fast R-CNN and YOLO on the PASCAL VOC 2007 dataset. The categories are:
Correct: the correct class is predicted and IOU is greater than 0.5.
Localization: the correct class is predicted and IOU is between 0.1 and 0.5.
Similar: a similar class is predicted and IOU is greater than 0.1.
Other: a wrong class is predicted and IOU is greater than 0.1.
Background: IOU is less than 0.1, regardless of class.
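The five categories above amount to a small decision rule, sketched here in Python. The function name `classify_detection` and the `similar_classes` parameter (the set of classes regarded as similar to the true class) are hypothetical, and how to treat IOU values exactly equal to 0.1 or 0.5 is my own choice, since the description above leaves those boundaries open.

```python
def classify_detection(pred_class, true_class, iou_value, similar_classes=()):
    """Assign one detection to an error-analysis category.

    Boundary handling (assumption): IOU == 0.5 with the correct class counts as
    "Localization", and IOU == 0.1 is not counted as "Background".
    """
    if iou_value < 0.1:
        return "Background"            # almost no overlap, whatever the class
    if pred_class == true_class:
        return "Correct" if iou_value > 0.5 else "Localization"
    if pred_class in similar_classes:
        return "Similar"               # confused with a similar class
    return "Other"                     # plain wrong class


examples = [
    ("dog", "dog", 0.70),
    ("dog", "dog", 0.30),
    ("cat", "dog", 0.60),
    ("car", "dog", 0.60),
    ("dog", "dog", 0.05),
]
for pred_class, true_class, iou_value in examples:
    print(pred_class, true_class, iou_value,
          classify_detection(pred_class, true_class, iou_value, similar_classes={"cat"}))
```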
YOLO has a large localization error, whereas Fast R-CNN has a large background error. Fast R-CNN is slightly ahead on correct detections. Judged only on fully correct detections, Fast R-CNN is the better detector, but looking at the whole picture one could still rate YOLO highly: the analysis shows that its overall accuracy is not far behind existing detectors.
Building on YOLO's small background error, the paper measures how mAP changes when Fast R-CNN and YOLO are combined. The combination reaches 75.0% mAP, 3.2% higher than Fast R-CNN alone. It is much slower than running YOLO alone, but because YOLO itself is so fast, it adds little time compared with running Fast R-CNN alone.
On the PASCAL VOC 2012 dataset, YOLO achieves 57.9% mAP, close to R-CNN with VGG. Compared with detectors of similar average score, however, YOLO struggles with small objects, as the scores for categories such as bottle show. These results also show that Fast R-CNN + YOLO is one of the best detectors in terms of mAP.
To measure generalizability, the paper takes detectors trained on VOC 2007 person data and tests their AP on Picasso paintings and other artwork. R-CNN has a high AP on VOC 2007 person detection, but its AP drops sharply on the Picasso and artwork images. DPM's AP does not drop much on Picasso or artwork, but its AP on VOC 2007 person detection is not high to begin with.
In addition, the paper includes pictures to demonstrate YOLO's real-time object detection capability.

Conclusion

That concludes this brief summary of the paper "You Only Look Once: Unified, Real-Time Object Detection".
What stays with me most is that the paper was much easier than expected and read smoothly. On the other hand, compared with the short and easy core content, the list of differences from existing techniques felt long-winded even though the key points are few, which was a pity. (Presumably that was needed to get the paper accepted?)
I came across a sort of genealogy of object detection and, while wondering what to read, picked YOLO because I had heard of it so often; I think it was a good choice. I actually opened YOLOv4 first, but the authors of the series had changed and it felt like performance tuning pushed to the extreme, so I looked for something more fundamental instead. I hear it is an impressive piece of work, so I plan to open it again once I get more interested in object detection. If you are interested in object detection too, working through the entries on that genealogy site one by one would be a good idea.