CycleGAN

Field
Image to Image Translation
Review date
2020/11/05
This post was originally written for the Humanscape tech blog and later moved here.
In this post I review a paper that builds on GAN, which I covered earlier, and that performs image to image translation like pix2pix, which I also covered earlier, while making possible things that pix2pix could not do. The post assumes an understanding of the earlier GAN post, so please refer to that post (GAN) for background. The title of the paper under review is:
"Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"
If you would like to read the details yourself, please refer to the paper directly.

Objective

๋…ผ๋ฌธ์—์„œ ๋ชฉ์ ์œผ๋กœ ํ•˜๊ณ  ์žˆ๋Š” ๊ฒƒ์€ย unpaired image to image translation framework์„ ๊ตฌํ˜„ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋ถ€์—ฐ ์„ค๋ช…์„ ํ•˜์ž๋ฉด, ๊ธฐ์กด์˜ pix2pix ๊ฐ™์€ general-purposed image to image translation framework ๊ฐ™์€ ๊ฒฝ์šฐ์—๋Š” conditional gan์„ ์ด์šฉํ•˜์—ฌ input์— ํ•ด๋‹น๋˜๋Š” output์„ ํ˜•์„ฑํ•˜๊ธฐ ์œ„ํ•ด input๊ณผ ํ•จ๊ป˜ ground truth output์„ ํ•™์Šต ๋ฐ์ดํ„ฐ๋กœ ๋„ฃ์–ด์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ, ์ง€๊ธˆ ์†Œ๊ฐœํ•œ cycleGAN์˜ ๊ฒฝ์šฐ์—๋Š” input ๋ฐ์ดํ„ฐ์™€ ๊ทธ์— ํ•ด๋‹นํ•˜๋Š” ground truth ๋ฐ์ดํ„ฐ๊ฐ€ ์•„๋‹Œ, input data์™€ ๊ทธ์—๋Š” ๋Œ€์‘๋˜์ง€ ์•Š๋Š” ์—ฌ๋Ÿฌ ground truth data๋“ค์„ ์ด์šฉํ•ด ํ•™์Šต์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์ด์ „์— selfie2anime์— ๋Œ€ํ•ด์„œ ํฌ์ŠคํŒ…์„ ์ง„ํ–‰ํ•œ ์ ์ด ์žˆ์—ˆ๋Š”๋ฐ, ์ด๊ฒƒ์„ ์˜ˆ๋กœ ๋“ค์ž๋ฉด pix2pix ์˜ ๊ฒฝ์šฐ์—๋Š” ํ•™์Šต๋ฐ์ดํ„ฐ๋กœ selfie์™€ ์ด๋ฅผ ๋ณ€ํ™˜ํ•œ anime ์‚ฌ์ง„์„ ์ด์šฉํ•ด ๋Œ€์‘๊ด€๊ณ„๋ฅผ ํ•™์Šตํ•œ ๋ฐ˜๋ฉด, cycleGAN์—์„œ ์ง„ํ–‰ํ•˜๋ ค๋Š” ๊ฒƒ์€ anime ์‚ฌ์ง„๋“ค์˜ ํŠน์„ฑ(๋…ผ๋ฌธ์—์„œ๋Š” distribution์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค)๋“ค์„ ์ด์šฉํ•ด selfie ์‚ฌ์ง„์„ animeํ™” ์‹œํ‚จ๋‹ค๊ณ  ๋ณด์‹œ๋ฉด ๋  ๊ฒƒ ๊ฐ™์Šต๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ์ ์œผ๋กœ, ์–ด๋– ํ•œ ๋ฐฐ๊ฒฝ์ด๋“ ย paired image ์—†์ด image to image translation์„ ๊ตฌํ˜„ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๊ฒƒ์ด ๋…ผ๋ฌธ์˜ ๋ชฉํ‘œ์ด๋ฉฐ ์ด๋ฅผ ๋‹ฌ์„ฑํ•จ์œผ๋กœ์จ ๋‹ค์–‘ํ•œ ์‘์šฉ์ด ๊ฐ€๋Šฅํ•จ์„ ๋…ผ๋ฌธ์—์„œ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค.

Cycle Consistency

๋…ผ๋ฌธ์—์„œ๋Š” ์•ž์„œ ์„ค๋ช…ํ•œ unpaired image to image translation์„ ๊ตฌํ˜„ํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ์—ฌ๋Ÿฌ ๊ด€๋ จ ์—ฐ๊ตฌ๋“ค์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. ๊ฐ€์žฅ ๋จผ์ € image to image translation์ด๋ผ๋Š” ์ ์—์„œ ์•ž์„œ ์–ธ๊ธ‰ํ–ˆ๋˜ pix2pix์˜ ๊ตฌ์กฐ๋ฅผ ์–ธ๊ธ‰ํ•˜๋Š”๋ฐ, ์ด๋Š” paired image to image translation์ด๊ธฐ ๋•Œ๋ฌธ์— ๊ฒฐ์ด ์•ฝ๊ฐ„ ๋‹ค๋ฅด๋‹ค๊ณ  ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. ๋‹ค์Œ์œผ๋กœ Zhou et al.๊ณผ Godard et al.์—์„œ ์†Œ๊ฐœ๋œ cycle consistency loss์—์„œ ์˜๊ฐ์„ ๋ฐ›์•„์„œ ๋ณ€ํ™˜๊ณผ ์—ญ๋ณ€ํ™˜์ด ์„œ๋กœ์— ๋Œ€ํ•ด์„œ ์ผ๋Œ€์ผ ๋Œ€์‘์ด ๋˜๋„๋ก ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ ๊ธฐ์กด์˜ neural style transfer ๋ฐฉ๋ฒ•(ํ•œ ์ด๋ฏธ์ง€๋ฅผ ๋‹ค๋ฅธ ์ด๋ฏธ์ง€์˜ style์„ ๊ฐ€์ง€๋„๋ก ๋ณ‘ํ•ฉ)์„ ์–ธ๊ธ‰ํ•˜๋Š”๋ฐ, ์ด๋ฏธ์ง€ ํ•˜๋‚˜์˜ ๊ฐœ๋ณ„์ ์ธ transfer์ด ์•„๋‹Œ ๋‘ ์ด๋ฏธ์ง€ ์ง‘๋‹จ์˜ ๋ณ€ํ™˜ ๊ด€๊ณ„๋ฅผ ํ•™์Šต์‹œ์ผœ ์ผ๋ฐ˜์ ์ธ translation์„ ํ•™์Šตํ•˜๊ณ  ์‹ถ์—ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ด ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.
๋”ฐ๋ผ์„œ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด ์„ธ ๊ฐ€์ง€ ๋ฐฉ๋ฒ• ์ค‘ ๋‘ ๋ฒˆ์งธ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์•ž์„œ ์„ค๋ช…ํ•œ ๋ณ€ํ™˜๊ณผ ์—ญ๋ณ€ํ™˜์„ ๊ฑฐ์ณ ์›๋ž˜์˜ ์ด๋ฏธ์ง€๊ฐ€ ๋˜๋ ค๋Š” ํŠน์„ฑ์„ย cycle consistency๋ผ๊ณ  ์นญํ•ฉ๋‹ˆ๋‹ค.

Cost Function Design

Written as a formula, looking for the cycle consistency described above means finding two mappings G: X → Y and F: Y → X between two domains X and Y such that F(G(x)) ≈ x for x ∈ X and G(F(y)) ≈ y for y ∈ Y. To train these two mappings, the paper sets up two discriminators, D_X and D_Y, and designs a new cost function that differs from that of the original GAN.
๋…ผ๋ฌธ์—์„œ ์ œ์‹œํ•œ cost function์€ ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€ loss์˜ ๊ฒฐํ•ฉ์œผ๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋Š” GAN ๋…ผ๋ฌธ๋ฆฌ๋ทฐ์—์„œ๋„ ์–ธ๊ธ‰ํ–ˆ๋˜ adversarial loss์ž…๋‹ˆ๋‹ค.
$$L_{GAN}(G,D_Y,X,Y)=E_{y\sim p_{data}(y)}[\log D_Y(y)]+E_{x\sim p_{data}(x)}[\log(1-D_Y(G(x)))]$$
์œ„ ์‹๊ณผ ๊ฐ™์ด adversarial loss๋ฅผ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ, GAN ๋…ผ๋ฌธ๋ฆฌ๋ทฐ๋ฅผ ๋ณด์‹  ๋ถ„๋“ค์€ ์ต์ˆ™ํ•˜์‹ค ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋ณด์ง€ ์•Š์œผ์‹  ๋ถ„๋“ค์„ ์œ„ํ•ด ๊ฐ„๋‹จํžˆ ๋ถ€์—ฐ์„ค๋ช…์„ ํ•˜์ž๋ฉด, G๋Š” generator์ด์ž G: X โ†’ Y mapping ์ด๋ฉฐ, D_Y๋Š” G๋กœ ์ธํ•ด์„œ mapping๋œ xโˆˆX๊ฐ€ real์ธ์ง€ fake์ธ์ง€ ๊ตฌ๋ณ„ํ•˜๋Š” discriminator์ž…๋‹ˆ๋‹ค. ๊ธฐ์กด์˜ GAN์ฒ˜๋Ÿผ discriminator๋Š” real์„ real๋กœ fake๋ฅผ fake๋กœ ์ œ๋Œ€๋กœ ๊ตฌ๋ณ„ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉ์ ์ด๊ธฐ ๋•Œ๋ฌธ์— real case์ธ ์ฒซ ๋ฒˆ์งธ ํ•ญ D_Y(y)๋ฅผ 1๋กœ, fake case์ธ ๋‘ ๋ฒˆ์งธ ํ•ญ ๋‚ด๋ถ€์˜ D_Y(G(x))๋ฅผ 0์œผ๋กœ ๊ตฌ๋ณ„ํ•˜๋Š” ๊ฒƒ์ด optimalํ•œ case๋ผ๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋•Œ๋ฌธ์— discriminator๋Š” L_GAN์„ ์ตœ๋Œ€ํ™”์‹œํ‚ค๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค.๋ฐ˜๋ฉด generator๋Š” fake๋ฅผ real์ธ ๊ฒƒ ๋งˆ๋ƒฅ ๋งŒ๋“ค์–ด ๋‚ด๋Š” ๊ฒƒ์ด ๋ชฉ์ ์ด๊ธฐ ๋•Œ๋ฌธ์— D_Y(G(x))๋ฅผ discriminator๊ฐ€ 1๋กœ ๊ตฌ๋ณ„ํ•˜๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ์ด ๋•Œ๋ฌธ์— generator๋Š” L_GAN์„ ์ตœ์†Œํ™”์‹œํ‚ค๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ์ ์œผ๋กœ min_G max_D L_GAN์„ ๊ตฌํ•˜๋Š” ๊ฒƒ์„ cost function์œผ๋กœ ์„ค์ •์„ ํ–ˆ์—ˆ์Šต๋‹ˆ๋‹ค.
Up to here this is within the scope of an ordinary GAN; CycleGAN adds one more adversarial loss, for the reverse direction. The expression above is the loss for training the generator G: X → Y and the discriminator D_Y, but cycle consistency also requires training the generator F: Y → X and the discriminator D_X. For this, the paper adds L_GAN(F, D_X, Y, X), the same GAN loss applied in the reverse direction, as a second adversarial loss.
With only these two terms, the cost function already gives enough signal to train G and F individually. However, as explained at the start of this review, reflecting cycle consistency in the cost function requires an additional term. Without it, an input can be mapped arbitrarily to anything that merely follows the target domain's distribution, and mapping it back does not return the original image. To solve this, the paper adds the cycle consistency loss.
$$L_{cyc}(G,F)=E_{x\sim p_{data}(x)}[\lVert F(G(x))-x\rVert_1]+E_{y\sim p_{data}(y)}[\lVert G(F(y))-y\rVert_1]$$
์œ„์˜ ์‹์ด cycle consistency๋ฅผ ํ•™์Šต์‹œํ‚ค๊ธฐ ์œ„ํ•œ ํ•ญ์œผ๋กœ ์ถ”๊ฐ€ํ•œ cycle consistency loss์ž…๋‹ˆ๋‹ค. p_data(x) distribution์„ ๋”ฐ๋ฅด๋Š” x๋“ค์— ๋Œ€ํ•ด์„œ F(G(x))์™€ x ์‚ฌ์ด์˜ L1 loss๋“ค์˜ mean๊ณผ p_data(y) distribution์„ ๋”ฐ๋ฅด๋Š” y๋“ค์— ๋Œ€ํ•ด์„œ G(F(y))์™€ y์‚ฌ์ด์˜ L1 loss๋“ค์˜ mean์„ ๋ฐ˜์˜ํ•˜์—ฌ cycle consistency loss๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ํ˜•ํƒœ๋กœ ํ•™์Šต์„ ์ง„ํ–‰์‹œํ‚ค๊ณ ์ž ํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค.
์ด๋“ค์„ ์ข…ํ•ฉํ•˜์—ฌ ์ตœ์ข…์ ์œผ๋กœ ๋…ผ๋ฌธ์—์„œ ์ œ์‹œํ•œ cost function์€ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.
$$L(G,F,D_X,D_Y)=L_{GAN}(G,D_Y,X,Y)+L_{GAN}(F,D_X,Y,X)+\lambda L_{cyc}(G,F)$$
λ is there to control the relative importance of the adversarial losses and the cycle consistency loss. The optimal mappings G* and F* that we ultimately want to learn through this cost function are:
$$G^*,F^*=\arg\min_{G,F}\max_{D_X,D_Y}L(G,F,D_X,D_Y)$$
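As a small worked step (my own rewriting, not an equation quoted from the paper), the min-max problem can be split by who optimizes what. The cycle term does not depend on the discriminators, and the log D(real) terms do not depend on the generators, so with the objective exactly as written, each player only sees part of it:

```latex
\begin{align}
D_Y &:\; \max_{D_Y}\; E_{y\sim p_{data}(y)}[\log D_Y(y)] + E_{x\sim p_{data}(x)}[\log(1-D_Y(G(x)))] \\
D_X &:\; \max_{D_X}\; E_{x\sim p_{data}(x)}[\log D_X(x)] + E_{y\sim p_{data}(y)}[\log(1-D_X(F(y)))] \\
G,F &:\; \min_{G,F}\; E_{x\sim p_{data}(x)}[\log(1-D_Y(G(x)))] + E_{y\sim p_{data}(y)}[\log(1-D_X(F(y)))] + \lambda L_{cyc}(G,F)
\end{align}
```

Read this way, λ only affects the generators: a larger λ pushes G and F toward being inverses of each other, while a smaller λ lets the adversarial terms dominate.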

Network Architecture

๋„คํŠธ์›Œํฌ์˜ ๊ตฌ์กฐ๋Š” ๋…ผ๋ฌธ์—์„œ ์ž์„ธํžˆ ์งš๊ณ  ๋„˜์–ด๊ฐ€์ง€๋Š” ์•Š์ง€๋งŒ, ๋ถ€๋ก์— ์“ฐ์—ฌ์žˆ๋Š” ๋‚ด์šฉ์„ ๊ฐ„๋žตํžˆ ์š”์•ฝํ•ด์„œ ์ „๋‹ฌ๋“œ๋ฆฌ์ž๋ฉด, ํฌ๊ฒŒ๋Š” Johnson et al.์˜ ๋„คํŠธ์›Œํฌ ๊ตฌ์กฐ๋ฅผ ๋”ฐ๋ผ์„œ ์„ค๊ณ„๋ฅผ ์ง„ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค.
Generator์˜ ๊ฒฝ์šฐ, ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€ ๊ตฌ์กฐ 6 residual block structure, 9 residual block structure๊ฐ€ ์กด์žฌํ•˜๋Š”๋ฐ, ๋‘ ๊ตฌ์กฐ ๋ชจ๋‘ convolution-instance normalization-relu์— ํ•ด๋‹น๋˜๋Š” dk, ๊ทธ๋ฆฌ๊ณ  residual block์— ํ•ด๋‹น๋˜๋Š” Rk, ๊ทธ๋ฆฌ๊ณ  upsampling์„ ์œ„ํ•œ fractional strided convolution-instance normalization-relu layer์ธ uk๋“ค๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
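To get a feel for how these pieces stack up, here is a rough sketch that traces the feature-map size through the generator. The exact layer sequence (including the `c7s1-k` name for the 7x7 stride-1 layers at both ends, and pairing 6 blocks with 128x128 inputs and 9 blocks with 256x256 inputs) is my reading of the appendix and should be treated as an assumption, as should the padding that keeps sizes unchanged in stride-1 layers. The helper names are hypothetical.

```python
def generator_spec(n_res_blocks):
    """Layer names for the generator in the appendix-style shorthand (assumed)."""
    return (["c7s1-64", "d128", "d256"]
            + ["R256"] * n_res_blocks
            + ["u128", "u64", "c7s1-3"])


def trace_shapes(spec, size):
    """Return (layer, channels, spatial size) after each layer."""
    shapes = []
    for name in spec:
        if name.startswith("d"):      # stride-2 convolution halves H and W
            size //= 2
        elif name.startswith("u"):    # fractional-strided convolution doubles them
            size *= 2
        # Stride-1 layers (c7s1-k, Rk) are assumed to keep the spatial size.
        channels = int(name.split("-")[-1]) if "-" in name else int(name[1:])
        shapes.append((name, channels, size))
    return shapes


shapes_256 = trace_shapes(generator_spec(9), 256)
shapes_128 = trace_shapes(generator_spec(6), 128)

for layer, channels, size in shapes_256:
    print(f"{layer:8s} -> {channels:3d} channels, {size}x{size}")
```

The trace shows the encoder-transform-decoder shape of the network: two downsampling steps, a run of residual blocks at the lowest resolution, then two upsampling steps back to the input size.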
The discriminator is a 70x70 PatchGAN. It is built from ck layers, each a convolution-instance normalization-leaky ReLU layer.
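The "70x70" refers to the receptive field: each output value of the discriminator judges one 70x70 patch of the input. As a sanity check, the sketch below computes the receptive field of a stack of convolutions. The layer list (five 4x4 convolutions with strides 2, 2, 2, 1, 1) is an assumption based on how this discriminator is commonly implemented, not something stated in this post.

```python
def receptive_field(layers):
    """Receptive field of stacked convolutions given (kernel, stride) pairs."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the view by (k - 1) * jump
        jump *= stride              # distance in input pixels between neighbouring outputs
    return rf


# Assumed PatchGAN layout: 4x4 kernels, three stride-2 layers, two stride-1 layers.
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
patch_size = receptive_field(patchgan_layers)
print(patch_size)
```

Because the discriminator scores patches rather than the whole image, it has fewer parameters than a full-image discriminator and can be applied to images of different sizes.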
์ž์„ธํ•œ ๋„คํŠธ์›Œํฌ ์•„ํ‚คํ…์ณ์— ๋Œ€ํ•ด์„œ ๋…ผ๋ฌธ์—์„œ ์ง์ ‘์ ์œผ๋กœ ๋‹ค๋ฃจ์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— ๊ถ๊ธˆํ•˜์‹  ๋ถ„๋“ค์€ ๋…ผ๋ฌธ 18 ์ชฝ์˜ Appendix๋ฅผ ๋ณด์‹œ๋Š” ๊ฒƒ์„ ์ถ”์ฒœ๋“œ๋ฆฝ๋‹ˆ๋‹ค!

Evaluation

๋…ผ๋ฌธ์—์„œ ํ•™์Šตํ•œ mapping์„ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•œ ์ฒ™๋„๊ฐ€ ๊ต‰์žฅํžˆ ๋‹ค์–‘ํ•˜๊ฒŒ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ํฌ๊ฒŒ 4๊ฐ€์ง€๋กœ ๋‚˜๋ˆ„๋ฉด ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ •๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
1. Comparison against baselines2. Anaylsis of the loss function3. Image reconstruction quality4. Additional results on paired datatsets
๋จผ์ € ์ฒซ ๋ฒˆ์งธ๋กœ,ย Comparison against baselines๋Š” ๊ธฐ์กด์˜ ๋‹ค๋ฅธ ๋„คํŠธ์›Œํฌ + cost function์— ๋น„ํ•ด์„œ ๋…ผ๋ฌธ์˜ ๋„คํŠธ์›Œํฌ + cost function์˜ ๊ตฌ์กฐ๊ฐ€ ๊ฐ€์ง€๋Š” ์„ฑ๋Šฅ์ ์ธ ์žฅ์ ์„ ๋ณด๋Š” ์ฒ™๋„์ž…๋‹ˆ๋‹ค. ์ด ์ฒ™๋„๋Š” ๊ฑฐ์˜ ๋ชจ๋“  ๋…ผ๋ฌธ๋“ค์—์„œ ๊ธฐ๋ณธ์ ์œผ๋กœ ์ œ์‹œํ•˜๋Š” ๊ณตํ†ต์ ์ธ ์ฒ™๋„์ด๊ณ , ์†Œ๊ฐœ๋“œ๋ฆฌ๋Š” ๋…ผ๋ฌธ์—์„œ๋„ ๋“ฑ์žฅํ•˜๋Š” ๋ชจ๋“  ํ‘œ์— ์ด ๋‚ด์šฉ์„ ๋‹ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด์— ๋Œ€ํ•œ ์ถ”๊ฐ€์ ์ธ ์–ธ๊ธ‰์€ ํ•˜์ง€ ์•Š๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋‹ค๋งŒ ์ž์ฃผ ๋“ฑ์žฅํ•˜๋Š” test method์ธ AMT์™€ FCN scores์— ๋Œ€ํ•œ ๋น„๊ต๋ฅผ ์ง„ํ–‰ํ•œ ๋‚ด์šฉ์— ๋Œ€ํ•ด์„œ ํ‘œ๋ฅผ ์ฒจ๋ถ€ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.
AMT โ€œreal vs fakeโ€ test
FCN scores
The AMT test is also well known as a test for GANs, which are about telling real and fake images apart: real people are asked to distinguish real from fake, and the result focuses on how many fake images they are fooled into calling real. As the table above shows, the paper's architecture fooled people in roughly 1/4 of the cases.
Unlike the AMT test, FCN scores do not involve humans directly. FCN scores are also well known for evaluating semantic segmentation algorithms, and they measure how well an image is recognized as the intended labels. Here they were used on the labels->photo translation to see whether the generated photo is parsed as the intended labels, and the result is meaningful in that CycleGAN's numbers come reasonably close to those of pix2pix, which I introduced earlier.
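For readers unfamiliar with segmentation metrics, here is a minimal sketch of the kind of numbers such a score reports. I am assuming the usual trio of per-pixel accuracy, per-class accuracy, and class IoU; the function name and the tiny label arrays are made up for illustration, and in the real evaluation the predicted labels would come from a segmentation network run on the generated photo.

```python
def segmentation_scores(pred, target, n_classes):
    """Per-pixel accuracy, mean per-class accuracy, and mean class IoU.

    pred, target: flat lists of integer class labels, one per pixel.
    """
    # conf[t][p] counts pixels whose true class is t and predicted class is p.
    conf = [[0] * n_classes for _ in range(n_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1

    total = sum(sum(row) for row in conf)
    correct = sum(conf[c][c] for c in range(n_classes))

    class_accs, class_ious = [], []
    for c in range(n_classes):
        gt = sum(conf[c])                                   # pixels truly of class c
        predicted = sum(conf[r][c] for r in range(n_classes))  # pixels predicted as c
        union = gt + predicted - conf[c][c]
        if gt > 0:
            class_accs.append(conf[c][c] / gt)
        if union > 0:
            class_ious.append(conf[c][c] / union)

    return (correct / total,
            sum(class_accs) / len(class_accs),
            sum(class_ious) / len(class_ious))


# Hypothetical 6-pixel example with 3 classes.
target = [0, 0, 0, 1, 1, 2]
pred = [0, 0, 1, 1, 1, 2]
pixel_acc, class_acc, class_iou = segmentation_scores(pred, target, n_classes=3)
print(pixel_acc, class_acc, class_iou)
```

The idea behind using this for image translation is indirect: if a generated photo is realistic and faithful to its label map, an off-the-shelf segmentation network should recover those labels from it.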
๋‹ค์Œ์œผ๋กœย Anaylsis of the loss functionย ์€ ์•ž์„œ ์†Œ๊ฐœ๋“œ๋ฆฐ cost function์„ ์†Œ๊ฐœํ•œ ๊ณผ์ •์ด ์ œ๋Œ€๋กœ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง€๋Š”์ง€ ํ‰๊ฐ€๋ฅผ ํ•˜๊ธฐ ์œ„ํ•œ ์ฒ™๋„์ž…๋‹ˆ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด์„œ ๋น„๊ตํ•˜๋Š” ๋Œ€์ƒ๋“ค๋กœ ์ตœ์ข…์ ์œผ๋กœ ๋…ผ๋ฌธ์—์„œ ์„ค๊ณ„ํ•œ cost function์—์„œ ํ•˜๋‚˜ ๋‘˜ ์”ฉ ํ•ญ๋ชฉ์„ ์ œ๊ฑฐํ•ด๊ฐ€๋ฉด์„œ cost function์„ ์ƒˆ๋กœ์ด ๋งŒ๋“ค๊ณ  ๊ฐ™์€ task๋ฅผ ์‹œ๋„ํ•˜์—ฌ ๋น„๊ตํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ๊ทธ task๋กœ ์„ ํƒํ•˜์—ฌ ์ œ์‹œํ•œ ๊ฒƒ์ด labels->photos์˜ FCN scores ์™€ photos->labels์˜ classification performance์ž…๋‹ˆ๋‹ค.
FCN scores
Classification performance
The results are the values in the tables above. Removing the GAN loss and removing the cycle consistency loss both cause a substantial drop in performance. The rows labeled forward cycle and backward cycle each keep only one of the two terms of the cycle consistency loss described earlier.
Next, Image reconstruction quality checks cycle consistency. The paper does not analyze this quantitatively; instead it shows qualitatively, through pictures, that the reconstructed images are often close to the input images. Below are the examples given in the paper.
In addition, Additional results on paired datasets takes the datasets with paired training data that were used in pix2pix and shows that the tasks pix2pix could handle can be carried out in the same way. In other words, it shows that the network introduced in the paper is the more general one. The results are shown below.

Application

Finally, the last thing the paper presents in its evaluation part is a set of results showing that a wide variety of tasks are possible. Many of them are quite interesting, so I will briefly introduce just a few! For this part, it is enough to take it as "so things like this are possible too."
Style transfer
This task converts an input image to match the style of various painters.
Object transfiguration
This task turns an object into something else with a similar shape.
painting2photo
This task turns a painting into a photo.
photo enhancement
This task makes a photo sharper. (You can think of it as imitating the style of a DSLR.)

Conclusion

This was a review of the CycleGAN paper, which I ended up reading after coming across an article on the history and development of GANs a while ago and finding it quite interesting. It may not be as much of a shock as when GAN first appeared, but I think it played a big role in unpaired image translation. In particular, it has a great deal of potential on the application side, and seeing those applications for the first time made it a fun read. If you are looking for a paper with an interesting topic and varied applications and you are interested in GANs, I recommend reading it!