Pix2Pix

Field
Image to Image Translation
Review date
2020/09/29
This post was first written for the Humanscape tech blog and later moved here.
In this post I review the paper that first proposed a general-purpose framework for image to image translation built on GAN, which I covered in an earlier post. The discussion assumes familiarity with my earlier posts on GAN and U-Net, so please refer to those posts (GAN, U-Net) if needed. The title of the paper under review is:
“Image-to-Image Translation with Conditional Adversarial Networks”
If you would like to read the paper itself, please refer to the original.

Objective

๋…ผ๋ฌธ์—์„œ ๋ชฉ์ ์œผ๋กœ ํ•˜๊ณ  ์žˆ๋Š” ๊ฒƒ์€ general-purposed image to image translation framework๋ฅผ ๊ตฌํ˜„ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋ถ€์—ฐ ์„ค๋ช…์„ ํ•˜์ž๋ฉด, grayscale to color, map to aerial, sketch to photo ๋“ฑ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์˜ image to image translation์—์„œ๋„ ์ผ๋ฐ˜์ ์œผ๋กœ ๋™์ž‘ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฒ”์šฉ์ ์ธ framework์˜ ๊ตฌํ˜„์„ ๋ชฉ์ ์œผ๋กœ ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
์ด๋Ÿฌํ•œ ๋ชฉํ‘œ๋ฅผ ์œ„ํ•ด ์„ ํ–‰์—ฐ๊ตฌ๋“ค์„ ์„œ์น˜ํ•œ ๊ฒฐ๊ณผ, euclidean distance์— ๋”ฐ๋ผ์„œ ์ „์ฒด predicted image์™€ ground truth ์‚ฌ์ด์˜ ์ฐจ์ด๋ฅผ image๋ฅผ ์ตœ์†Œํ™”์‹œํ‚ค๊ธฐ ๋•Œ๋ฌธ์— blurryํ•œ ์ด๋ฏธ์ง€๊ฐ€ ๋‚˜์˜ค๋Š” CNN๋ณด๋‹ค๋Š”, real๊ณผ fake๋ฅผ ๊ตฌ๋ณ„ํ•  ์ˆ˜ ์—†๊ฒŒ๋”์ด๋ผ๋Š” ํ•™์Šต ๋ชฉํ‘œ๋ฅผ ๊ฐ€์ง„ GAN์ด ์„ ๋ช…ํ•œ ์ด๋ฏธ์ง€๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์—์„œ ๋…ผ๋ฌธ์—์„œ๋Š” GAN์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค.
ํ•˜์ง€๋งŒ, GAN์€ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•˜๋Š” ๋ชจ๋ธ์ผ ๋ฟ์ด์ง€, input ์ด๋ฏธ์ง€์— ๋Œ€ํ•œ ๋‚ด์šฉ์„ ๋ฐ˜์˜ํ•˜์—ฌ ๋ณ€ํ™˜๋œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•  ์ˆ˜๋Š” ์—†์—ˆ๊ธฐ์—, ๋…ผ๋ฌธ์—์„œ๋Š” ๊ธฐ์กด GAN์— input image์— dependentํ•œ term์„ ์ถ”๊ฐ€ํ•œ conditional GAN์„ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.
๋˜ํ•œ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ธฐ์กด์˜ image to image translation์˜ ๋…ผ๋ฌธ๋“ค๊ณผ ๋‹ค๋ฅด๊ฒŒ generator๋กœ U-Net ๊ตฌ์กฐ, ๊ทธ๋ฆฌ๊ณ  discriminator๋กœ patchGAN ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํšจ๊ณผ์ ์ธ ํ•™์Šต์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

Conditional GAN(cGAN)

๊ธฐ์กด์˜ noise vector z๋กœ๋ถ€ํ„ฐ output vector y๋ฅผ ์ƒ์„ฑํ•ด๋‚ด๋Š” GAN๊ณผ๋Š” ๋‹ฌ๋ฆฌ conditional GAN(์ดํ•˜ cGAN)์˜ ๊ฒฝ์šฐ noise vector z ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ input vector x๋ฅผ ์ด์šฉํ•ด output vector y๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
์ด ๋•Œ๋ฌธ์— cGAN์˜ cost function๋„ GAN๊ณผ ํฌ๊ฒŒ ๋‹ค๋ฅผ ๊ฒƒ์€ ์—†์Šต๋‹ˆ๋‹ค. ๋‹ค๋งŒ ํŠน์ง•์ ์œผ๋กœ ๋‹ค๋ฅธ ๊ฒƒ์€ discriminator๊ฐ€ real๊ณผ fake๋ฅผ ๊ธฐ์กด์—๋Š” generated๋œ distribution์— ๋Œ€ํ•œ ์ •๋ณด, ํ˜น์€ ground truth(ํ•™์Šต์— ์‚ฌ์šฉ๋  ๋ฐ์ดํ„ฐ) data image distribution์— ๋Œ€ํ•œ ์ •๋ณด ๊ฐ๊ฐ๋งŒ์œผ๋กœ๋งŒ ๊ตฌ๋ณ„ํ•ด ๋ƒˆ๋‹ค๋ฉด, cGAN์—์„œ๋Š” input image์— ๋Œ€ํ•œ distribution ์ •๋ณด๋„ discriminator๊ฐ€ real๊ณผ fake๋ฅผ ๊ตฌ๋ณ„ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
$$L_{cGAN}(G,D)=E_{x,y}[\log D(x,y)]+E_{x,z}[\log(1-D(x,G(x,z)))]$$
As the formula shows, the discriminator now takes two arguments, unlike in the original GAN. As I explained in the GAN review, if D outputs 1 when it judges a sample real and 0 when it judges it fake, the discriminator wants to classify the first term as real and the second term as fake, so it aims to maximize the cost function. The generator, on the other hand, wants to produce fakes that the discriminator cannot tell apart, so it wants the discriminator to classify the second term as real and therefore aims to minimize the cost function.
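To make the max/min behaviour concrete, here is a minimal sketch that estimates the cGAN cost function from a handful of discriminator outputs. The function name `cgan_value` and the sample probabilities are made up for illustration; they do not come from the paper.

```python
import math

def cgan_value(d_real, d_fake):
    """Monte Carlo estimate of L_cGAN(G, D).

    d_real: values of D(x, y) for real (input, target) pairs, each in (0, 1)
    d_fake: values of D(x, G(x, z)) for (input, generated) pairs, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Discriminator separates real from fake well: the value is close to 0 (its maximum).
strong_d = cgan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])

# Generator fools the discriminator (fakes scored as real): the value drops sharply.
fooled_d = cgan_value(d_real=[0.9, 0.95], d_fake=[0.9, 0.95])
```

Here `strong_d` is larger than `fooled_d`. The discriminator pushes the value up and the generator pushes it down.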
$$L_{GAN}(G,D)=E_{y}[\log D(y)]+E_{x,z}[\log(1-D(G(x,z)))]$$
๋…ผ๋ฌธ์—์„œ๋Š” unconditionalํ•œ ๊ฒฝ์šฐ์˜ cost function๋„ ์ •์˜ํ•˜์—ฌ ๋น„๊ต์˜ ์šฉ๋„๋กœ ํ•™์Šต์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค. ์œ„๋Š” ๊ทธ cost function์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์— ๋”ํ•ด ๋…ผ๋ฌธ์—์„œ๋Š” ๊ธฐ์กด ์—ฐ๊ตฌ๋“ค์—์„œ GAN์— L2 loss๋ฅผ ์ถ”๊ฐ€ํ–ˆ์„ ๋•Œ ๋” ๋‚˜์€ ํ•™์Šต ํšจ๊ณผ๋ฅผ ๋ณด์ธ ๊ฒƒ์— ์ฐฉ์•ˆํ•˜์—ฌ ๊ทธ๊ฒƒ๊ณผ ๋น„์Šทํ•˜๋ฉด์„œ๋„, blurryํ•œ ์ด๋ฏธ์ง€๋ฅผ ๋œ ์ƒ์„ฑํ•ด ๋‚ด๋Š” L1 loss term์„ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
$$L_{L1}(G)=E_{x,y,z}[\lVert y-G(x,z)\rVert_1]$$
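One way to build intuition for why L1 blurs less than L2 is a toy example of my own (not an argument taken from the paper). Suppose a single pixel has three equally plausible target values. The prediction that minimizes the L2 cost is their mean, a blend that matches none of them, while the prediction that minimizes the L1 cost is their median, which here is one of the real values. The values and helper names below are hypothetical.

```python
# Hypothetical pixel with three equally plausible target intensities.
plausible_targets = [0.0, 0.0, 1.0]

def l2_cost(pred, targets):
    """Mean squared error of one prediction against all plausible targets."""
    return sum((t - pred) ** 2 for t in targets) / len(targets)

def l1_cost(pred, targets):
    """Mean absolute error of one prediction against all plausible targets."""
    return sum(abs(t - pred) for t in targets) / len(targets)

# Brute-force search over predictions 0.00, 0.01, ..., 1.00.
candidates = [i / 100 for i in range(101)]
best_l2 = min(candidates, key=lambda p: l2_cost(p, plausible_targets))
best_l1 = min(candidates, key=lambda p: l1_cost(p, plausible_targets))
```

`best_l2` lands near 1/3, a washed-out in-between value, while `best_l1` is 0.0, one of the actual targets.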
With this term added, the final objective for the generator is as follows.
$$G^*=\arg\min_G\max_D L_{cGAN}(G,D)+\lambda L_{L1}(G)$$
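From the generator's side, only the second term of the cGAN cost and the L1 term depend on G, so the quantity it minimizes can be sketched as below. This is a rough illustration under assumptions. The function name is hypothetical, images are flattened into plain lists, the L1 norm is averaged per pixel, and the default `lam=100.0` is the weight I recall the paper using in its experiments rather than something stated in this post.

```python
import math

def generator_objective(d_fake, target, generated, lam=100.0):
    """Estimate E[log(1 - D(x, G(x, z)))] + lam * L1 for one batch.

    d_fake:    values of D(x, G(x, z)), each in (0, 1)
    target:    flattened ground truth pixels y
    generated: flattened generated pixels G(x, z)
    lam:       weight of the L1 term (assumed value)
    """
    adversarial = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    # Per-pixel mean absolute difference, i.e. a scaled L1 norm.
    l1 = sum(abs(t - g) for t, g in zip(target, generated)) / len(target)
    return adversarial + lam * l1

y = [0.2, 0.8, 0.5, 0.1]
# Same discriminator score, but one output is much closer to the ground truth.
close = generator_objective([0.5], y, [0.2, 0.7, 0.5, 0.1])
far = generator_objective([0.5], y, [0.9, 0.1, 0.0, 0.8])
```

Even when the discriminator is fooled equally in both cases, `close` is smaller than `far`, so the L1 term keeps the output tied to the ground truth.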

U-Net

๋…ผ๋ฌธ์—์„œ ์‚ฌ์šฉํ•œ generator์˜ ๊ตฌ์กฐ๋Š” U-Net์„ ๋”ฐ๋ฅด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” output image์˜ resolution์„ ์ฆ๊ฐ€์‹œํ‚ค๊ธฐ ์œ„ํ•ด์„œ high resolution์„ ์ง€๋‹Œ input map์˜ ์ผ๋ถ€๋ฅผ ์ž˜๋ผ์„œ output decoder part์— concatenate ์‹œํ‚ค๋Š” ํ˜•ํƒœ์˜ ๋ฐฉ๋ฒ•๋ก ์œผ๋กœ ๊ตฌํ˜„ํ•œ ์•„ํ‚คํ…์ฒ˜์ž…๋‹ˆ๋‹ค. ์ž์„ธํ•œ ์‚ฌํ•ญ์€ย U-Net๊ธ€์—์„œ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

patchGAN

๋…ผ๋ฌธ์—์„œ ์‚ฌ์šฉํ•œ discriminator๋Š” patchGAN์˜ ๊ตฌ์กฐ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. patchGAN์€ ๊ธฐ์กด์˜ DCGAN๊ณผ๋Š” ๋‹ค๋ฅด๊ฒŒ ์ด๋ฏธ์ง€์˜ patch ์กฐ๊ฐ์„ ๋ณด๊ณ  real/fake ์—ฌ๋ถ€๋ฅผ ํŒ๋‹จํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฏธ์ง€์˜ ์ž‘์€ patch์— ๋Œ€ํ•ด์„œ ํŒ๋‹จํ•˜์—ฌ ๊ฐ patch ๋ณ„๋กœ์˜ real/fake ์—ฌ๋ถ€๋ฅผ ํŒ๋‹จํ•˜๋Š” ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฏธ์ง€ ์ „์ฒด๋ฅผ ์—ฐ์‚ฐํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค ์—ฐ์‚ฐ์˜ ์ˆ˜๊ฐ€ ์ ๊ณ  ๋น ๋ฆ…๋‹ˆ๋‹ค.
์ด ๋ฐฉ์‹์„ ํ†ตํ•ด์„œ generator๋Š” ๊ฐ๊ฐ์˜ ์ด๋ฏธ์ง€ patch ์กฐ๊ฐ๋“ค์˜ ์ง„์œ„์—ฌ๋ถ€๋ฅผ ์†์ด๊ธฐ ์œ„ํ•ด์„œ ํ•™์Šตํ•˜๋Š” ๊ณผ์ •์ด ์ง„ํ–‰๋˜๊ณ , ๊ธฐ์กด์˜ ์ „์ฒด ์ด๋ฏธ์ง€๋ฅผ ์†์ด๊ธฐ ์œ„ํ•ด์„œ ํ•™์Šตํ•˜๋Š” ๋ฐฉ๋ฒ•๋ณด๋‹ค output image๊ฐ€ ๋” high resolution์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

Experiments-Evaluation Metrics

๋…ผ๋ฌธ์—์„œ ๊ทธ๋“ค์˜ ์•„ํ‚คํ…์ฒ˜/๋ฐฉ๋ฒ•๋ก ์„ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด์„œ ์‚ฌ์šฉํ•œ ๋ฐฉ๋ฒ•์€ ํฌ๊ฒŒ ๋‘ ๊ฐ€์ง€๋กœ ๋‚˜๋ˆ„์–ด์ง‘๋‹ˆ๋‹ค.
์ฒซ ๋ฒˆ์งธ๋กœ, Amazon Mechanical Turk(์ดํ•˜ AMT)๋ฅผ ์‹คํ–‰์‹œ์ผœ ์ƒ์„ฑํ•œ ์ด๋ฏธ์ง€์˜ real/fake ์—ฌ๋ถ€๋ฅผ ํŒ๋‹จํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋…ผ๋ฌธ์—์„œ๋Š” map generation, aerial photo generation, image colorization ๋“ฑ์„ ์ด ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด์„œ ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
๋‘ ๋ฒˆ์งธ๋กœ, pre-trained ๋œ semantic classifier๋ฅผ ์ด์šฉํ•ด์„œ ์ƒ์„ฑํ•œ ์ด๋ฏธ์ง€๋กœ๋ถ€ํ„ฐ ์ •ํ™•ํ•œ object๋ฅผ ๊ตฌ๋ณ„ํ•ด ๋‚ผ ์ˆ˜ ์žˆ๋Š”์ง€๋ฅผ ๋‚˜ํƒ€๋‚ธ FCN-score๋ฅผ ์ด์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
์—ฌ๊ธฐ์„œ๋Š” ๋…ผ๋ฌธ์—์„œ ํŠน์ง•์ ์œผ๋กœ ์‚ฌ์šฉํ•œ ๊ตฌ์กฐ๋‚˜ ๋ฐฉ๋ฒ•๋ก ์„ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•œ ์ฒ™๋„์˜€๋˜ ๋‘ ๋ฒˆ์งธ ๋ฐฉ๋ฒ•์— ์ฃผ๋ชฉํ•˜์—ฌ ์ง„ํ–‰ํ–ˆ๋˜ ํ‰๊ฐ€๋“ค์„ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.
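To give a feel for what the FCN-score measures, here is a sketch of two segmentation metrics of the kind it reports: per-pixel accuracy and class IoU. It assumes the classifier's predicted label map for a generated image is already available. The label maps below are hypothetical, and the pre-trained classifier itself is not modelled.

```python
def _pairs(pred, truth):
    """Flatten two 2D label maps into (predicted, true) label pairs."""
    return [(p, t) for p_row, t_row in zip(pred, truth)
            for p, t in zip(p_row, t_row)]

def per_pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the true label."""
    pairs = _pairs(pred, truth)
    return sum(p == t for p, t in pairs) / len(pairs)

def mean_class_iou(pred, truth):
    """Intersection over union, averaged over classes present in truth."""
    pairs = _pairs(pred, truth)
    ious = []
    for c in sorted({t for _, t in pairs}):
        inter = sum(p == c and t == c for p, t in pairs)
        union = sum(p == c or t == c for p, t in pairs)
        ious.append(inter / union)
    return sum(ious) / len(ious)

# Hypothetical 2x2 label maps: labels the image was generated from,
# and labels the classifier recovers from the generated image.
truth = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
acc = per_pixel_accuracy(pred, truth)
miou = mean_class_iou(pred, truth)
```

The more realistic the generated image, the more closely the classifier's labels should match the labels the image was generated from, and the higher both numbers become.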

Evaluation-cGAN Objective Function

๋…ผ๋ฌธ์—์„œ๋Š” ์•ž์—์„œ ์„ค๋ช…ํ•œ cGAN cost function ์„ค๊ณ„๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด์„œ FCN-score๋ฅผ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.
FCN-scores for various cost functions
Generated output images
GAN์ด L1๊ณผ cGAN์— ๋น„ํ•ด์„œ ์„ฑ๋Šฅ์ด ์ข‹์ง€ ์•Š๋Š” ์ด์œ ๋Š” GAN์œผ๋กœ ์ƒ์„ฑํ•˜๋Š” ์ด๋ฏธ์ง€๋Š” input image ์™€ ์ƒ์„ฑ๋˜๋Š” output image๊ฐ„์˜ mismatch๋กœ ์ธํ•ด์„œ ๋ฐœ์ƒํ•˜๋Š” penalty ํ•ญ๋ชฉ์„ loss์— ํฌํ•จํ•˜๊ณ  ์žˆ์ง€ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.
๊ทธ ์™ธ์— L1๋งŒ ์‚ฌ์šฉํ•œ ๊ฒƒ์€ blurryํ•œ image๋ฅผ ๋งŒ๋“ค๊ณ , ์ด์ „์˜ ์„ ํ–‰์—ฐ๊ตฌ์—์„œ ๋ฐํ˜€์ง„ ๋ฐ”์ฒ˜๋Ÿผ cGAN๊ณผ ํ•จ๊ป˜ ์‚ฌ์šฉํ•  ๋•Œ cGAN๋งŒ ์‚ฌ์šฉํ•  ๋•Œ๋ณด๋‹ค visual artifcat๊ฐ€ ์ ๊ฒŒ ๋‚˜ํƒ€๋‚จ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Evaluation-Generator(U-Net)

๋…ผ๋ฌธ์—์„œ๋Š” ์•ž์—์„œ ์„ค๋ช…ํ•œ U-Net ๊ตฌ์กฐ๋ฅผ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด์„œ FCN-score๋ฅผ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.
FCN-scores for generator strucuture
Generated output images
ํ‘œ์—์„œ ๋ณด์ด๋Š” ๋ฐ”์™€ ๊ฐ™์ด U-Net์˜ ๊ฒฝ์šฐ๊ฐ€ ๊ธฐ์กด์˜ encoder-decoder ๊ตฌ์กฐ๋ณด๋‹ค ๋” ๋†’์€ FCN-score๋ฅผ ๋ณด์ž„์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์•ž์„œ ์„ค๋ช…ํ•œ ๊ฒƒ ์ฒ˜๋Ÿผ skip-connection์„ ํ†ตํ•ด encoder ๋ถ€๋ถ„์˜ high resolution์„ decoder ๋ถ€๋ถ„์— ์ „ํ•ด์ฃผ์—ˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.
๊ทธ๋ฆผ์—์„œ ๋ณด์ด๋Š” ๊ฒƒ์ฒ˜๋Ÿผ U-Net์„ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ๊ฐ€, L1+cGAN์˜ ๋ณตํ•ฉ์ ์ธ cost function์„ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ๊ฐ€ ๋” ๊ณ ํ•ด์ƒ๋„์˜ ์„ ๋ช…ํ•œ ์ด๋ฏธ์ง€๋ฅผ ์ƒ์„ฑํ•ด๋ƒ„์„ ๋ณด์—ฌ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.

Evaluation-Discriminator(patchGAN)

๋…ผ๋ฌธ์—์„œ๋Š” ์•ž์—์„œ ์„ค๋ช…ํ•œ pathGAN ๋ฐฉ๋ฒ•๋ก ์„ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด์„œ FCN-score๋ฅผ ์ด์šฉํ•ฉ๋‹ˆ๋‹ค.
FCN-scores for discriminator patch size
Generated output images
ํ‘œ์—์„œ ๋ณด์ด๋Š” ๋ฐ”์™€ ๊ฐ™์ด 70x70์˜ patch size๋กœ ๋‚˜๋ˆ„์–ด ์ง„์œ„์—ฌ๋ถ€๋ฅผ ํŒ๋‹จํ•œ ๊ฒฝ์šฐ๊ฐ€ ๊ธฐ์กด์˜ 1x1์˜ pixelGAN์ด๋‚˜ 286x286์˜ imageGAN ๊ตฌ์กฐ๋ณด๋‹ค ๋” ๋†’์€ FCN-score๋ฅผ ๋ณด์ž„์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ์•ž์„œ ์„ค๋ช…ํ•œ ๊ฒƒ ์ฒ˜๋Ÿผ ์ ์ ˆํ•œ size์˜ patch๋กœ ์ด๋ฏธ์ง€๋ฅผ ๋‚˜๋ˆ„์–ด discriminate ๊ณผ์ •์„ ์ง„ํ–‰ํ•  ๊ฒฝ์šฐ localization๊ณผ context extraction์— ์žˆ์–ด์„œ ๊ทน๊ฐ’๋“ค๋ณด๋‹ค ์œ ๋ฆฌํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.
๊ทธ๋ฆผ์—์„œ ๋ณด์ด๋Š” ๊ฒƒ์ฒ˜๋Ÿผ 70x70์˜ patch size์„ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ๊ฐ€, L1์ด๋‚˜ ์ด๋ฏธ์ง€ ์ „์ฒด ํฌ๊ธฐ์˜ patch ๋ฅผ ์‚ฌ์šฉํ•œ ๊ฒฝ์šฐ๋ณด๋‹ค ์ „์ฒด์ ์œผ๋กœ๋„, ์„ธ๋ถ€์ ์œผ๋กœ๋„ ์„ ๋ช…ํ•œ ์ด๋ฏธ์ง€๋ฅผ ๋งŒ๋“ค์–ด๋ƒ„์„ ๋ณผ ์ˆ˜ ์žˆ์—ˆ์Šต๋‹ˆ๋‹ค.

Conclusion

That concludes my brief summary of the paper “Image-to-Image Translation with Conditional Adversarial Networks”. It is a paper I had often come across in image translation work, so I was glad to take this opportunity to read it.
Although I did not cover it here, the paper also includes an evaluation using AMT. In that evaluation the paper's architecture and methodology did not achieve the best result among the methods compared, and it follows a different thread from the main flow of the paper, so I left it out. If you are interested, it is worth a read.