
Final Project Script

Slide 16

Before we look at the results, let's watch a demo video showing how our pipeline works.
The image shown is the base input image from which we want to create a 3D Motion Portrait.
The first step of our pipeline is generating a background region mask and stylizing that region using CLIPstyler. As you can see, we run 400 iterations of updating the input image based on CLIP similarity.
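For reference, here is a minimal sketch of this stylization idea, assuming the OpenAI `clip` package and a precomputed background mask. It directly optimizes the masked pixels toward the style text's CLIP embedding; the actual CLIPstyler trains a small StyleNet with patch-wise and content losses, so this shows only the core loop, not our exact implementation.

```python
import torch
import clip

# Minimal sketch (not the actual CLIPstyler code): optimize the masked
# background pixels so the image's CLIP embedding moves toward the style
# text's embedding. CLIP's usual mean/std normalization is omitted for brevity.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # fp32 weights so we can backprop through the image

def stylize_background(image, mask, style_text, iters=400, lr=5e-3):
    """image: (1,3,H,W) in [0,1]; mask: (1,1,H,W), 1 = background."""
    text_feat = model.encode_text(clip.tokenize([style_text]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    styled = image.clone().requires_grad_(True)
    opt = torch.optim.Adam([styled], lr=lr)
    for _ in range(iters):
        # Only the masked (background) region is allowed to change.
        composite = mask * styled + (1 - mask) * image
        inp = torch.nn.functional.interpolate(
            composite, size=(224, 224), mode="bilinear")
        img_feat = model.encode_image(inp)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        loss = 1 - (img_feat * text_feat).sum()  # maximize CLIP similarity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (mask * styled + (1 - mask) * image).detach()
```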
After stylizing the background, we create a near-duplicate photo whose facial expression has been edited with a pretrained StyleCLIP model.
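As a rough illustration of this step, the sketch below shows StyleCLIP-style latent optimization. The `generator` and `encoder` arguments are placeholders for a pretrained StyleGAN2 generator and a GAN-inversion encoder (e.g., e4e); this is a simplified version of the idea, not the official StyleCLIP code.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()

def edit_face(image, edit_text, generator, encoder,
              iters=100, lr=0.05, latent_weight=0.005):
    # `encoder` and `generator` are hypothetical stand-ins for a pretrained
    # inversion encoder and a StyleGAN2 generator, respectively.
    w = encoder(image)                       # invert the image to a latent
    w_edit = w.clone().requires_grad_(True)
    text = clip.tokenize([edit_text]).to(device)
    opt = torch.optim.Adam([w_edit], lr=lr)
    for _ in range(iters):
        img = generator(w_edit)              # synthesize the edited face
        inp = torch.nn.functional.interpolate(
            img, size=(224, 224), mode="bilinear")
        clip_loss = 1 - torch.cosine_similarity(
            clip_model.encode_image(inp),
            clip_model.encode_text(text)).mean()
        latent_loss = (w_edit - w).pow(2).mean()  # stay near the original
        opt.zero_grad()
        (clip_loss + latent_weight * latent_loss).backward()
        opt.step()
    return generator(w_edit).detach()
```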
Finally, the 3D Motion Portrait is generated from the two images: both have stylized backgrounds, and one has the edited facial expression. The 60 iterations shown correspond to the rendering process, in which each frame of the video is rendered as a time-varying image from the point cloud. The frames are then concatenated into a video, with the result you can see here.
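In sketch form, the rendering stage looks like the following; `render_from_point_cloud` is a hypothetical stand-in for our actual renderer, producing one frame per interpolation step, and the frames are then written out as a video with imageio.

```python
import numpy as np
import imageio

def make_motion_portrait(point_cloud, n_frames=60, fps=30,
                         out_path="portrait.mp4"):
    frames = []
    for t in np.linspace(0.0, 1.0, n_frames):
        # t interpolates between the neutral and edited expression states;
        # render_from_point_cloud is a placeholder for the actual renderer.
        frame = render_from_point_cloud(point_cloud, t)  # (H, W, 3) uint8
        frames.append(frame)
    imageio.mimsave(out_path, frames, fps=fps)  # concatenate into a video
```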

Slide 17

These 3D Motion Portraits are our main results. All of the images are facially edited using the text "Face with smile".
The first row of images is stylized using the text "Night Sky", and the second row using the text "Cherry Blossom".
Our results express the semantic meaning of each stylization and face-editing text well.

Slide 18

Our method is not limited to Koreans; as shown, it also works on Black and white subjects.
Likewise, it is not limited to smiles: we can express various emotions, such as the anger shown in the second column.

Slide 19

By default, our pipeline automatically generates the background region mask for stylization. However, if a manually created mask is supplied, the pipeline can stylize any region the user wants.
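The selection logic is simple; in sketch form (the function names are illustrative, not our actual API):

```python
def get_stylization_mask(image, user_mask=None):
    # segment_background is a placeholder for the automatic mask generator.
    if user_mask is not None:
        return user_mask               # stylize exactly the region given
    return segment_background(image)   # fall back to the automatic mask
```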

Slide 20

However, our work has some limitations.
First, our automatic mask generation is imperfect. As shown on the slide, background regions between thin structures such as hair and fingers are sometimes missed. In addition, the discontinuity between the foreground and background regions sometimes makes the image look unnatural.
We are considering a finer edge-detection method, such as hysteresis thresholding, to address this problem; a sketch of the idea follows.
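As a sketch of what we have in mind (this is not yet part of our implementation): OpenCV's Canny detector applies hysteresis internally, keeping weak edges only when they connect to strong ones, which helps trace thin structures such as hair strands.

```python
import cv2
import numpy as np

def detect_fine_edges(image_bgr, low=50, high=150):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Pixels above `high` are strong edges; pixels between `low` and `high`
    # survive only if connected to a strong edge (hysteresis).
    return cv2.Canny(gray, low, high)

def refine_mask(coarse_mask, edges):
    # One possible use: keep the background mask from crossing detected edges.
    blocked = cv2.dilate(edges, np.ones((3, 3), np.uint8))
    return np.where(blocked > 0, 0, coarse_mask).astype(coarse_mask.dtype)
```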

Slide 21

Next, our facial-editing implementation does not work on side-facing images, which cannot be fully aligned to the FFHQ landmark format. As shown on the slide, even though we gave the facial edit text "face with smile", the resulting face was not smiling at all.
We are considering replacing the StyleCLIP model with one that can better recognize the detailed composition of facial parts.

Slide 22

In conclusion, our work successfully generates a 3D Motion Portrait from a single RGB facial image.
It can express various emotions on the base image and can decorate any region the user wants. It can also be applied broadly to people of various ethnicities.

Slide 23

Thank you for listening. This is the end of our presentation.