Computer Vision Final Project

Topic Selection

기본이 되는 선행연구가 있고, 해당 선행연구의 Github 코드가 공개되어 바로 사용해볼 수 있는 프로젝트

보여줄 수 있는 시각적인 결과가 자극적인 프로젝트

응용가능성이 있는 프로젝트

Candidates

1. 3D Moments from Near-Duplicate Photos

3D Moments from Near-Duplicate Photos

@inproceedings{wang2022_3dmoments, title = {3D Moments from Near-Duplicate Photos}, author = {Wang, Qianqian and Li, Zhengqi and Salesin, David and Snavely, Noah and Curless, Brian and Kontkanen, Janne}, booktitle = {CVPR}, year = {2022} }

https://3d-moments.github.io/

•

근접한 두 이미지를 입력으로 받아 자연스러운 3D scene motion 을 형성한 프로젝트입니다.

•

Feature Extraction, Image Synthesis 릁 수행하는 네트워크를 학습하기 위해 필요한 Training Data 가 camera parameters 를 알고 있는 두 장의 입력 이미지 (처음, 끝) 와 두 입력 이미지 사이 시각에 찍힌, ground truth 로 활용될 이미지인데, 일반적으로 구성하기는 어려워 보입니다. 

◦

논문에서는 실제 비디오에서 모션 변화가 적은 두 지점 사이를 이용해, 카메라가 고정되고 모든 변화는 물체에 의해서만 일어난다고 가정하고 사용한 데이터셋과 Mannequin Challenge dataset 을 합쳐서 데이터셋을 구성합니다.

•

연구적인 측면에서 디벨롭할 수 있는 부분

◦

논문에서는 Limitation 으로 두 이미지간의 큰 차이나 occlusion 에 취약하다는 것을 제시하고는 있는데, 해결방안이 떠오르지는 않습니다.

•

서비스적인 측면에서 디벨롭할 수 있는 부분

◦

비슷한 뷰의 다른 영상의 앞 뒤를 자연스럽게 이어주는 영상편집 기술에 활용할 수 있어 보입니다.

◦

Motion Interpolation 방법론으로 사용하여 비디오의 frame rate 를 늘릴 수 있어보입니다.

2. Sound-guided Semantic Image Manipulation

Sound-Guided Semantic Image Manipulation

cvpr2022-7227

https://kuai-lab.github.io/cvpr2022sound/

•

이미지를 gudance 를 바탕으로 manipulation 을 하는 연구인데, 일반적으로 guidance 로 텍스트를 많이 사용하는 추세와는 달리 소리를 통해서 이미지를 manipulation 하는 프로젝트입니다.

•

신기한 점은 Text-based Image Stylization 을 하는 StyleGAN2 와 비교를 했을 때 해당 논문의 결과가 더  semantic 한 의미를 잘 반영했다는 점입니다.

•

연구적인 측면에서 디벨롭할 수 있는 부분

◦

이전에 봤던 논문 blended-diffusion 에서 처럼 Mask 를 주면 전체가 아닌 로컬 영역에 대한 편집이 가능해보입니다. 

◦

논문의 성능이 어느정도인지 감이 안오지만, 소리에 편집을 위한 구체적인 gudiance 가 주어져도 잘 동작하도록 하는 것도 생각해볼만 한 것 같습니다.

•

서비스적인 측면에서 디벨롭할 수 있는 부분

◦

(가능할지는 모르겠지만…) 동영상의 소리와 시작 이미지만으로 동영상의 소리와 매칭되는 어느정도 의미있는 짧은 영상을 만들어낼 수 있지 않을까- 싶습니다. (ex. 동물과 짖는소리)

Something Else

•

결과물이 재미있어 보이는 프로젝트들을 아카이빙했습니다.

•

따로 디벨롭할 수 있는 방법이 떠오르지 않아서 여기에 둡니다.

Stylized Neural Painting

jiupinjia.github.io

https://jiupinjia.github.io/neuralpainter/

TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations

yzhouas.github.io

https://yzhouas.github.io/projects/TransFill/index.html

Simulating Fluids in Real-World Still Images

Simulating Fluids in Real-World Still Images

We have witnessed great progress in physical-based simulation and neural video generation for animating fluids. However, the two types of methods suffer from different drawbacks. The physical-based simulation methods are built upon manual-designed environments with specific materials, motion trajectories and textures, thus are only capable of animating particular fluids in synthetic scenarios.

https://slr-sfs.github.io/

Zero-Shot Text-Guided Object Generation with Dream Fields

Zero-Shot Text-Guided Object Generation with Dream Fields

CVPR 2022 and AI4CC 2022 (Best Poster) UC Berkeley, Google Research We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects solely from natural language descriptions. Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision.

https://ajayj.com/dreamfields

•

이 친구는 리뷰한 적 있는 친구인데, 무려 Zero-Shot 이라 데이터셋이 필요가 없고 결과가 인상깊어 가져와봅니다.

NED

TediGAN

StyleCLIP

IDE-3D

GaNimation

SC-FEGAN

Toward Fine-grained Facial Expression Manipulation

ExprGAN

CPM

Oh-My-Face

Encoder4Editing

Expression-manipulator

Environment Setup

conda create -n 3DMotion python=3.6
conda install -c conda-forge tensorflow-gpu=1.15
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d

pip install kornia==0.5.10

pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git

pip install imageio opencv-python configargparse scipy
pip install --ignore-installed certifi
pip install timm scikit-learn gdown imageio-ffmpeg

pip install tensorflow-gpu==1.15.2
pip install dlib matplotlib
Python
복사

Pretrained Model Setup

cd Moments3D
./download.sh
Python
복사

PATH="/usr/local/cuda-10.0/bin${PATH:+:${PATH}}"

LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

conda create -n 3DMotion python=3.6

pip install tensorflow-gpu==1.15.2
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d

pip install ftfy regex tqdm
pip install imageio opencv-python configargparse scipy
pip install timm scikit-learn gdown imageio-ffmpeg
pip install dlib matplotlib
pip install kornia==0.5.10

pip install git+https://github.com/openai/CLIP.git
Python
복사

CUDA_VISIBLE_DEVICES=0 python run.py --input_dir='stylized_soyeon.png' --output_dir='output.jpg' --option_beta=0.15 --option_alpha=2.1 --option_gamma=3 --neutral='face' --target='face with smile'

해야 할 것들

마감일: 12월 12일

Poisson Blending

GitHub - PPPW/poisson-image-editing

This is a Python implementation of the poisson image editing paper to seamlessly blend two images. For example, I'd like to put the Statue of Liberty under the shinning ocean. The right side compares the results from direct copying and the Poisson image editing.

https://github.com/PPPW/poisson-image-editing

•

Facial Expression 바꾼 영역을 기존 이미지에 자연스럽게 치환해서 넣는 작업

Background Changing 이슈

•

먼저 Background 를 제거 후 Facial Expression 을 바꾼 이미지를 생성함

•

Facial Expression 변경 전 이미지와 후 이미지에서 Background Region 에 or 연산을 취한 부분을 Stylize 하여 둘 다 적용함  (둘 중 하나라도 Background 인 영역을 선택하여 Stylize)

◦

혹은 그냥 Foreground 영역을 빈 채로 (검은색 영역으로 지정) 하고 Stylize 해봐도 될듯 함.

•

Stylize 한 Background 에 각 Foreground 를 붙여 넣음.

•

적용한 뒤에 생긴 부분에 또 다시 Poisson Blending 을 적용해 자연스럽게 배경과 이어붙여봄

다양한 실험결과 뽑기

•

다양한 Text-based Facial Expression 이 가능함을 보여주고

•

다양한 Text-based Masked Region Stylization 이 가능함을 보여주고

•

Manual 한 영역을 배경 뿐만 아니라 다양하게 시도해보아도 괜찮을듯 함… (이슈가 있을수도!?)

발표 비디오 촬영

•

영어로 발표

•

모두가 발표에 참여해야 함

•

8 ~ 10 분 사이 (Strict) → 보통 학회 Oral 이 5분

◦

Introduction (problem definition and motivation

◦

Background and related work

▪

◦

Methods

▪

◦

Experiments

▪

◦

Conclusion (discussion and future work)

▪

•

얼굴이 나오지 않아도 됨 (대본 읽어도 될듯,,,)

•

시연을 해서 잘 정제된 데이터가 아니라 Real-World 데이터에 대해서도 잘 되는 것을 보여주면 평가에 좋다고 이야기를 하심.

마감일: 12월 16일

Project Report 작성

•

CVPR style

•

4 ~ 8 page

•

필수 요소

◦

Introduction (problem definition and motivation)

◦

Background and related work

◦

Methods

◦

Experiments

◦

Conclusion (discussion and future work)

Abstract + Introduction + Related Works + Discussion + Future Works

Methods

Expreiments + Conclusion

하면 괜찮을 것들

Depth Estimation 등으로 background 등을 code-base 로 mask 를 뽑는 과정

•

Manual 한 영역도 가능 + Background 는 code-base 로 가능하다고 주장할 수 있음.

Camera Motion (Path) 다양하게 만들어보기 (이슈가 있을수도 있음)

Final Project Script