VR/AR Assignment

Submitter: 2017-12433 Hyeonwoo Kim from the Dept. of Electrical & Computer Engineering
The open-source library I used:
[CVPR 2023] RealFusion: 360° Reconstruction of Any Object from a Single Image (GitHub: lukemelas/realfusion)

What’s the Algorithm (Goal, Input, Output)

The goal of RealFusion is to generate a 3D implicit neural field (in this case, a NeRF) using only a single image as input. Thus, the input is a single image containing the object we want to convert into 3D, and the output is an optimized NeRF.
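To make the input/output concrete, here is a minimal toy sketch (not RealFusion's actual network) of what an implicit neural field is: a function mapping 3D points to a density and a color, here implemented as a tiny random numpy MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyField:
    """Toy implicit field: maps 3D points to (density, RGB) with one hidden layer.
    A real NeRF uses a deeper MLP with positional encoding and view direction."""
    def __init__(self, hidden=32):
        self.w1 = rng.normal(scale=0.5, size=(3, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.5, size=(hidden, 4))  # 1 density + 3 color channels
        self.b2 = np.zeros(4)

    def __call__(self, pts):
        h = np.tanh(pts @ self.w1 + self.b1)
        out = h @ self.w2 + self.b2
        density = np.log1p(np.exp(out[:, 0]))    # softplus -> non-negative density
        rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))  # sigmoid -> colors in [0, 1]
        return density, rgb

field = TinyField()
pts = rng.uniform(-1, 1, size=(5, 3))  # 5 query points inside the scene volume
density, rgb = field(pts)
```

Optimizing the weights of such a field so that its renderings match the input image (plus a diffusion prior) is what produces the "output NeRF" above.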
Method Overview
RealFusion leverages the prior of a 2D image diffusion model together with Textual Inversion. Textual Inversion is applied to the given single image to extract a token embedding for the specific object. A NeRF is then optimized with a score-distillation loss based on the 2D diffusion model applied to rendered images (conditioned on the learned token). The whole pipeline is similar to the earlier work DreamFusion, but the core difference is leveraging Textual Inversion to capture the specific shape and appearance of the object shown in the single image.
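The score-distillation step can be sketched as follows. This is a hedged toy version: `fake_denoiser` is a stand-in for the real text-conditioned diffusion UNet (e.g. Stable Diffusion conditioned on the Textual-Inversion token), and the weighting `w(t)` is one common choice, not necessarily RealFusion's exact one.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_denoiser(noisy_img, t, token):
    """Stand-in for a text-conditioned diffusion model's noise prediction.
    A real implementation would run the diffusion UNet with the
    Textual-Inversion token embedded in the prompt."""
    return 0.1 * noisy_img  # hypothetical placeholder

def sds_gradient(rendered, t, alpha_bar, token):
    """Score-distillation gradient w.r.t. the rendered image:
    grad = w(t) * (eps_pred - eps), skipping the UNet Jacobian
    (the key trick from DreamFusion)."""
    eps = rng.normal(size=rendered.shape)  # noise added to the rendering
    noisy = np.sqrt(alpha_bar) * rendered + np.sqrt(1 - alpha_bar) * eps
    eps_pred = fake_denoiser(noisy, t, token)
    w = 1 - alpha_bar  # one common weighting choice
    return w * (eps_pred - eps)

rendered = rng.uniform(size=(64, 64, 3))  # a low-resolution NeRF rendering
grad = sds_gradient(rendered, t=500, alpha_bar=0.5, token="<laptop>")
```

This gradient is backpropagated through the NeRF renderer into the field's weights, pulling renderings from novel views toward images the diffusion model finds likely for the learned token.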

How can the Method be used for AR/VR

RealFusion can be used to generate user-specified 3D content for VR/AR spaces. Since there are many methods for converting a NeRF into a mesh, VR/AR developers or users can upload the resulting 3D model into their VR/AR space. Recent single-image-to-3D methods still lack stability and fidelity, but this kind of technique can significantly reduce the labor of developers and designers.
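The NeRF-to-mesh conversion mentioned above is usually done by querying the density field on a voxel grid and running marching cubes. As a dependency-free sketch, the snippet below uses a toy spherical density field and only marks surface voxels (occupied voxels with an empty 6-neighbour) instead of building actual triangles; real pipelines would call a marching-cubes implementation on the same grid.

```python
import numpy as np

# Sample a toy density field (a Gaussian blob) on a voxel grid; in practice
# these values would come from querying the trained NeRF's density head.
n = 32
xs = np.linspace(-1, 1, n)
x, y, z = np.meshgrid(xs, xs, xs, indexing="ij")
density = np.exp(-4 * (x**2 + y**2 + z**2))
occ = density > 0.5  # occupancy by thresholding the density

# A voxel is "interior" if it and all six axis neighbours are occupied;
# everything occupied but not interior lies on the surface.
interior = np.zeros_like(occ)
interior[1:-1, 1:-1, 1:-1] = (
    occ[1:-1, 1:-1, 1:-1] &
    occ[:-2, 1:-1, 1:-1] & occ[2:, 1:-1, 1:-1] &
    occ[1:-1, :-2, 1:-1] & occ[1:-1, 2:, 1:-1] &
    occ[1:-1, 1:-1, :-2] & occ[1:-1, 1:-1, 2:]
)
surface = occ & ~interior
```

The surface voxels (or the triangles marching cubes would produce from them) can then be exported as a mesh asset for a VR/AR engine.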

Run the demo with your own videos/images

Input Image
Input Mask
I used a photo of my own laptop (a MacBook) as my input image. Then, I applied SAM (the Segment Anything Model) to extract a mask of the region of interest. Using these images, I could run the demo to reconstruct a 3D NeRF and extract a mesh.
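Once a binary mask is exported (whether from SAM or any other tool), preparing the masked input amounts to compositing the object onto a plain background. A minimal numpy sketch, with a toy image and mask standing in for the real files:

```python
import numpy as np

def apply_mask(image, mask, bg=255):
    """Composite the object onto a plain background using a binary mask:
    keep pixels where mask == 1, fill the rest with the background value."""
    mask3 = mask[..., None].astype(bool)  # (H, W) -> (H, W, 1) for broadcasting
    return np.where(mask3, image, np.uint8(bg))

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)  # toy RGB image
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1  # toy "object" region
out = apply_mask(image, mask)
```

In the actual run, `image` and `mask` would be loaded from the photo and the SAM output with an image library such as Pillow.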

Submit the results with descriptions

Azimuth (+30°) View
Frontal View
Azimuth (−30°) View
The figures above show the mesh extracted from the demo. RealFusion tried to express the details of the laptop (e.g., the Apple logo and stickers) by specifying its shape and texture with Textual Inversion, but the details did not come out in good quality. I think this is due to the default NeRF rendering resolution of 96×96 (since the repository trains both the diffusion guidance and the NeRF, it easily runs out of memory even at this low resolution). However, RealFusion can still be useful for easily moving real-world objects into the VR/AR world.