They use SMPL only to initialize the bare body model: the SMPL shape is fitted into a NeuS representation, which is then rendered in three modes (textureless, normal, and color).
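To make the three render modes concrete, here is a minimal per-pixel sketch. It assumes the usual conventions (a normal map that remaps the unit normal from [-1, 1] to [0, 1], a textureless render that is Lambertian shading with white albedo, and a color render that multiplies albedo by the same shading). The function `shade`, its parameters, and the ambient term are hypothetical and are not taken from the paper.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def shade(normal, albedo, light_dir, mode, ambient=0.1):
    """Shade one surface point in one of three modes (illustrative only)."""
    n = normalize(normal)
    if mode == "normal":
        # Normal map: remap each component from [-1, 1] to [0, 1].
        return tuple(0.5 * (c + 1.0) for c in n)
    l = normalize(light_dir)
    # Lambertian term, clamped so back-facing points get no diffuse light.
    diffuse = max(0.0, sum(a * b for a, b in zip(n, l)))
    intensity = min(1.0, ambient + (1.0 - ambient) * diffuse)
    if mode == "textureless":
        # White albedo: only the geometry-dependent shading is visible.
        return (intensity, intensity, intensity)
    if mode == "color":
        # Albedo modulated by the same shading term.
        return tuple(intensity * a for a in albedo)
    raise ValueError(f"unknown mode: {mode}")

# A red surface point facing the light, rendered in all three modes.
for mode in ("textureless", "normal", "color"):
    print(mode, shade((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), mode))
```

The textureless and normal modes depend only on geometry, which is presumably why they are useful for supervising shape independently of texture.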
SDS (Score Distillation Sampling) is then applied to the head and the body separately.
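As a rough illustration of what "SDS applied separately" can look like, the toy sketch below runs two independent SDS loops, one per region. Everything here is an assumption made for illustration: the "rendered image" is just a list of four pixel values that are optimized directly (so the derivative of the render with respect to the parameters is the identity), and `make_point_prior` is a hypothetical stand-in for a diffusion model, namely the exact noise predictor for a data distribution concentrated on a single target image. The paper's actual priors, renderer, and schedule are not reproduced.

```python
import math
import random

def make_point_prior(target):
    """Hypothetical stand-in for a diffusion model whose data are one image."""
    def predict_noise(x_t, alpha_bar):
        a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
        # Exact noise estimate when the clean image is known to be `target`.
        return [(xt - a * x0) / s for xt, x0 in zip(x_t, target)]
    return predict_noise

def sds_grad(x, predict_noise, rng):
    """One SDS gradient sample: w(t) * (eps_hat - eps)."""
    alpha_bar = rng.uniform(0.02, 0.98)            # random noise level
    eps = [rng.gauss(0.0, 1.0) for _ in x]         # injected noise
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * xi + s * e for xi, e in zip(x, eps)]  # noised render
    eps_hat = predict_noise(x_t, alpha_bar)
    w = 1.0 - alpha_bar                            # w(t) = sigma_t^2, one common choice
    # The denoiser is treated as a black box: no gradient flows through it.
    return [w * (eh - e) for eh, e in zip(eps_hat, eps)]

def optimize(x, predict_noise, steps=500, lr=0.1, seed=0):
    """Plain gradient descent on the pixels using SDS gradients."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(steps):
        g = sds_grad(x, predict_noise, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# Separate loops with separate priors, mirroring the head/body split.
head = optimize([0.0] * 4, make_point_prior([0.9, 0.1, 0.1, 0.9]), seed=1)
body = optimize([0.0] * 4, make_point_prior([0.2, 0.8, 0.8, 0.2]), seed=2)
print([round(v, 3) for v in head])
print([round(v, 3) for v in body])
```

In this toy setting each region is pulled toward the image its own prior prefers, which is the point of keeping the two loops independent: the head can be driven by a prior specialized for faces without affecting the body.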
For the head model, they use ControlNet to generate a large number of augmented images, which are then used to fine-tune the head diffusion model. They describe the purpose of this step as “to enhance the facial details”.