As far as I understand:
There are a number of "diffusion models" created by commercial entities at a cost of millions of dollars in GPU time. Stability AI released Stable Diffusion for "free", and all the hobbyist stuff started from there.
Techniques were found to "finetune" a model: building on an existing base diffusion model to specialize its output or improve some aspects. This requires a significant effort in image curation and labeling, and then still needs a large hardware investment (e.g. 80 hours on 8x A100).
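Under the hood, that finetuning is basically: encode each training image, add noise at a random timestep, and train the UNet to predict that noise given the caption. Here's a minimal sketch of one training step — the model id, data and hyperparameters are placeholders, not a recipe, and a real finetune adds EMA, gradient accumulation, multi-GPU plumbing, etc. on top:

```python
# Minimal sketch of one finetuning step for a latent diffusion model.
# Model id and hyperparameters are placeholders, not a working recipe.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

base = "runwayml/stable-diffusion-v1-5"  # placeholder base model id
vae = AutoencoderKL.from_pretrained(base, subfolder="vae").eval()
text_encoder = CLIPTextModel.from_pretrained(base, subfolder="text_encoder").eval()
tokenizer = CLIPTokenizer.from_pretrained(base, subfolder="tokenizer")
unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")  # the part being finetuned
noise_scheduler = DDPMScheduler.from_pretrained(base, subfolder="scheduler")

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

def training_step(pixel_values, captions):
    """One step: noise the image latents, train the UNet to predict that
    noise, conditioned on the caption embeddings."""
    with torch.no_grad():
        latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
        tokens = tokenizer(captions, padding="max_length", truncation=True,
                           max_length=tokenizer.model_max_length, return_tensors="pt")
        encoder_hidden_states = text_encoder(tokens.input_ids)[0]

    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    loss = F.mse_loss(noise_pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```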
Then there is LoRA creation (which to me is still somewhat mysterious), which lets you further steer an existing diffusion model's output. This requires a few hundred labelled images and can theoretically be done on a beefy home setup (e.g. 4x 3090).
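The LoRA part is less mysterious than it sounds: it freezes the big pretrained weights and learns a small low-rank correction on top of them. A rough illustrative sketch (layer size, rank and scaling are arbitrary examples, not what any particular trainer uses):

```python
# Rough sketch of what a LoRA adapter is: the pretrained weight W stays
# frozen, and a small low-rank update B @ A is learned alongside it.
# Rank and scaling values here are arbitrary examples.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base_layer: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_layer
        for p in self.base.parameters():
            p.requires_grad_(False)  # frozen pretrained weights
        in_f, out_f = base_layer.in_features, base_layer.out_features
        # The only trainable parameters: two small matrices of rank `rank`.
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the low-rank correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

# Example: wrap one 768 -> 768 attention projection.
layer = LoRALinear(nn.Linear(768, 768))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # ~12k trainable vs ~590k frozen
```

Because only those two small matrices get trained and saved, the VRAM and file-size cost is a fraction of a full finetune, which is why it fits on home hardware at all.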
- Why do only a few people use 3D scenes with humanoid 3D models to create their own Stable Diffusion datasets to train their own models?
I guess because it's a lot of effort to get the training images sorted out, and you need significant hardware, plus the time to learn how to set up and run the training on cloud GPU resources.
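The "sorting out" is mostly pairing every render with a caption in whatever layout your trainer expects. A toy example, assuming your renders and matching .txt caption files sit in one folder, writing the metadata.jsonl manifest that Hugging Face-style "imagefolder" datasets read (folder names and fields are just illustrative):

```python
# Toy example of the dataset-prep chore: pair each Daz render with its
# caption and write a metadata.jsonl manifest. Folder layout and caption
# format are assumptions.
import json
from pathlib import Path

render_dir = Path("renders")            # e.g. character_001.png, character_001.txt, ...
out_file = render_dir / "metadata.jsonl"

with out_file.open("w", encoding="utf-8") as f:
    for img in sorted(render_dir.glob("*.png")):
        caption_file = img.with_suffix(".txt")   # one caption file per render
        if not caption_file.exists():
            print(f"skipping {img.name}: no caption")
            continue
        caption = caption_file.read_text(encoding="utf-8").strip()
        f.write(json.dumps({"file_name": img.name, "text": caption}) + "\n")
```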
And for what benefit? You'll get a LoRA that can make, say, your PonyXL output look fairly close to a specific Daz-rendered character. But you'll still have all the normal diffusion problems of hands, faces, everything looking slightly off, and inconsistent clothing and backgrounds. It will take plenty of time and post-work to get good results from your model.
If you have the skills to make a good set of training images with e.g. Daz, you are probably good enough to just use that process for ALL your images. You can even put them through a light-touch diffusion img2img pass to get the AI look, if that's what you want.
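That light-touch pass is just a low-strength img2img run over the finished render, something like the sketch below (model id, prompt and strength are placeholders):

```python
# Light-touch img2img pass over a finished Daz render to add an "AI look"
# without redrawing the whole image. Model id, prompt and strength are
# placeholders; lower strength keeps more of the original render.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder model id
    torch_dtype=torch.float16,
).to("cuda")

render = Image.open("daz_render.png").convert("RGB").resize((768, 768))

result = pipe(
    prompt="photo of a woman, soft studio lighting, detailed skin",
    image=render,
    strength=0.3,            # low strength = stay close to the source render
    guidance_scale=7.0,
    num_inference_steps=30,
).images[0]

result.save("daz_render_ai_pass.png")
```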