[Stable Diffusion] Prompt Sharing and Learning Thread

Gtg Gtg · Oct 28, 2023

Hi everyone, I was wondering if anybody knew the settings for kohya_ss for getting a realistic person, so face, body, everything? I tried many methods for "realism" online but they seem to fall short. I have a dataset of literally ~2000+ HQ images, some are professional images, while some are with phone cameras.
My current settings are here:
Batch 16 : Epoch 20 : Clip Skip 1 : all Learning Rates @ 0.0001 : Cosine Adafactor : No Warmup : 128 network/ 128 alpha
folder on 5 repeats e.g. 5_name

the 16 batch number was explained to me that since some of the images of my dataset is similar, I can make it 8, 16, 32 even and such

128 network so that I capture a lot of detail, and 128 alpha i'm not too sure, but I heard that the lower it is, the more flexible the AI?

devilkkw · Oct 28, 2023

Gtg Gtg said:
Hi everyone, I was wondering if anybody knew the settings for kohya_ss for getting a realistic person, so face, body, everything? I tried many methods for "realism" online but they seem to fall short. I have a dataset of literally ~2000+ HQ images, some are professional images, while some are with phone cameras.
My current settings are here:
Batch 16 : Epoch 20 : Clip Skip 1 : all Learning Rates @ 0.0001 : Cosine Adafactor : No Warmup : 128 network/ 128 alpha
folder on 5 repeats e.g. 5_name

the 16 batch number was explained to me that since some of the images of my dataset is similar, I can make it 8, 16, 32 even and such

128 network so that I capture a lot of detail, and 128 alpha i'm not too sure, but I heard that the lower it is, the more flexible the AI?

for character i get good result using realistic model for train, and Learning Rates default, with 1 epoch, 120 repeat (120_name) with 96 alpha and network size based on number of image.(add 10 for every image, for example if i use 8 image, my network size is 80).
Also if you want to train facial features, using less image( 2000 is too high) where the subject face is visible and view from different angle. remember to do good tag.
i usually use not over 12 image, and result is pretty good. use image with varius background give better result.

me3 · Oct 28, 2023

Gtg Gtg said:
Hi everyone, I was wondering if anybody knew the settings for kohya_ss for getting a realistic person, so face, body, everything? I tried many methods for "realism" online but they seem to fall short. I have a dataset of literally ~2000+ HQ images, some are professional images, while some are with phone cameras.
My current settings are here:
Batch 16 : Epoch 20 : Clip Skip 1 : all Learning Rates @ 0.0001 : Cosine Adafactor : No Warmup : 128 network/ 128 alpha
folder on 5 repeats e.g. 5_name

the 16 batch number was explained to me that since some of the images of my dataset is similar, I can make it 8, 16, 32 even and such

128 network so that I capture a lot of detail, and 128 alpha i'm not too sure, but I heard that the lower it is, the more flexible the AI?

If you're training just one person you don't need too many images, 15-30 is more than enough, so not sure what those 2k plus images are, but you really don't want to use that many. If they all are of the same person you'd pick the one that's most "clear" were you don't have things like shadows across the face etc, or weird facial expressions that can cause deformations in the trained face. If the images are the same person with different hair styles you can split it into different folders and name accordingly and have different triggers.

Assuming you're using standard lora, which you more than likely should be.
Learning rates, repeats and epoch will depend on what optimizer you use. With something like AdamW you want to have more repeats and less epochs. IE with 20 images you'd have something like 40-60 repeats and 2-3 epochs, it might be fine with just 1, but that depends on images, subject etc. You'll need some trial and error anyway so running it for an extra epoch isn't that big a deal.
If you use something from the dadapt/prodigy "family" you want less repeats and more epochs. Generally that might mean just 1-2 repeats and a lot of epochs. You also need a bunch of optimizer arguments.

Rank and alpha. While it's true that rank is the "size" of the lora, just think about it logically. A model contains insane amounts of "things" and those are just 2gb. Even if loras would be far less optimized there should be no need for just one "simple" person to need 128mb.
Having done A LOT of testing recently trying to figure out what's broken with kohya, one of the things i've found is that you can train a realistic quality "person" perfectly fine on rank 8. It was failing badly for me and for no other reason than to "just try" i dropped rank from 32 to 8 and likeness and quality increased drastically. Higher rank seems to allow too much additional data to be picked up and "contaminate" the training, too much junk to clear out to keep that details you are interested in. Observed result more than documented fact, but still.
Alpha is more of a limiter on learning, but i haven't really found a exact positive or negative consistent effect so far.

You didn't specify if this was on sd1.5 or xl, that has a huge impact on learning rates.
If it's for sd1.5 my suggested starting setup would be to pick 20-30 images, use a simple blip style captioning (long descriptive text and not comma separated tags). Around 40 repeats and 3 epochs, save each epoch for testing so you can see the progress. Use constant learning rate and adamw/adamw 8bit for simplicity. Network rank and alpha at 8 and 1 to start with.
For learning rate, batching is a bit of a variable. If you use batching you often have to increase your learning rate by a similar amount, i'm not sure if it's completely linear, but given you can run batch at 16 it shouldn't take you long to run this even at batch 1.
So use 0.0001 as learning rate and 0.00005 for text rate and run it at batch 1 just to start. Given it would take me over an hour to do this, it should be far far less of you.

Not sure what model you're planing to train on, but it seems like some "people" don't train well on some models, so even if you get good results no one model with one person it might not work as well for another. I've gotten good results on both deliberate and cyber realistic, but some times it's worked well on just one of them and failed completely on the other.

Lots of text and i might have forgotten something, but hopefully nothing too important.

(edit, knew there was something...)
For realistic images you might need to use regularization images, but try without them first as there's a bit of work involved using them. Can deal with that issue later if need be as it's more of a final tweak than absolute need.

Random test image from one of my test trainings, just to keep ppls attention.

sharlotte · Oct 29, 2023

For LORA training on SDXL, I've stuck to the tips indicated on Kohya Github page which is to use the adafactor optimizer:

You must be registered to see the links

Anyone else has tried something different where it worked?

me3 · Oct 29, 2023

Dragon born? Dragonkin?

Gtg Gtg · Oct 30, 2023

me3 said:
If you're training just one person you don't need too many images, 15-30 is more than enough, so not sure what those 2k plus images are, but you really don't want to use that many. If they all are of the same person you'd pick the one that's most "clear" were you don't have things like shadows across the face etc, or weird facial expressions that can cause deformations in the trained face. If the images are the same person with different hair styles you can split it into different folders and name accordingly and have different triggers.

Assuming you're using standard lora, which you more than likely should be.
Learning rates, repeats and epoch will depend on what optimizer you use. With something like AdamW you want to have more repeats and less epochs. IE with 20 images you'd have something like 40-60 repeats and 2-3 epochs, it might be fine with just 1, but that depends on images, subject etc. You'll need some trial and error anyway so running it for an extra epoch isn't that big a deal.
If you use something from the dadapt/prodigy "family" you want less repeats and more epochs. Generally that might mean just 1-2 repeats and a lot of epochs. You also need a bunch of optimizer arguments.

Rank and alpha. While it's true that rank is the "size" of the lora, just think about it logically. A model contains insane amounts of "things" and those are just 2gb. Even if loras would be far less optimized there should be no need for just one "simple" person to need 128mb.
Having done A LOT of testing recently trying to figure out what's broken with kohya, one of the things i've found is that you can train a realistic quality "person" perfectly fine on rank 8. It was failing badly for me and for no other reason than to "just try" i dropped rank from 32 to 8 and likeness and quality increased drastically. Higher rank seems to allow too much additional data to be picked up and "contaminate" the training, too much junk to clear out to keep that details you are interested in. Observed result more than documented fact, but still.
Alpha is more of a limiter on learning, but i haven't really found a exact positive or negative consistent effect so far.

You didn't specify if this was on sd1.5 or xl, that has a huge impact on learning rates.
If it's for sd1.5 my suggested starting setup would be to pick 20-30 images, use a simple blip style captioning (long descriptive text and not comma separated tags). Around 40 repeats and 3 epochs, save each epoch for testing so you can see the progress. Use constant learning rate and adamw/adamw 8bit for simplicity. Network rank and alpha at 8 and 1 to start with.
For learning rate, batching is a bit of a variable. If you use batching you often have to increase your learning rate by a similar amount, i'm not sure if it's completely linear, but given you can run batch at 16 it shouldn't take you long to run this even at batch 1.
So use 0.0001 as learning rate and 0.00005 for text rate and run it at batch 1 just to start. Given it would take me over an hour to do this, it should be far far less of you.

Not sure what model you're planing to train on, but it seems like some "people" don't train well on some models, so even if you get good results no one model with one person it might not work as well for another. I've gotten good results on both deliberate and cyber realistic, but some times it's worked well on just one of them and failed completely on the other.

Lots of text and i might have forgotten something, but hopefully nothing too important.

(edit, knew there was something...)
For realistic images you might need to use regularization images, but try without them first as there's a bit of work involved using them. Can deal with that issue later if need be as it's more of a final tweak than absolute need.

Random test image from one of my test trainings, just to keep ppls attention. View attachment 3040533

Thanks for the info I'll try out your setup and do test, now I need to go through the 2000+ images and choose the best 30? Or the best 30 while having variations? like different poses, lighting conditions and whatnot? The person I'm trying to replicate is a cosplayer, so I thought I would need as much data as possible to get the cosplays down but I guess not.

hkennereth · Oct 30, 2023

Gtg Gtg said:
Thanks for the info I'll try out your setup and do test, now I need to go through the 2000+ images and choose the best 30? Or the best 30 while having variations? like different poses, lighting conditions and whatnot? The person I'm trying to replicate is a cosplayer, so I thought I would need as much data as possible to get the cosplays down but I guess not.

As someone who have trained plenty of Dreambooth and LoRA models based on cosplayers, I can advice you to stick with the most natural looking pictures, and avoid the actual cosplay ones. You want variety, yes, but of lighting conditions, poses, environment, etc. If you try to use pictures with heavy makeup or clothing that occludes too much of their natural body shape you will have a hard time getting consistency from the model. The point is to teach the model how to draw that person, not how they look in cosplay; that you'll get from prompting.

Also avoid images that are too dark, too low resolution, too grainy, etc. You want high quality, not high volume. You'll need images cropped to 512px square if you're training for Stable Diffusion 1.5 models, and 1024px square if you're training for SDXL, so make sure that the pictures have big enough resolution so that when you crop the desired area from them, the resulting image is already equal or larger than those resolutions, meaning NEVER UPSCALE your source images.

And yes, you probably want to stick with about 30-50 pictures total, more or less in a proportion of 60% of headshots, 35% of medium shots (waist up), and 15% of full body shots -- that's a good ratio for images in my experience.

me3 · Oct 30, 2023

hkennereth said:
...
And yes, you probably want to stick with about 30-50 pictures total, more or less in a proportion of 60% of headshots, 35% of medium shots (waist up), and 15% of full body shots -- that's a good ratio for images in my experience.

If there's a bit of a problem finding images of one "type", you can help get that balance with splitting the images into different folders and adjusting the repeats. That said you probably want to avoid repeating just 1-2 images too many times in comparison to the rest as you might end up locked into that exact image.

hkennereth · Oct 30, 2023

me3 said:
If there's a bit of a problem finding images of one "type", you can help get that balance with splitting the images into different folders and adjusting the repeats. That said you probably want to avoid repeating just 1-2 images too many times in comparison to the rest as you might end up locked into that exact image.

Indeed. If you do want to use the same image, for example, for full body and face crop, I would recommend flipping one of them horizontally, which tends to help the training process see it as something new. Of course that is more helpful if the person is shot from the side or at 3/4, but it won't help much if they are directly facing the camera. But I would avoid that unless strictly necessary.

Gtg Gtg · Oct 31, 2023

hkennereth said:
As someone who have trained plenty of Dreambooth and LoRA models based on cosplayers, I can advice you to stick with the most natural looking pictures, and avoid the actual cosplay ones. You want variety, yes, but of lighting conditions, poses, environment, etc. If you try to use pictures with heavy makeup or clothing that occludes too much of their natural body shape you will have a hard time getting consistency from the model. The point is to teach the model how to draw that person, not how they look in cosplay; that you'll get from prompting.

Also avoid images that are too dark, too low resolution, too grainy, etc. You want high quality, not high volume. You'll need images cropped to 512px square if you're training for Stable Diffusion 1.5 models, and 1024px square if you're training for SDXL, so make sure that the pictures have big enough resolution so that when you crop the desired area from them, the resulting image is already equal or larger than those resolutions, meaning NEVER UPSCALE your source images.

And yes, you probably want to stick with about 30-50 pictures total, more or less in a proportion of 60% of headshots, 35% of medium shots (waist up), and 15% of full body shots -- that's a good ratio for images in my experience.

I'm just curious now about the regularisation images, do I really need them? what should they be? how do I set it up? how many of these do I even need? so when I do repeats like 50_filename for images folder, I would do the same for regularisation folder? I've never actually touched on regularisation that much and was just shoving random images

hkennereth · Oct 31, 2023

Gtg Gtg said:
I'm just curious now about the regularisation images, do I really need them? what should they be? how do I set it up? how many of these do I even need? so when I do repeats like 50_filename for images folder, I would do the same for regularisation folder? I've never actually touched on regularisation that much and was just shoving random images

I have trained models with and without regularization images, and honestly it's hard to really see any major difference that you can point as a direct result of using them. It's an optional step, and I'd say you can skip it; all my LoRAs were trained without it, and I can safely say that whenever I got a bad model it was due to my source images, not the lack of regularization images.

That said, if you do want to use them, I would recommend just finding a pack of regularization images somewhere online and using that, it's not worth the trouble to try to create a set of images yourself.

When training LoRAs for SD 1.5, I mostly followed this particular tutorial to the letter, and got some great results:

You must be registered to see the links

Gtg Gtg · Oct 31, 2023

hkennereth said:
I have trained models with and without regularization images, and honestly it's hard to really see any major difference that you can point as a direct result of using them. It's an optional step, and I'd say you can skip it; all my LoRAs were trained without it, and I can safely say that whenever I got a bad model it was due to my source images, not the lack of regularization images.

That said, if you do want to use them, I would recommend just finding a pack of regularization images somewhere online and using that, it's not worth the trouble to try to create a set of images yourself.

When training LoRAs for SD 1.5, I mostly followed this particular tutorial to the letter, and got some great results:

You must be registered to see the links

with your previous advice on the 512x squares, sometimes its hard to find full body images, how do you deal with it? the cosplayer I'm doing usually does vertical images and rarely ever does full body images.

hkennereth · Oct 31, 2023

Gtg Gtg said:
with your previous advice on the 512x squares, sometimes its hard to find full body images, how do you deal with it? the cosplayer I'm doing usually does vertical images and rarely ever does full body images.

You do the best with what you got. If the person doesn't have full body pictures available you get the closest you can find to that, and I suppose you won't get accurate depictions of their toes. All you are training is how that specific person looks like but SD already knows how "people" look like. You want full body pictures to make sure that their general body is portrayed with accurate proportions, but if you fail to provide pictures that show how their feet look SD will just extrapolate from what you did provide and draw "generic" legs.

To give you a better idea of what a training set looks like, this is (almost; it's missing like 6 headshots that didn't fit on my screen) the entire set I used to train a model of cosplayer Rolyat:

And here are couple of pictures I managed to make with the resulting LoRA. Now, I wouldn't call them flawless depictions of her likeness, but it's in the range of how accurate one can get, it depends a lot on the prompt you use as well, and here I was more concerned about making the images interesting than being 100% accurate.

Sepheyer · Oct 31, 2023

Gtg Gtg said:
with your previous advice on the 512x squares, sometimes its hard to find full body images, how do you deal with it? the cosplayer I'm doing usually does vertical images and rarely ever does full body images.

There's a mini-thread on training a Lara Croft LORA, with all the usual suspects from this thread: https://f95zone.to/threads/loras-for-wildeers-lara-croft-development-thread.173873/

There might be a few tidbits of wisdom. I saw you already encountered one earlier, which is: ~20 images is all you need. And in that thread I posed above we kinda empirically confirm that statement. There are quite a few other things that were stumbled upon during the discussion, I just don't recall what they were. Oh, I remember - I realized LORAs are a bit of a deadend for me personally and started looking into offshoots.

Gtg Gtg · Oct 31, 2023

Sepheyer said:
There's a mini-thread on training a Lara Croft LORA, with all the usual suspects from this thread: https://f95zone.to/threads/loras-for-wildeers-lara-croft-development-thread.173873/

There might be a few tidbits of wisdom. I saw you already encountered one earlier, which is: ~20 images is all you need. And in that thread I posed above we kinda empirically confirm that statement. There are quite a few other things that were stumbled upon during the discussion, I just don't recall what they were. Oh, I remember - I realized LORAs are a bit of a deadend for me personally and started looking into offshoots.

offshoots? like other ways to replicate somebody/something better?

me3 · Oct 31, 2023

Having nearly 500 images from the same "series" i posted a while ago, i thought i'd test some inpainting things.
Starting image and, well so far, end result. There's still things that should have been fixed...something for another day i guess...

You don't have permission to view the spoiler content. Log in or register now.

Sepheyer · Oct 31, 2023

Gtg Gtg said:
offshoots? like other ways to replicate somebody/something better?

For a few evenings we had the IPAdapter hopes flying high: https://f95zone.to/threads/stable-diffusion-prompt-sharing-and-learning-thread.146036/post-11985258

But alas, we conclusively proved it can't replace LORAs.

hkennereth · Nov 4, 2023

Not sure if this is something anyone wants to see on this thread, but since it's been a bit quiet I didn't think you guys would mind much. Here is a set of images I created to celebrate Halloween featuring some cosplayers and internet models of whom I have created LoRAs before.

(Bonus points to anyone who can recognize them lol)

Sharinel · Nov 4, 2023

hkennereth said:
Not sure if this is something anyone wants to see on this thread, but since it's been a bit quiet I didn't think you guys would mind much. Here is a set of images I created to celebrate Halloween featuring some cosplayers and internet models of whom I have created LoRAs before.

(Bonus points to anyone who can recognize them lol)

View attachment 3059650 View attachment 3059651 View attachment 3059652 View attachment 3059653 View attachment 3059654 View attachment 3059655 View attachment 3059656 View attachment 3059657 View attachment 3059658 View attachment 3059659 View attachment 3059660 View attachment 3059661 View attachment 3059662 View attachment 3059663 View attachment 3059664 View attachment 3059665 View attachment 3059666 View attachment 3059667 View attachment 3059668

I think the 3rd one is Olivia Casta? Which is some sort of LoRa-ception, as she is already an AI of another model

hkennereth · Nov 4, 2023

Sharinel said:
I think the 3rd one is Olivia Casta? Which is some sort of LoRa-ception, as she is already an AI of another model

Yes, it is. And... yes, she is. Although I think she just uses like Snapchat filters or something, she is "pre-AI"

[Stable Diffusion] Prompt Sharing and Learning Thread

Member

Member

Member

Member

Member

Member

Member

Member

Member

Member

Member

Member

Member

Well-Known Member

Member

Member

Well-Known Member

Member

Active Member

Member