[Stable Diffusion] Prompt Sharing and Learning Thread

fr34ky

Active Member
Oct 29, 2017
812
2,189
Hey guys, does anybody know why once you get a CUDA out of memory error, you start to get it every time, even with lower resolutions, etc.?

I mean, sometimes I can create a picture in 1024x1024, and after a couple of generations I start to get that CUDA error and cannot go back to generate pictures in that resolution.

Is there some kind of VRAM reset or something that can be done?


edit:

Sooo... ComfyUI doesn't have a "restore face" feature. For the time being you can either use a standalone face-restoration tool or grab any other equivalent from GitHub. Naturally, it's rather messy.

Here we have before and after:


The prompt is inside the original file.

Have you tried sending the txt2img result to inpaint and correcting the face at a higher resolution? I mean instead of face restoration.
 
Last edited:
  • Like
Reactions: Sepheyer and Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
Hey guys, does anybody know why once you get a CUDA out of memory error, you start to get it every time, even with lower resolutions, etc.?

I mean, sometimes I can create a picture in 1024x1024, and after a couple of generations I start to get that CUDA error and cannot go back to generate pictures in that resolution.

Is there some kind of VRAM reset or something that can be done?


Have you tried sending the txt2img result to inpaint and correcting the face at a higher resolution? I mean instead of face restoration.
I have noticed this also; it seems to be something like a caching issue. I have also been wondering if we can empty this cache somehow, but I have not found anything about it yet. There's a [link] with tips and ideas about fixing the CUDA memory error. One thing is to use the [linked version] of SD for low vram. I haven't tried it.
Another is to add
"set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:64" to webui-user.bat.
I have tried this one and it extends your absolute limit for resolution and the hi-res multiplier a bit.
Restarting the webui may help. You can do this from Settings / Reload UI. Then you can use PNG Info to load back the prompt and settings. Running other things that are demanding on the GPU of course adds to the issue; I'm sure you already understand this. Photoshop is one example, along with other similarly heavy software.
I don't understand why it leads to an error if the vram is "full" though. Why can't it just continue generating and just take longer? :unsure:
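From what I can tell, the "cache" here is PyTorch's CUDA allocator cache. If you want to poke at it yourself, this is roughly what emptying it looks like in raw PyTorch (just a sketch of the underlying calls, not something the webui exposes as a button):

Code:
import gc
import torch

def free_vram():
    """Rough sketch: manually release cached VRAM between generations."""
    gc.collect()              # drop dead Python references first
    torch.cuda.empty_cache()  # hand cached blocks back to the driver
    torch.cuda.ipc_collect()  # clean up any leftover inter-process handles

if torch.cuda.is_available():
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.0f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.0f} MiB")
    free_vram()
    print(f"reserved after empty_cache: {torch.cuda.memory_reserved() / 1024**2:.0f} MiB")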
 
  • Like
Reactions: Sepheyer and fr34ky

fr34ky

Active Member
Oct 29, 2017
812
2,189
I have noticed this also; it seems to be something like a caching issue. I have also been wondering if we can empty this cache somehow, but I have not found anything about it yet. There's a [link] with tips and ideas about fixing the CUDA memory error. One thing is to use the [linked version] of SD for low vram. I haven't tried it.
Another is to add
"set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:64" to webui-user.bat.
I have tried this one and it extends your absolute limit for resolution and the hi-res multiplier a bit.
Restarting the webui may help. You can do this from Settings / Reload UI. Then you can use PNG Info to load back the prompt and settings. Running other things that are demanding on the GPU of course adds to the issue; I'm sure you already understand this. Photoshop is one example, along with other similarly heavy software.
I don't understand why it leads to an error if the vram is "full" though. Why can't it just continue generating and just take longer? :unsure:
Yeah, usually the only solution is to restart. I've read about that parameter you posted before and was thinking it could be related, but I couldn't try it yet.

All this stuff is in a very experimental phase IMO, so there is probably a lot of inefficient use of memory and resources.
 
  • Like
Reactions: Sepheyer and Mr-Fox

Schlongborn

Member
May 4, 2019
432
1,533
I posted in the AI art thread about training a LORA. I used this project: kohya_ss, which is like webui, but only for training. I found it easier to use than Dreambooth (which also constantly broke my webui installation, so I uninstalled it). It is a dedicated thing just for training, so you can have webui for the art and kohya for the training.

To install kohya_ss, follow the instructions in the readme. I assume most here will run Windows; there are instructions for Ubuntu too.

The most difficult thing was putting together a training set. I just used a series of Daz renders I have (I am actually too lazy to render more, so I just keep reusing that one for my AI shenanigans since it's kind of complete).

And then I cropped the images into "AI friendly" resolutions: 512x512, 768x512, 1024x512, etc. I used mostly 512x512 pictures, which is recommended, but it should be OK to use other resolutions, even ones completely different from these "AI friendly" ones.
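If you don't want to crop everything by hand, a few lines of Pillow will do a center crop and resize to 512x512 (just a sketch; the folder names are made up, point them at your own files):

Code:
from pathlib import Path
from PIL import Image

SRC = Path("D:/AI Training/SophiaCollegeLectureFace/raw")      # hypothetical input folder
DST = Path("D:/AI Training/SophiaCollegeLectureFace/cropped")  # hypothetical output folder
DST.mkdir(parents=True, exist_ok=True)

for img_path in SRC.glob("*.png"):
    img = Image.open(img_path).convert("RGB")
    side = min(img.size)                        # largest centered square
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    square = img.crop((left, top, left + side, top + side))
    square.resize((512, 512), Image.LANCZOS).save(DST / img_path.name)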

Kohya expects a certain directory structure for training, like this:
dirstructure.PNG
and then inside of "image" you need to create a directory like so:
sophiadir.PNG

That folder name format is important: the 100 means how many times each picture is sampled. In my case I have 21 pictures and 100 samples (or iterations) per picture, so the entire training will run for 2100 iterations. You can put a higher value there if you have fewer images; YouTube says around 1000-2000 iterations is enough.

and in there you can put all the images like so:
sophiaimages.PNG
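As a sanity check, the total step count is basically repeats × number of images (× epochs ÷ batch size). A quick back-of-the-envelope helper, purely illustrative:

Code:
from pathlib import Path

def planned_steps(image_dir, repeats, epochs=1, batch_size=1):
    """Illustrative only: kohya reads the repeat count from the folder name prefix."""
    exts = {".png", ".jpg", ".jpeg", ".webp"}
    n_images = sum(1 for p in Path(image_dir).iterdir() if p.suffix.lower() in exts)
    return n_images * repeats * epochs // batch_size

# 21 images * 100 repeats = 2100 steps, which matches the training log further down
print(planned_steps("D:/AI Training/SophiaCollegeLectureFace/image/100_SophiaCollegeLectureFace", repeats=100))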

And then in kohya, you can caption those images using BLIP like so:
sopiablip.PNG

The prefix there is optional; that string is just inserted at the beginning of each generated .txt file. This is useful because the LORA is then trained to recognize that word, so when I put "SophiaCollegeLectureFace" in my prompt, or negative prompt, I can control the LORA that way. I am not actually sure this really worked, since I just used the LORA through webui's "additional networks", which adds something to the prompt anyway. But it didn't hurt either.

Ok, once you've run BLIP, the folder should look like this:
sophiacaptions.PNG

I edited all of these files because for my training data they all came out pretty much the same. To be honest, I think my training data was kind of shitty, but it is just an example anyway. I'll attach a zip with my training data and the LORA so you can take a look at what I used; most of these .txt files contain something like: "SophiaLectureCollegeFace a woman with red hair with glasses and a necklace on her neck".
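If you want to add or fix the trigger word in bulk instead of opening every .txt by hand, something like this works (again just a sketch, using my paths and trigger word):

Code:
from pathlib import Path

TRIGGER = "SophiaCollegeLectureFace"
CAPTION_DIR = Path("D:/AI Training/SophiaCollegeLectureFace/image/100_SophiaCollegeLectureFace")

for txt in CAPTION_DIR.glob("*.txt"):
    caption = txt.read_text(encoding="utf-8").strip()
    if not caption.startswith(TRIGGER):
        # prepend the trigger word so the LORA learns to respond to it in prompts
        txt.write_text(f"{TRIGGER} {caption}", encoding="utf-8")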

Ok, now that everything is set up, you can go into the "Dreambooth LORA" tab and set up the folders like so:
kohyafolders.PNG

and also pick a source model like so:
kohyamodel.PNG

I just picked a v1.5 model. If you want to train a LORA for v2.0 or v2.1, you need to tick the v2 checkbox; if you want to train for v2.1-768, you also need to tick v_parameterization. Since I use a custom checkpoint, the model quick pick is "custom"; otherwise you can select the base 1.5, 2.0 or 2.1 models there. I just save as safetensors because that's the default.

And then finally you can fiddle with the training parameters:
koyhatrain1.PNG
koyhatrain2.PNG
This is what I used for the basics; the advanced stuff I pretty much left at its defaults.

One important thing is probably the resolution. I don't think I messed with it (I made these screenshots after the fact, so I don't remember exactly), but I wouldn't change it too much either way. 512x512 or 768x768 is probably the way to go with a 1.5 model; maybe 768x768 only really works with v2.1-768 even. It might also depend on your training data: if you only have 768x768 images, then maybe that's the better choice.

The other important one is the "Enable buckets" checkbox. That makes it so your training data is sorted into resolution "buckets". I'm not entirely sure how it works, but it is meant to let you use images with differing resolutions, or resolutions that don't "fit" the model, without distorting the result. So I'd make sure it is enabled (I believe it costs VRAM though).
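My rough mental model of bucketing, as a conceptual sketch (this is not kohya's actual algorithm): every image gets assigned to the allowed width/height pair that best matches its aspect ratio while staying inside the pixel budget, so nothing has to be squashed into one fixed size.

Code:
def make_buckets(step=64, max_area=512 * 512):
    """All width/height pairs in multiples of `step` that fit the training pixel budget."""
    return [(w, h) for w in range(256, 1025, step)
                   for h in range(256, 1025, step) if w * h <= max_area]

def pick_bucket(img_w, img_h, buckets):
    target = img_w / img_h
    # closest aspect ratio wins; ties go to the largest bucket that still fits
    return min(buckets, key=lambda wh: (abs(wh[0] / wh[1] - target), -(wh[0] * wh[1])))

buckets = make_buckets()
print(pick_bucket(1024, 512, buckets))  # -> (640, 320): a 2:1 image keeps its shape
print(pick_bucket(512, 512, buckets))   # -> (512, 512)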

And now, you can press the big orange button!
trainbutton.PNG

And your training should start.

Code:
(venv) → D:\AI\kohya_ss [master ≡ +1 ~0 -0 !]› .\webui-user.bat
Already up to date.
Validating that requirements are satisfied.
All requirements satisfied.
Load CSS...
Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.
Folder 100_SophiaCollegeLectureFace: 2100 steps
max_train_steps = 2100
stop_text_encoder_training = 0
lr_warmup_steps = 210
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="D:/AI Models/comfyui/checkpoints/ProtoGen_X3.4.safetensors" --train_data_dir="D:/AI Training/SophiaCollegeLectureFace/image" --resolution=512,512 --output_dir="D:/AI Training/SophiaCollegeLectureFace/model" --logging_dir="D:/AI Training/SophiaCollegeLectureFace/log" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=0.0001 --network_dim=128 --output_name="last" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="constant" --lr_warmup_steps="210" --train_batch_size="1" --max_train_steps="2100" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --bucket_reso_steps=64 --xformers --bucket_no_upscale
And, once it is all over, you have your model in the "model" folder. Here is my LORA + training data as an example:

You can just copy the .safetensors file into webui/models/lora, and then load it in webui with the "additional networks" button. It should insert something into your prompt like <SophiaCollegeLectureFace:1>, where the 1 is a weight. I found 1 is way too much; maybe 0.8 at most, otherwise it resulted in kind of ugly pictures.
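If you'd rather sanity-check the LORA outside webui, diffusers can load the same .safetensors. This is just how I understand the diffusers API (double-check the calls against the docs for your version); the paths are mine:

Code:
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "D:/AI Models/comfyui/checkpoints/ProtoGen_X3.4.safetensors",
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("D:/AI Training/SophiaCollegeLectureFace/model/last.safetensors")

image = pipe(
    "SophiaCollegeLectureFace, a woman with red hair and glasses, portrait",
    num_inference_steps=25,
    cross_attention_kwargs={"scale": 0.8},  # roughly the same idea as <...:0.8> in webui
).images[0]
image.save("lora_test.png")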

And if you want a YouTube video instead, use this: that's mostly where I got my information from.
 
Last edited:

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
Yeah, usually the only solution is to restart. I've read about that parameter you posted before and was thinking it could be related, but I couldn't try it yet.

All this stuff is in a very experimental phase IMO, so there is probably a lot of inefficient use of memory and resources.
I'm not actually sure what it does; I understand that it sets something related to how memory is used and handled.
It does work, but it only gives a small boost, at least as far as I have seen. I'm a bit "ham fisted" with my vram usage though.
I run the hi-res multiplier as high as my GPU will allow atm. If I get an error I usually adjust down by 0.05-0.1 and that fixes it most of the time. It's funny though, sometimes I can reach a higher resolution than at other times, meaning I can run the hi-res multiplier higher than usual. No idea why.
 
  • Like
Reactions: fr34ky

fr34ky

Active Member
Oct 29, 2017
812
2,189
I'm not actually sure what it does; I understand that it sets something related to how memory is used and handled.
It does work, but it only gives a small boost, at least as far as I have seen. I'm a bit "ham fisted" with my vram usage though.
I run the hi-res multiplier as high as my GPU will allow atm. If I get an error I usually adjust down by 0.05-0.1 and that fixes it most of the time. It's funny though, sometimes I can reach a higher resolution than at other times, meaning I can run the hi-res multiplier higher than usual. No idea why.
It's quite random; sometimes, if the picture needs that extra resolution, you just have to insist until it comes out right :ROFLMAO:

I'm curious why you use hi-res fix instead of sending the picture to img2img and then using a higher resolution there.

Is there anything I'm missing with that? (I've read your explanation and tested it.)


I posted in the AI art thread about training a LORA. I used this project: kohya_ss, which is like webui, but only for training. [...]
That was a lot of work you put into that post, man. I've used Kohya to train my LORAs; it's the only way I could train a LORA, and it works great.
 
  • Like
Reactions: Mr-Fox

Sepheyer

Well-Known Member
Dec 21, 2020
1,570
3,767
Have you tried sending the txt2img result to inpaint and correcting the face at a higher resolution? I mean instead of face restoration.
I think I came close to it, but not entirely. This girl (link) is a 2x upscale of a 512x768 original, produced using this pipeline (link). The denoise setting was 0.5; I trust this is close enough to inpaint. So, if I didn't know better, the mere 2x upscale result would be acceptable, although GFPGAN takes it to the next level (link).
Also, CUI doesn't really support inpainting as easily as A1111's WebUI does. In CUI you need to make the mask in a graphics editor, connect it via a node, etc., so the workflow is high-touch.

And finally, I have this strongly held belief, although it might be laughably incorrect, that face restoration algos are superior to the alternative workflows, including rendering at higher resolution. Now, a higher resolution render does make GFPGAN produce an even better result, as the comparison linked above illustrates.


Hence I wanted to explore how I can get proper face restoration, because without a way to do face restore CUI would be a non-starter.
 
  • Like
Reactions: fr34ky and Mr-Fox

Sepheyer

Well-Known Member
Dec 21, 2020
1,570
3,767
Guys, if you want me to pin something into the guides section, let me know. This is an open offer without expiration. If there are useful posts that are not in the Guides yet, it is only because I either overlooked them or forgot about them. Do ping me and let me know what should be added. I know I am missing a bunch on ControlNet, but I'm out of energy right now to search the thread, my bad!

Untitled.png
 

Sepheyer

Well-Known Member
Dec 21, 2020
1,570
3,767
GodOfPandas got this image (link). Great image; my post here is not criticism, god forbid.

Up until about 24 hours ago I didn't know that the grain/staircase artefacts in the girl's face are actually produced by CodeFormer, the face restoration algo that A1111 uses by default.

There are ways to cure it - using GFPGAN.

Option 1
This option is online and is not limited by your card's VRAM. "Option 2" below uses A1111 and is simpler, but is limited by your card's VRAM.

A super quick tutorial on how to do it: [link]. And the artefacts are gone once you run GFPGAN on the original image.

00052rs.png

Option 2
As simple as a Sunday morning, but limited by your card's VRAM. The spidergirl blew out my VRAM in A1111, so here it is using a 768x512 image:

Untitled.png

Option 3
Use the actual desktop GFPGAN. You merely need to "git clone" it, the same as we did with A1111. The major benefit is that it has a light memory footprint, so you can do larger files on your desktop in mere seconds; it chews on much larger files than A1111 can. I guess that's because on startup A1111 reserves all the memory it needs for the model, leaving very little memory for anything else. The dedicated face restorers and upscalers don't take the same approach, and thus your memory goes much further. I.e. I kept running out of memory when processing GodOfPandas's image in A1111, but had no issue with the desktop GFPGAN.
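If you'd rather call it from Python than use the bundled inference script, the package exposes a GFPGANer class; roughly like this (my sketch from memory, check the repo's README for the exact arguments and where to put the model weights):

Code:
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="experiments/pretrained_models/GFPGANv1.4.pth",  # downloaded from the releases page
    upscale=2,          # upscales the whole image as well, not just the face
    arch="clean",
    channel_multiplier=2,
    bg_upsampler=None,  # leave the background alone to keep the memory footprint light
)

img = cv2.imread("00052.png", cv2.IMREAD_COLOR)
_, _, restored = restorer.enhance(img, has_aligned=False, only_center_face=False, paste_back=True)
cv2.imwrite("00052_GFPGAN.png", restored)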

Here is the link:

And here is what the GFPGAN-processed image looks like:

00052_GFPGAN_14.png
 
Last edited:

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
It's quite random; sometimes, if the picture needs that extra resolution, you just have to insist until it comes out right :ROFLMAO:

I'm curious why you use hi-res fix instead of sending the picture to img2img and then using a higher resolution there.

Is there anything I'm missing with that? (I've read your explanation and tested it.)

That was a lot of work you put into that post, man. I've used Kohya to train my LORAs; it's the only way I could train a LORA, and it works great.
A while back I noticed that a beautiful image had lost all texture and detail after using the upscaler in the Extras tab. Since then I have been using hi-res fix, because it doesn't remove the details. SD is indeed in an experimental state, so it could have been something else that caused it. For now I'm using hi-res fix until I see a better solution.
 
  • Red Heart
Reactions: Sepheyer

Sepheyer

Well-Known Member
Dec 21, 2020
1,570
3,767
Whoa, the power of a pipeline-based workflow. The workspace looks like this:
Here is the write-up: [link]. It takes no time to run.

full_00036_.png
 
  • Like
Reactions: Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
Original generated image vs. upscaled in the Extras tab: the upscale has higher resolution, yes, but the skin is all smoothed out. The texture and fidelity have been lost.
00440-1449331821.png 00085.png

I had a similar issue with the pinup Betty image.
 
  • Like
  • Red Heart
Reactions: Mark17 and Sepheyer

PandaRepublic

Member
May 18, 2018
211
2,140
I need help. I followed the guide in this thread on how to train a LORA, but it does not train. This is what I get when I click "Train Model", and it just stops there. I'm new to this, so I don't know if this screenshot helps at all. Screenshot 2023-03-16 134253.png
 

Schlongborn

Member
May 4, 2019
432
1,533
I need help. I followed the guide in this thread on how to train a LORA, but it does not train. This is what I get when I click "Train Model", and it just stops there. I'm new to this, so I don't know if this screenshot helps at all.
Screenshot 2023-03-16 134253.png
The screenshot helps; it contains the error:
ValueError: bf16 mixed precision requires PyTorch >= 1.10 and a supported device.

So that means either you don't have PyTorch >= 1.10 installed (you can check with: pip show torch), or your GPU does not support bf16 mixed precision. What GPU do you have?
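You can check both from a Python prompt inside kohya's venv, roughly like below (torch.cuda.is_bf16_supported() exists on recent PyTorch builds). If bf16 comes back unsupported, switching "Mixed precision" and "Save precision" from bf16 to fp16 in the training settings should get you past that error.

Code:
import torch

print("torch:", torch.__version__)                 # bf16 mixed precision needs >= 1.10
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("gpu:", torch.cuda.get_device_name(0))
    # typically False on pre-Ampere cards (e.g. the RTX 20xx series)
    print("bf16 supported:", torch.cuda.is_bf16_supported())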
 
Last edited:

PandaRepublic

Member
May 18, 2018
211
2,140
The screenshot helps; it contains the error:
ValueError: bf16 mixed precision requires PyTorch >= 1.10 and a supported device.

So that means either you don't have PyTorch >= 1.10 installed (you can check with: pip show torch), or your GPU does not support bf16 mixed precision. What GPU do you have?

See also:
I have a 2060 Super. It's probably because I don't have PyTorch.
 
Last edited:
  • Like
Reactions: Jimwalrus