[Stable Diffusion] Prompt Sharing and Learning Thread

felldude

Active Member
Aug 26, 2017
572
1,695
Your maths might be a bit off; it's showing 17 secs or so for each generation. It starts at 18:10:06 and the last one kicks off at 18:10:58. I have a 4090, so it's certainly a lot faster than 115 secs.
EDIT: I see the upscaling process in there at 1.5x; it's not image-to-image, but it is taking some time.
I don't know how much time it's taking up, though. Let's say it's around half the time, to give you an even split:

128 steps in 10 seconds = 12.8 IT/s, which is good... if the upscaling really is taking that much of the time.
You're at 7.11 IT/s with the upscaling included.
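Showing my work in Python, for anyone who wants to poke at the numbers (the 10-second sampler share is my guess, not measured):

steps = 128            # steps per generation, from the log above
wall = 18.0            # ~17-18 seconds per generation from the timestamps
print(steps / wall)    # ~7.11 IT/s with the upscale counted in
print(steps / 10.0)    # 12.8 IT/s if the sampler alone got ~10 of those seconds (my guess)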

For reference, I also did a batch size of 4 and was amazed it increased my IT/s (relative to the number of steps).
But I am still around 1.0-1.2 IT/s. If I was doing an unsupervised batch, I might run a batch size of 4 instead of 1 to see if it stays stable for that extra 0.2 IT/s.

I didn't quite get the same level of clothing as you did...lol
ComfyUI_01375_.png ComfyUI_01386_.png
 
Last edited:

felldude

Active Member
Aug 26, 2017
572
1,695
Holy shit, I was just looking at the new SD3.

First text encoder: CLIP ViT-L, ~200 MB
Second text encoder: OpenCLIP ViT-bigG (the big one used in XL), ~2 GB
Third text encoder: T5-XXL, ~10 GB

OK, I can't even fit the TEs in memory, let alone the UNet.

The pruned combined model isn't up yet, but I'm guessing it will be at least 12 GB,
if not over the 16 GB threshold, which would put it in the top 0.1% of users' hardware.
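Quick sanity check on why that's scary, in Python (sizes are the rough figures above, not exact file sizes):

te_sizes_gb = {
    "CLIP ViT-L": 0.2,          # rough figures from above
    "OpenCLIP ViT-bigG": 2.0,
    "T5-XXL": 10.0,
}
print(sum(te_sizes_gb.values()))  # ~12.2 GB of text encoders before the diffusion model even loads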
 
  • Wow
Reactions: Jimwalrus

devilkkw

Member
Mar 17, 2021
324
1,094
I have not used Forge; keeping Auto1111, Comfy and Kohya all in different venvs is taking up enough space, with 98% the same files and a 2% difference that makes them incompatible.

You have #coder in your signature; have you gotten DeepSpeed to work on Windows?
The precompiled one fails for me, and when I compiled it with Ninja it corrupted my CUDA files. I tried multiple CUDA SDKs and fixed the reference to the Linux time.h.
I'm using a custom build of Comfy that I built with TensorRT DLLs; other than a round-off error, I have no issues.

View attachment 3729886

I'm not sure it is actually speeding things up, because no one posts their IT/s or secs per IT... lol

For 1024x1024 I am at 1.0-1.5 IT/s with most samplers (not Heun)
For 2048x2048 I am at 4.5 secs per IT

This is a native render, not hires fix or SEGS, which break the image down.
Oh, and my motherboard is only PCIe 3.0, so I am at half bandwidth, but I'm not sure it matters.
2,560 CUDA cores on an RTX 3050.
I don't understand what happened on your PC. I'm on Win11 and use Miniconda for environments. I have A1111, Forge and ComfyUI, each with their own venv, and had no problem getting them working: just download and extract the GitHub zip, then run. I haven't had any CUDA problems, but I remember in the past I had to download a specific precompiled version of cuDNN, because my compiler was erroring out building it from source.

Speaking of speed, in ComfyUI I get 1.49 it/s at 1024x1280.
speed.jpg

I've seen SD3 is out, but reading about it, there's some sort of censoring on it, so we need to wait for a community-trained model.
It seems to work well on text, but censoring is not what we want.
 

felldude

Active Member
Aug 26, 2017
572
1,695
Yeah, I have them all in venvs. My point is that the reason I haven't tried Forge or Stability is that each venv has about 90,000 files at around 15 GB; I don't want to mess with any more programs unless something groundbreaking comes along.

....

So you have the script working in Kohya. It's pretty sad they left references to Linux in the Windows build, but even following the guide to fix that, I have not been able to compile it with any version of CUDA.

The pre-compiled version did not function for me.

...

They did not train on the LAION-5B set; that set is facing legal issues and has been taken down.
They just say it was trained on 1 billion images and refined on 3M.



"Nude" is at 16k
"XXX" is at 33,923

They have 3 versions, from 4 GB up to 10.9 GB for the version with the new TE.
OK, so that is how they did it... it's only 10.9 GB with the TE because the T5 is in FP8. To use the FP16 TE with the CLIPs and the UNet, it's around 16 GB.
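The precision math, roughly, assuming the T5-XXL encoder is around 4.7B parameters (my assumption):

params = 4.7e9           # rough T5-XXL encoder parameter count (assumption)
print(params * 2 / 1e9)  # fp16: ~9.4 GB of weights
print(params * 1 / 1e9)  # fp8:  ~4.7 GB of weights
# Swapping the T5 between fp16 and fp8 accounts for most of the 16 GB vs 10.9 GB gap.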


{
"_class_name": "SD3Transformer2DModel",
"_diffusers_version": "0.29.0.dev0",
"attention_head_dim": 64,
"caption_projection_dim": 1536,
"in_channels": 16,
"joint_attention_dim": 4096,
"num_attention_heads": 24,
"num_layers": 24,
"out_channels": 16,
"patch_size": 2,
"pooled_projection_dim": 2048,
"pos_embed_max_size": 192,
"sample_size": 128
}
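A couple of things you can read straight off that config; my arithmetic in Python, with a hypothetical path to the file above:

import json

cfg = json.load(open("sd3_transformer_config.json"))  # hypothetical filename for the config above

# model width: 24 heads x 64 dims = 1536, which matches caption_projection_dim
print(cfg["num_attention_heads"] * cfg["attention_head_dim"])

# sample_size 128 is the latent resolution (1024 px after the 8x VAE), and
# patch_size 2 means the transformer attends over (128 / 2)^2 image tokens
print((cfg["sample_size"] // cfg["patch_size"]) ** 2)  # 4096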

Yeah, I can't even use this model, let alone train on it.
 
Last edited:
  • Like
Reactions: devilkkw

felldude

Active Member
Aug 26, 2017
572
1,695
Ok so regarding SD3

I can run sd3_medium_incl_clips without the T5-XXL; even in FP8, that encoder is outside my range.

Pony LoRAs do work with the model and improve the results slightly.

Without lora
ComfyUI_01406_.png

With lora

ComfyUI_01405_.png

The only sampler I found that doesn't corrupt the image is Euler, running without the T5.
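If anyone wants to try the same T5-less setup outside Comfy, diffusers lets you drop the third encoder at load time. A minimal sketch, assuming a recent diffusers build with SD3 support and access to the gated repo (prompt and settings are just examples):

import torch
from diffusers import StableDiffusion3Pipeline

# Drop the T5-XXL encoder; CLIP-L and OpenCLIP bigG still handle the prompt,
# you just lose T5's long-prompt understanding.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "photo of a supergirl cosplay, studio lighting",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_no_t5.png")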
 

Sharinel

Active Member
Dec 23, 2018
598
2,511
The supergirl image and the generation times from my earlier post were from the SD3 model with CLIP/T5 (the 10 GB one), and I also posted a quick example generation in the other thread - https://f95zone.to/threads/ai-art-show-us-your-ai-skill-no-teens.138575/post-13995058

I think it needs a lot of work, but from the way they have worded the licence, it looks like it's not worth it for the people who normally do finetunes.
 

felldude

Active Member
Aug 26, 2017
572
1,695
Trying to do the math on what a native finetune using Adam would require....
It might be out of the range of even the 80 GB A100.

I'd be curious whether someone with a 24 GB card could finetune with Lion or Ada.
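Here's the rough optimizer arithmetic I'm doing, assuming SD3 medium is about 2B parameters and fp32 optimizer states:

params = 2e9                 # SD3 medium, rough parameter count
weights = params * 4         # fp32 master weights
grads   = params * 4         # fp32 gradients
adam    = params * 8         # Adam: two fp32 state tensors per parameter
lion    = params * 4         # Lion: one state tensor per parameter
print((weights + grads + adam) / 1e9)  # ~32 GB before activations
print((weights + grads + lion) / 1e9)  # ~24 GB before activations
# Activations at 1024x1024 plus any EMA copy go on top, which is how a
# full Adam finetune can threaten even an 80 GB A100.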
 

Synalon

Member
Jan 31, 2022
225
663
I managed to get this out of SD3 so far. If anybody has any prompts they want me to test, send me a message with the prompts; it only takes a few seconds to render.

This is with the most basic workflow I could make, as the multiple CLIPs weren't working for me.

I'm trying to fix the multiple-CLIPs workflow now, so it might get better later.


*Edit: added two more example pictures.*
 
Last edited:

felldude

Active Member
Aug 26, 2017
572
1,695
I didn't notice much of a difference with the 3-CLIP SD3 encoder that Comfy had a heads-up on. Then again, I am using the model with only 2 TEs, so...

Have you tried the FP16 version they posted?
 

Synalon

Member
Jan 31, 2022
225
663
Currently I'm just using sd3_medium_incl_clips_t5xxlfp8.safetensors. If you can link me to the FP16 version, I'll give it a try.
I just downloaded and installed the CLIP files and the FP16 version; I'll give it a try now.
 
Last edited:

felldude

Active Member
Aug 26, 2017
572
1,695
Currently I'm just using sd3_medium_incl_clips_t5xxlfp8.safetensors. If you can link me to the FP16 version, I'll give it a try.
I'm still fairly new to Comfy; even though I've had it for a while, I don't use it much, so if I need to adjust the workflow a lot, link me to a how-to guide as well, please.


This image has the SD3 CLIP workflow embedded in it; just delete the LoRA and change the checkpoint to base SD3.

ComfyUI_00013_.png
 
  • Like
Reactions: VanMortis

Synalon

Member
Jan 31, 2022
225
663


TY, I think I have it sorted out now. The image is using FP16.


I changed the sampler and scheduler as an experiment; the image came out OK but could be better.

Do you know where to install samplers and schedulers in Comfy? I have way more in Forge I could copy over to try.
 
Last edited:

felldude

Active Member
Aug 26, 2017
572
1,695
Comfy should have most of the new samplers; the turbo one is in a different list, and you can use the custom sampler nodes to build your own, set offsets, and do more complex things.

My understanding is that SD3 was only trained to work with Euler. Also, they claim to have removed, or maybe censored in some way, most of the adult content that was in the 5-billion-image set.
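For anyone curious what Euler actually does, here's the whole idea in a few lines of k-diffusion-style Python (a sketch of the algorithm, not Comfy's actual code; `model` is whatever predicts the denoised latent):

def sample_euler(model, x, sigmas):
    # x starts as noise scaled to sigmas[0]; sigmas decrease toward 0
    for i in range(len(sigmas) - 1):
        denoised = model(x, sigmas[i])            # model's estimate of the clean latent
        d = (x - denoised) / sigmas[i]            # dx/dsigma under the k-diffusion ODE
        x = x + d * (sigmas[i + 1] - sigmas[i])   # one first-order Euler step
    return x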

I am assuming they used AI tools to tag and prune the data set down to a mere 1 billion.

If I did the math right: with a batch size of 48, if you could maintain 10 IT/s, you could train that model (one pass over the 1 billion images) in 578 hours.
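Showing my work on the 578 hours (batch size and IT/s are hypotheticals):

images = 1e9          # claimed pretraining set size
batch = 48
its_per_sec = 10.0    # hoped-for sustained speed
hours = (images / batch) / its_per_sec / 3600
print(hours)          # ~578.7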
 
Last edited:

Synalon

Member
Jan 31, 2022
225
663
The nude stuff certainly isn't working, but I have had some reasonable outputs, considering the minimal prompts I've used while testing different schedulers and samplers.

This one is using DDIM as the sampler and ddim_uniform as the scheduler.

It's not good, but at least it's another direction to experiment with.

SD3_00080_.png
 

felldude

Active Member
Aug 26, 2017
572
1,695
With all the experimenting and work I was doing on SD3, I was able to fit a finetuning run in overnight; with everything optimized, I was able to do a full native finetune of SD 1.5.

(Yeah, not SD3. I'm not even sure I could finetune even just the UNet with diffusers.)
But maybe with the new offloading/dynamic-loading stuff Microsoft or IBM is developing.
If I had told anyone in 2022 that I did a finetune with Adam 8-bit, with full gradient accumulation and parallel distribution, on an 8 GB card.....
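The combination, roughly, is bitsandbytes' 8-bit Adam plus gradient accumulation. A minimal sketch of the training step, not my actual script; `unet`, `dataloader` and `diffusion_loss` are placeholders:

import bitsandbytes as bnb

# 8-bit Adam keeps its two state tensors in int8, roughly 4x less
# optimizer memory than fp32 Adam.
optimizer = bnb.optim.AdamW8bit(unet.parameters(), lr=1e-5)

accum = 8  # gradient accumulation: act like a batch 8x larger than VRAM allows
for step, batch in enumerate(dataloader):
    loss = diffusion_loss(unet, batch)   # placeholder for the diffusion loss
    (loss / accum).backward()            # scale so accumulated grads match one big batch
    if (step + 1) % accum == 0:
        optimizer.step()
        optimizer.zero_grad()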


My thoughts:

Epic Realism, Realistic Vision, Absolute Reality, Juggernaut: they are all using the same base training as a start, and it's not SD 1.5-EMA.
(Probably one of those models did use SD 1.5, but that might be lost to time.)

If you take the prompts from the following images from my finetune (they are not perfect)
and run them in any of those checkpoints with the full list of negatives, you will see a pattern.

I thought "AI girl" was part of SD 1.5; it's my opinion that it's just a symptom of finetuning a finetune, at best.

ComfyUI_00172_.png ComfyUI_00170_.png ComfyUI_00167_.png ComfyUI_00166_.png ComfyUI_00164_.png
 
Last edited:

felldude

Active Member
Aug 26, 2017
572
1,695

People probably won't use it with the new SD3, but...
ComfyUI_00218_.png
ComfyUI_00217_.png ComfyUI_00216_.png ComfyUI_00213_.png
 
Last edited:

Sharinel

Active Member
Dec 23, 2018
598
2,511
Every single word of that was in English, and I understood none of it :)

However there were boobas at the end so good job!
 
  • Haha
Reactions: CBTWizard

felldude

Active Member
Aug 26, 2017
572
1,695
Lol. I am saying that it appears the most popular checkpoints are a copy-paste of one good training, with very little difference.
But the loss on some of the models is pretty high, hence the 2 GB.
 

S-G-H

Newbie
May 10, 2024
17
4
For anyone who struggles with the Stable Diffusion installation, there is this app called Pinokio that is very easy to use, and you don't need a specific version of Python on your system. Pinokio works on ALL operating systems and is very useful for Linux users, so you don't have to create a virtual environment or mess with your system.
It's not only for Stable Diffusion.