[Stable Diffusion] Prompt Sharing and Learning Thread

hkennereth

Member
Mar 3, 2019
236
756
FLEX model memory workaround.

According to Aitrepreneur getting this setting changed lets your video card use your RAM at the expense of speed:

CUDA - Sysmem Fallback Policy: Prefer no Sysmem Fallback

From the way it's written, I would think it's the other way around, no? If you select "no sysmem fallback", it will disable the ability to use the system memory as fallback when there isn't enough VRAM. If you select the last option, "prefer sysmem fallback", it should allow that to happen. At least that's my understanding.
 

Sepheyer

Well-Known Member
Dec 21, 2020
1,542
3,667
ComfyUI noob FLUX workflows:

To get started with the least demanding model (schnell), you need a few files: the VAE, the model, the workflow, and the encoders.

The VAE (ae.sft) and the model (flux1-schnell.sft) are here:

The encoders:

The very first link here explains where each file goes.

Looks like a cool 40GB in new downloads when all is said and done.

The workflow for schnell is the last one in this table:

Dev: flux_dev_example.png
FP8: flux_dev_checkpoint_example.png
Schnell: flux_schnell_example.png
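Those example PNGs carry the workflow itself: ComfyUI embeds the workflow JSON in the image's metadata, so dragging the PNG onto the canvas loads the graph. As a sketch (standard library only; the file name is just the example image from the table above), here is how that embedded JSON can be pulled out of a PNG's tEXt chunks:

```python
import json
import struct

def png_text_chunks(path):
    """Read the tEXt chunks of a PNG; ComfyUI stores its workflow JSON there."""
    chunks = {}
    with open(path, "rb") as f:
        if f.read(8) != b"\x89PNG\r\n\x1a\n":
            raise ValueError("not a PNG file")
        while True:
            head = f.read(8)
            if len(head) < 8:
                break
            length, ctype = struct.unpack(">I4s", head)
            data = f.read(length)
            f.read(4)  # skip the CRC
            if ctype == b"tEXt":
                key, _, value = data.partition(b"\x00")
                chunks[key.decode("latin-1")] = value.decode("latin-1")
            if ctype == b"IEND":
                break
    return chunks

# Hypothetical usage with the schnell example image:
# workflow = json.loads(png_text_chunks("flux_schnell_example.png")["workflow"])
```

Handy for checking whether a downloaded example image actually contains a workflow before dragging it in.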
 

Sepheyer

Well-Known Member
Dec 21, 2020
1,542
3,667
Yea, I dunno, I keep getting error "!!! Exception during processing!!! Error while deserializing header: InvalidHeaderDeserialization" when trying the FLUX workflow.

Can't immediately troubleshoot what's up :(
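For what it's worth, InvalidHeaderDeserialization usually means the .sft/.safetensors file is truncated or isn't actually a safetensors file (e.g. an HTML error page saved by a failed download). A quick sanity check, assuming the file follows the safetensors layout (an 8-byte little-endian header length followed by a JSON header):

```python
import json
import struct

def check_safetensors(path):
    """Parse a safetensors header; raises if the file is malformed or truncated."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return list(header.keys())  # tensor names (plus optional __metadata__)
```

If this raises on ae.sft or the model file, re-downloading that file is the likely fix.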

 

felldude

Active Member
Aug 26, 2017
543
1,564
Someone merged the model into FP8; you need to use the newest version of ComfyUI.

The files are 17 and 21 GB, so if you have a 3090 or 4090 you're golden.

For me it's around 100 seconds per iteration, so I am out on using it.

Make sure you're not using any crazy launch arguments like --force-fp32.
If you're on AMD and using --directml, I don't think Olive has been updated to convert the model yet.

I'm not sure whether the default xformers or --use-pytorch-cross-attention would be faster.


Yea, I dunno, I keep getting error "!!! Exception during processing!!! Error while deserializing header: InvalidHeaderDeserialization" when trying the FLUX workflow.

Can't immediately troubleshoot what's up :(
Did you pull the latest version of ComfyUI? I had a float error until I updated.

I also built CuPy, but I'm not sure it matters; it adds in cuBLAS. It's part of my mission to get Flash Attention, CUTLASS, and DeepSpeed working together.
That would be my only shot at using Flux at a decent speed.
 

Sepheyer

Well-Known Member
Dec 21, 2020
1,542
3,667
Indeed, I have the freshest ComfyUI. Also, my card is a 4070 Ti with 12GB, so maybe that's why it drops dead, but it's not immediately clear, especially since I did that memory tweak. Welp, maybe this is rather a torch thing; I'll reinstall it when the muse strikes.
 

Synalon

Member
Jan 31, 2022
219
647
2 minutes per iteration is considered too long for you?
 

felldude

Active Member
Aug 26, 2017
543
1,564
By force-installing the following packages and copying the 12.6 .dll files from CUDA and CuTensor,

I was able to get a small 2-5% increase in it/s in Comfy. (I think I could get more by building Torch myself with CuPy, but I will wait for a programmer/engineer to do so.)

nvidia_cublas_cu12-12.6.0.22-py3-none-win_amd64.whl
nvidia_cuda_opencl_cu12-12.6.37-py3-none-win_amd64.whl
nvidia_cudnn_cu12-9.3.0.75-py3-none-win_amd64.whl
nvidia_cusolver_cu12-11.6.4.38-py3-none-win_amd64.whl

pip install --force-reinstall CuPy and the packages above while your venv is active.

If you copy the 12.6 .dll's, make sure NOT to copy cuinj64_126.dll, or delete it from the Torch/lib folder after copying.

EDIT:

So I asked the person who uploaded the FLUX FP8 models, and they used --bf16 upcasting,
hence the message "model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16".

In my testing, xformers vs pytorch-cross-attention showed no difference in speed, but this could be due to the model not fitting into VRAM.

Someone with access to an A100, or a system with a killer CPU and a ton of RAM, could convert the FLUX model to dtype torch.int8.
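Converting weights to int8 is essentially symmetric quantization: store int8 values plus a per-tensor scale, and multiply back when loading. A toy pure-Python sketch of the idea (not the actual conversion, which would operate on torch tensors):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: ints in [-127, 127] plus one scale."""
    scale = (max(abs(w) for w in weights) / 127.0) or 1.0  # avoid div-by-zero on all-zero weights
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values and the scale."""
    return [v * scale for v in q]
```

The trade-off is the same reason FP8 helps here: int8 storage roughly halves the FP16 memory footprint at the cost of some precision.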
 

Sepheyer

Well-Known Member
Dec 21, 2020
1,542
3,667
If anybody wants to see what Flux can do just message me the Prompts, which version of Flux you want me to use along with the scheduler and sampler and amount of steps and I'll run a batch of images off for you.
What's the prettiest female the model can generate? Asking for a simplish midsized photo against a simple background.
 

Synalon

Member
Jan 31, 2022
219
647
What's the prettiest female the model can generate? Asking for a simplish midsized photo against a simple background.
This is using the default workflow provided with Flux, if you have a better workflow and want me to use that instead feel free to share it.

Prompt:
a HD photograph of a beautiful young woman in the countryside.
Heavenly Features. Glamour Photograph. Professional Make-up.

[Eight example images attached in spoilers.]

1504x1504

Euler and Simple.

I'll try the other Flux models I have as well and will update this post.

It does seem to me that longer, more precise prompts will get better images; I'll try that later.

Changing the scheduler and sampler does tend to produce better photorealistic images after a bit of experimentation. I'm not sure yet how to add negative prompts with Flux, as the basic workflow didn't have one, but once I figure that out I can probably get better images.
 

dranosty

New Member
Oct 12, 2021
12
1
Hi guys.

Does anyone know how to reproduce this exact art style, please?

Few examples :




(There are many more artists with this art style)

I tried several models on Civitai but I can't get close to this... It's some kind of high-res anime screencap look with shiny skin; I wonder what model / LoRAs they use...