[Stable Diffusion] Prompt Sharing and Learning Thread

Sepheyer

Well-Known Member
Dec 21, 2020
1,571
3,768
Someone merged the Flux model into FP8; you need to use the newest version of COMFY.

17 and 21 GB, so if you have a 3090 or 4090 you're golden.

For me it's around 100 seconds per iteration, so I am out on using it.

Make sure you're not using any crazy launch arguments like --force-fp32.
If you're on AMD and using --directml, I don't think Olive has been updated to convert the model yet.

I'm not sure if the default XFormers or --use-pytorch-cross-attention would be faster.




Did you pull the latest version of COMFY? I had a float error until I updated.

I also built CuPy, but I'm not sure it matters; it adds in cuBLAS. It's part of my mission to get Flash Attention, CUTLASS, and DeepSpeed working together.
It would be my only shot at using Flux at a decent speed.
Indeed, I have the freshest CUI. Also, my card is a 4070 Ti with 12 GB, so maybe that's the reason it drops dead, but it's not immediately clear - especially since I did that memory tweak. Welp, maybe this is rather a Torch thing; I'll reinstall it when the muse strikes.
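
A quick way to see whether it's just VRAM starvation is to check what's actually free before the model loads - a minimal PyTorch sketch (assumes a CUDA build of Torch):

```python
import torch

# Minimal check of free vs total VRAM on the default CUDA device,
# useful to see if a ~17 GB FP8 checkpoint can even fit on a 12 GB card.
free, total = torch.cuda.mem_get_info()
print(f"{free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```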
 

Synalon

Member
Jan 31, 2022
225
663
Someone merged the Flux model into FP8; you need to use the newest version of COMFY.

17 and 21 GB, so if you have a 3090 or 4090 you're golden.

For me it's around 100 seconds per iteration, so I am out on using it.

Make sure you're not using any crazy launch arguments like --force-fp32.
If you're on AMD and using --directml, I don't think Olive has been updated to convert the model yet.

I'm not sure if the default XFormers or --use-pytorch-cross-attention would be faster.




Did you pull the latest version of COMFY? I had a float error until I updated.

I also built CuPy, but I'm not sure it matters; it adds in cuBLAS. It's part of my mission to get Flash Attention, CUTLASS, and DeepSpeed working together.
It would be my only shot at using Flux at a decent speed.
2 minutes per iteration is considered too long for you?
 

felldude

Active Member
Aug 26, 2017
572
1,695
By force-installing the packages below and copying the 12.6 .dll files from CUDA and cuTENSOR,

I was able to get a small (2-5%) increase in iterations per second in Comfy. (I think I could get more by building Torch myself with CuPy, but I will wait for a programmer/engineer to do so.)

nvidia_cublas_cu12-12.6.0.22-py3-none-win_amd64.whl
nvidia_cuda_opencl_cu12-12.6.37-py3-none-win_amd64.whl
nvidia_cudnn_cu12-9.3.0.75-py3-none-win_amd64.whl
nvidia_cusolver_cu12-11.6.4.38-py3-none-win_amd64.whl

pip install --force-reinstall CuPy and the packages above while your venv is active.

If you copy the 12.6 .dll's, make sure NOT to copy cuinj64_126.dll, or delete it from the Torch/lib folder after copying.
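
To confirm the swapped-in libraries actually took effect, you can ask Torch what it loaded - a quick sketch, nothing Comfy-specific:

```python
import torch

# Report which CUDA toolkit this Torch build targets and which cuDNN
# it actually loaded, to confirm the force-reinstalled wheels are in use.
print("torch:", torch.__version__)
print("CUDA: ", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
```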

EDIT:

So I asked the person who uploaded the FLUX FP8 models, and they used --bf16 upcasting,
hence the message "model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16".
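
That message just means the weights are stored in FP8 and upcast to BF16 right before the math - a toy illustration of the idea (not Comfy's actual code path):

```python
import torch

# Weights stored in FP8 to save VRAM; PyTorch can't matmul FP8 directly,
# so they get "manually cast" to bfloat16 at compute time.
w = torch.randn(64, 64).to(torch.float8_e4m3fn)   # storage dtype
x = torch.randn(8, 64, dtype=torch.bfloat16)
y = x @ w.to(torch.bfloat16).T                    # compute dtype
print(y.dtype)  # torch.bfloat16
```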

In my testing, XFormers vs pytorch-cross-attention had no difference in speed, but this could be due to the model not fitting into VRAM.

Someone with access to an A100, or a system with a killer CPU and a ton of RAM, could convert the FLUX model to dtype torch.int8.
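
For what it's worth, a naive version of that conversion mostly just needs enough RAM to hold the checkpoint - a sketch using symmetric per-tensor quantization (the paths are placeholders, and real int8 inference would still need the scales wired into the model):

```python
import torch
from safetensors.torch import load_file, save_file

# Naive symmetric per-tensor int8 quantization of a checkpoint.
# This only shrinks storage; the model code would have to dequantize
# (w_int8 * scale) at load or compute time.
state = load_file("flux_dev.safetensors")          # placeholder path
out = {}
for name, w in state.items():
    w = w.float()                                  # FP8 tensors can't be reduced directly
    scale = (w.abs().max() / 127.0).clamp(min=1e-12)
    out[name] = torch.round(w / scale).to(torch.int8)
    out[name + ".scale"] = scale                   # keep the dequant scale alongside
save_file(out, "flux_dev_int8.safetensors")        # placeholder path
```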
 
Last edited:

Sepheyer

Well-Known Member
Dec 21, 2020
1,571
3,768
If anybody wants to see what Flux can do, just message me the prompts, which version of Flux you want me to use, the scheduler and sampler, and the number of steps, and I'll run a batch of images off for you.
What's the prettiest female the model can generate? Asking for a simple-ish, mid-sized photo against a simple background.
 

Synalon

Member
Jan 31, 2022
225
663
What's the prettiest female the model can generate? Asking for a simple-ish, mid-sized photo against a simple background.
This is using the default workflow provided with Flux; if you have a better workflow and want me to use that instead, feel free to share it.

Prompt:
a HD photograph of a beautiful young woman in the countryside.
Heavenly Features. Glamour Photograph. Professional Make-up.

1504x1504, Euler sampler with the Simple scheduler.

I'll try the other Flux models I have as well and will update this post.

It does seem to me that longer, more precise prompts will get better images; I'll try that later.

Changing the scheduler and sampler does tend to get better photorealistic images after a bit of experimentation. I'm not sure yet how to add negative prompts with Flux, as the basic workflow didn't have one, but once I figure out how to add that I can probably get better images.
 
Last edited:

dranosty

Newbie
Oct 12, 2021
21
13
Hi guys.

Does anyone know how to reproduce this exact art style, please?

A few examples:




(There are many more artists with this art style.)

I tried several models on Civitai but I can't get close to this... It's some kind of high-res anime screencap style with shiny skin; I wonder what model / LoRAs they use...
 

felldude

Active Member
Aug 26, 2017
572
1,695
I merged the Schnell and Dev FP8 models and made these renders as an experiment.
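
(Conceptually the merge is simple - a hedged sketch of a straight 50/50 state-dict average, with placeholder paths; real merge tools weight per-key and handle mismatched keys:)

```python
import torch
from safetensors.torch import load_file, save_file

# Straight 50/50 average of two checkpoints, assuming identical keys/shapes.
a = load_file("flux1-dev-fp8.safetensors")       # placeholder paths
b = load_file("flux1-schnell-fp8.safetensors")
merged = {k: ((a[k].float() + b[k].float()) / 2).to(a[k].dtype) for k in a}
save_file(merged, "flux1-merged-fp8.safetensors")
```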
Have you tried using the launch argument --use-pytorch-cross-attention in Comfy vs the default XFormers?
I'm not sure any of the attentions are helping with the model being upcast.

The only attentions being updated are for Linux.
I have managed to build the 1.0.9 version of Flash Attention, but I would need to build Torch for it.
(Hopefully that version will be supported by Windows and Torch.)

Windows is a pain in the ass, but I don't want to switch to Linux, despite the DALI Docker builds coming pre-built with the best stuff.
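
If anyone wants to compare the two attention backends outside of Comfy, here is a rough timing sketch (assumes a CUDA card with xformers installed; the shapes are arbitrary):

```python
import time
import torch
import torch.nn.functional as F
import xformers.ops as xops

B, H, S, D = 2, 16, 4096, 64
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

def bench(fn, n=20):
    fn()                          # warm-up
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(n):
        fn()
    torch.cuda.synchronize()
    return (time.time() - t0) / n

print("pytorch sdpa:", bench(lambda: F.scaled_dot_product_attention(q, k, v)))
# xformers expects (batch, seq, heads, dim) layout
qx, kx, vx = (t.transpose(1, 2).contiguous() for t in (q, k, v))
print("xformers:    ", bench(lambda: xops.memory_efficient_attention(qx, kx, vx)))
```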
 
Last edited:
  • Like
Reactions: Synalon and DD3DD

Synalon

Member
Jan 31, 2022
225
663
Have you tried using the launch argument --use-pytorch-cross-attention in Comfy vs the default XFormers?
I'm not sure any of the attentions are helping with the model being upcast.

The only attentions being updated are for Linux.
I have managed to build the 1.0.9 version of Flash Attention, but I would need to build Torch for it.
(Hopefully that version will be supported by Windows and Torch.)

Windows is a pain in the ass, but I don't want to switch to Linux, despite the DALI Docker builds coming pre-built with the best stuff.
I haven't tried any extra arguments in Comfy yet; I'll give it a try tomorrow.
 

felldude

Active Member
Aug 26, 2017
572
1,695
Using the merge I made and messing about with the sampler, the scheduler, and some prompts, I managed to get slightly better images.
I don't know if the attention would affect the image quality, but did you compare your iterations per second vs XFormers?
 

Synalon

Member
Jan 31, 2022
225
663
I don't know if the attention would affect the image quality, but did you compare your iterations per second vs XFormers?
It seems slower overall now; it was taking roughly 320 seconds for 4 images before, and now it's taking 1200 seconds.
 

felldude

Active Member
Aug 26, 2017
572
1,695
It seems slower overall now; it was taking roughly 320 seconds for 4 images before, and now it's taking 1200 seconds.
Interesting, thanks for testing. In my test I had no speed difference on any model, but I do have the cuDNN files in the Comfy Torch build.

Good to know XFormers works for an FP8 model, at least when it's upcast with BF16.
 
  • Like
Reactions: Synalon

Markbestmark

Member
Oct 14, 2018
302
330
I can't remember, tbh; it's been a long time since I experimented with it.

This was only a test I did to see how it would work with faceswap:

[Source and result clips attached - right click and set to loop.]

I made this using the batch function in img2img. First open the source in Photoshop, crop it to the right resolution, and adjust the length of the video. Then export it as video frames (images) and put these in an input folder. Now you can use these video frames as input/source images in img2img and make whatever changes you wish, including using ControlNet. Save the output images to an output folder. Then you need to put it back together into a video again; I used Flowframes, which can also double the FPS if you wish.
Tutorial:

[Source and result clips attached - right click and set to loop.]

Keep in mind that converting the files to webm decreases the quality.
I think for the faceswaps it's easier just to use FaceFusion, and the result should be good. I do love how you made video2video by actually making a frame-by-frame animation, though. I was trying to do something similar, but I only tried it with cartoons, and my results are not stable at all. Could you give me some tips?
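
For the split-into-frames / reassemble steps from the quoted tutorial, a minimal OpenCV sketch (the paths are placeholders; Photoshop, img2img, and Flowframes still do the actual work in the middle):

```python
import cv2, os

# 1) Split the source video into frames for the img2img batch.
cap = cv2.VideoCapture("source.mp4")               # placeholder path
fps = cap.get(cv2.CAP_PROP_FPS)
os.makedirs("input", exist_ok=True)
i = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"input/frame_{i:05d}.png", frame)
    i += 1
cap.release()

# ... run the frames through img2img (+ ControlNet), save results to "output/" ...

# 2) Reassemble the processed frames into a video at the original fps.
frames = sorted(os.listdir("output"))
h, w = cv2.imread(os.path.join("output", frames[0])).shape[:2]
out = cv2.VideoWriter("result.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
for f in frames:
    out.write(cv2.imread(os.path.join("output", f)))
out.release()
```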