[Stable Diffusion] Prompt Sharing and Learning Thread

Sepheyer · Aug 5, 2024

felldude said:
Someone merged the
You must be registered to see the links
into FP8, you need to use the newest version of COMFY

17 and 21 GB so if you have a 3090 or 4090 your golden.

For me its around 100 seconds per IT so I am out on using it.

Make sure your not using any crazy launch arguments like --force-fp32
If your AMD and using --directml, I don't think Olive has been updated to convert the model yet

I'm not sure if the defualt XFormers or --use-pytorch-cross-attention would be faster.

Did you pull the latest version of COMFY, I had a float error until I updated.

I also built CUPY but im not sure it matters it adds in CuBlass, part of my mission to get Flash Attention, Cutlass and Deepspeed working together.
It would be my only shot at using Flux at a decent speed

Indeed, I have the freshest CUI. Also, my card is 4070ti with 12GB, so maybe that's the reason it drops dead but it is not immediately clear - especially since I did that memory tweak. Welp, maybe this is rather a torch thing, gonna be reinstalling that when the muse strikes.

Synalon · Aug 5, 2024

felldude said:
Someone merged the
You must be registered to see the links
into FP8, you need to use the newest version of COMFY

17 and 21 GB so if you have a 3090 or 4090 your golden.

For me its around 100 seconds per IT so I am out on using it.

Make sure your not using any crazy launch arguments like --force-fp32
If your AMD and using --directml, I don't think Olive has been updated to convert the model yet

I'm not sure if the defualt XFormers or --use-pytorch-cross-attention would be faster.

Did you pull the latest version of COMFY, I had a float error until I updated.

I also built CUPY but im not sure it matters it adds in CuBlass, part of my mission to get Flash Attention, Cutlass and Deepspeed working together.
It would be my only shot at using Flux at a decent speed

2 minutes per iteration is considered to long for you?

felldude · Aug 5, 2024

Synalon said:
2 minutes per iteration is considered to long for you?

Extremely that would be 30 minutes per image compared to 20 seconds for PONY or XL

1 image to 150 is not a good ratio for me lol

Synalon · Aug 5, 2024

felldude said:
Extremely that would be 30 minutes per image compared to 20 seconds for PONY or XL

1 image to 150 is not a good ratio for me lol

I see, but iteration I thought you meant for the entire image.

It takes longer than usual for me but not to the 30 minute mark. I can say though that Flux isn't doing to well with nipples for me.

Synalon · Aug 5, 2024

Other Random Flux Images.

You don't have permission to view the spoiler content. Log in or register now.

felldude · Aug 6, 2024

By force installing the following packages from

You must be registered to see the links

and copying the 12.6 .dll files from CUDA and CuTensor

I was able to get a small increase 2-5% in IT's per second in Comfy (I think I could get more by building Torch myself with CUPY but I will wait for a programmer/engineer to do so)

nvidia_cublas_cu12-12.6.0.22-py3-none-win_amd64.whl
nvidia_cuda_opencl_cu12-12.6.37-py3-none-win_amd64.whl
nvidia_cudnn_cu12-9.3.0.75-py3-none-win_amd64.whl
nvidia_cusolver_cu12-11.6.4.38-py3-none-win_amd64.whl

Pip install --force-reinstall CUPY and the packs above when your venv is active

If you copy the 12.6 .dll's make sure NOT to copy cuinj64_126.dll or delete it from the Torch/lib folder after copying.

EDIT:

So I asked the person who uploaded the FLUX FP8 models, and they used --bf16 upcasting
Thus the message " model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16"

In my testing Xformers vs pytorch-cross-attention had no difference in speed but this could be do to the model not fitting into VRAM

Someone with access to a A100 or a system with a killer CPU and a ton of ram could convert the FLUX model to Dtype torch.int8

Synalon · Aug 6, 2024

If anybody wants to see what Flux can do just message me the Prompts, which version of Flux you want me to use along with the scheduler and sampler and amount of steps and I'll run a batch of images off for you.

Sepheyer · Aug 6, 2024

Synalon said:
If anybody wants to see what Flux can do just message me the Prompts, which version of Flux you want me to use along with the scheduler and sampler and amount of steps and I'll run a batch of images off for you.

What's the prettiest female the model can generate? Asking for a simplish midsized photo against a simple background.

Synalon · Aug 6, 2024

Sepheyer said:
What's the prettiest female the model can generate? Asking for a simplish midsized photo against a simple background.

This is using the default workflow provided with Flux, if you have a better workflow and want me to use that instead feel free to share it.

Prompt:
a HD photograph of a beautiful young woman in the countryside.
Heavenly Features. Glamour Photograph. Professional Make-up.

You don't have permission to view the spoiler content. Log in or register now.

1504x1504

Euler and Simple.

I'll try the other Flux models I have as well and will update this post.

It does seem to me that longer more precise prompts will get better images, I'll try that later.

Changing the Scheduler and Sampler does tend to get better photorealistic images after a bit of experimentation, I'm not sure yet how to add negative prompts with flux as the basic workflow didn't have one but after I figure out how to add that I can probably get better images.

dranosty · Aug 7, 2024

Hi guys.

Do someone know how to reproduce this exact artstyle pls?

Few examples :

You must be registered to see the links

(There are many more artists with this artstyle)

I tried several models on Civitai but I can't get close to this... It's some kind of highres anime screencap with shiny skin, I wonder what model / loras they use...

Synalon · Aug 7, 2024

Using image to image, ddim sampler, ddim scheduler and flux.1_schnell_16x16-marduk191.

Synalon · Aug 7, 2024

I merged the Schnell and Dev FP8 models and made these renders as an experiment.

You don't have permission to view the spoiler content. Log in or register now.

Edit: Randomly Added new images.

felldude · Aug 7, 2024

Synalon said:
I merged the Schnell and Dev FP8 models and made these renders as an experiment.

Have you tried using the launch argument --use-pytorch-cross-attention in Comfy vs default Xfomers
I'm not sure any of the attention's are helping with the model being upcast.

The only attention's being updated are for linux.
I have manged to build the 1.0.9 version of Flash attention but I would need to build torch for it.
(Hopefully version

You must be registered to see the links

will be supported by windows and torch)

Windows is a pain in the ass but I don't want to switch to Linux despite the DALI docker builds being pre-built with the best stuff.

Synalon · Aug 7, 2024

felldude said:
Have you tried using the launch argument --use-pytorch-cross-attention in Comfy vs default Xfomers
I'm not sure any of the attention's are helping with the model being upcast.

The only attention's being updated are for linux.
I have manged to build the 1.0.9 version of Flash attention but I would need to build torch for it.
(Hopefully version
You must be registered to see the links
will be supported by windows and torch)

Windows is a pain in the ass but I don't want to switch to Linux despite the DALI docker builds being pre-built with the best stuff.

I haven't tried any extra arguments in Comfy yet, I'll give it a try tomorrow.

felldude · Aug 7, 2024

Synalon said:
I haven't tried any extra arguments in Comfy yet, I'll give it a try tomorrow.

It is supposed to activate the CuDNN attention, If we get a windows build of 3 it should use CuBlass for those using an Ada Lovelace card

Synalon · Aug 8, 2024

Using the merge I made and messing about with the sampler and scheduler and some prompts I managed to get slightly better images.

You don't have permission to view the spoiler content. Log in or register now.

felldude · Aug 8, 2024

Synalon said:
Using the merge I made and messing about with the sampler and scheduler and some prompts I managed to get slightly better images.

I don't know if the attention would affect the image quality, but did you compare your Iterations per second vs XFormers

Synalon · Aug 8, 2024

felldude said:
I don't know if the attention would affect the image quality, but did you compare your Iterations per second vs XFormers

It seems slower overall now, it was taking roughly 320 seconds for 4 images before now its taking 1200 seconds

felldude · Aug 8, 2024

Synalon said:
It seems slower overall now, it was taking roughly 320 seconds for 4 images before now its taking 1200 seconds

Interesting thanks for testing, in my test I had no speed difference on any model, But I do have the CuDnn files in the Comfy Torch build.

Good to know Xformers works for an FP8 model at least when its upcast with BF16

Markbestmark · Aug 10, 2024

Mr-Fox said:
I can't remember tbh, it was a long time since I experimented with it.

This was only a test I did to see how it would work with faceswap:

Source (right click and set to loop) The Result.. (right click and set to loop)
View attachment 3589592 View attachment 3589593

I made this by using the batch function in img2img. First open the source in photoshop, crop it to the right resolution and adjust the length of the video. Then export it as video frames (images). Put these in an input folder. Now you can use these video frames as input or source images in img2img and do what ever changes you wish, including using controlnet. Save the output images in an output folder. Now you need to put it back together into a video again. I used Flowframes. It can double the fps if you wish.
Tutorial:
You must be registered to see the links

Source (right click and set to loop) The Result.. (right click and set to loop)
View attachment 3589551 View attachment 3589555

Keep in mind that converting the files to webm is decreasing the quality.

Mr-Fox said:
I can't remember tbh, it was a long time since I experimented with it.

This was only a test I did to see how it would work with faceswap:

Source (right click and set to loop) The Result.. (right click and set to loop)
View attachment 3589592
View attachment 3589593

I made this by using the batch function in img2img. First open the source in photoshop, crop it to the right resolution and adjust the length of the video. Then export it as video frames (images). Put these in an input folder. Now you can use these video frames as input or source images in img2img and do what ever changes you wish, including using controlnet. Save the output images in an output folder. Now you need to put it back together into a video again. I used Flowframes. It can double the fps if you wish.
Tutorial:
You must be registered to see the links

Source (right click and set to loop) The Result.. (right click and set to loop)
View attachment 3589551
View attachment 3589555

Keep in mind that converting the files to webm is decreasing the quality.

I think for the faceswaps it's easier just to use face fusion, and the result should be good. I do love though how you made vide2video by actually making a frames animation. I was trying to do something simillar but tried only with cartoons. but my results are not stable at all. Could you give some tips for me?

View attachment 1st_PASS_00041.mp4
View attachment 1st_PASS_00039.mp4

Source (right click and set to loop)	The Result.. (right click and set to loop)
View attachment 3589592	View attachment 3589593

Source (right click and set to loop)	The Result.. (right click and set to loop)
View attachment 3589551	View attachment 3589555

[Stable Diffusion] Prompt Sharing and Learning Thread

Well-Known Member

Member

Active Member

Member

Member

Active Member

Member

Well-Known Member

Member

Newbie

Member

Member

Active Member

Member

Active Member

Member

Active Member

Member

Active Member

Member