Someone merged the Flux model into FP8; you need to be on the newest version of ComfyUI to load it.
It uses between 17 and 21 GB of VRAM, so if you have a 3090 or 4090 you're golden.
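If you want to check what your card actually has, a quick PyTorch one-off works (a sketch; assumes a CUDA build of torch, adjust the device index if you have more than one GPU):

    import torch

    # total VRAM on the first GPU; requires a CUDA build of PyTorch
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB")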
For me it's around 100 seconds per iteration, so a 20-step image would take over half an hour; I am out on using it.
Make sure you're not using any crazy launch arguments like --force-fp32.
If you're on AMD and launching with --directml, I don't think Olive has been updated to convert the model yet.
I'm not sure if the default xformers attention or --use-pytorch-cross-attention would be faster.
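If you want a rough number for the PyTorch path, here's a minimal micro-benchmark of scaled_dot_product_attention, which is what --use-pytorch-cross-attention switches ComfyUI to. The shapes are made up and it's only a proxy; the real answer depends on the whole pipeline:

    import time
    import torch
    import torch.nn.functional as F

    # made-up attention shapes, just a stand-in for the real workload;
    # needs a CUDA GPU
    q = k = v = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(100):
        F.scaled_dot_product_attention(q, k, v)
    torch.cuda.synchronize()
    print(f"{(time.time() - t0) / 100 * 1000:.2f} ms per call")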
Did you pull the latest version of ComfyUI? I had a float error until I updated.
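For reference, assuming a git-clone install (not the standalone package), updating is just:

    cd ComfyUI
    git pull
    pip install -r requirements.txt

The last step is only needed if the dependencies changed, but it's cheap to re-run.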
I also built CuPy, though I'm not sure it matters; it brings in cuBLAS, part of my mission to get FlashAttention, CUTLASS, and DeepSpeed working together.
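A quick sanity check that a CuPy build actually picked up cuBLAS (a sketch: show_config() prints what the build was linked against, and a matmul dispatches through cuBLAS, so it confirms the path works):

    import cupy as cp

    cp.show_config()  # prints the CUDA / cuBLAS versions the build was linked against

    # a matmul goes through cuBLAS under the hood
    a = cp.random.rand(1024, 1024, dtype=cp.float32)
    b = a @ a
    cp.cuda.Stream.null.synchronize()
    print("cuBLAS matmul OK:", b.shape)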
It would be my only shot at running Flux at a decent speed.