[Stable Diffusion] Prompt Sharing and Learning Thread

Sepheyer

Well-Known Member
Dec 21, 2020
1,523
3,589
First of all, I've never used ComfyUI before, so a lot of the basics are probably done horribly wrong, even more than usual.
Second, I've never used SDXL, so I have no idea how the prompting differs.
But it was the only thing I could get the model to even load in without an OOM, so needs must...
So, with the ideal situation of juggling multiple unknowns, I don't really know if the base model is working correctly, if the UI setup is behaving even remotely well, or if the refiner is being applied anywhere close to the way it's meant to be.

So here are some test images, base and refiner "pairs"...
View attachment 2760431 View attachment 2760432

View attachment 2760455 View attachment 2760456

Just a base image to show that there still seems to be an issue with multiple subjects (I didn't try to fix it with prompts alone); the rest of the image didn't seem too bad though.
View attachment 2760451
Nice! I am struggling to see the purpose of SDXL. I watched a few videos but am still in WTF mode. Hear me out: I think what SDXL actually is, under the hood, is an upscaler workflow, except the user gets stuck with a single model. Naturally, if there are other applications, I am merely uninformed.

But if one picks up SDXL for upscales, then ComfyUI already has a bunch of approaches that one can mix and match based on one's machine capabilities, desired rendering time, and model preference. These approaches are posted on Civitai, sometimes under a "ComfyUI" tag or some such. So I think the solutions are already there, and none of them require a cool 20 GB download or lock you into a single model.

I attached my go-to CUI upscale approach, using your prompt but with Zovya's RPGArtist model:

a_12500_.png
If you pop this into CUI you'll see that you can keep adding upscale blocks until your VRAM faints. Again, granted I only heard about SDXL a whole minute ago, I'd say any upscale method other than SDXL, where you keep the flexibility of changing the actual model, is more valuable (at least to me). Though, surprisingly, three other models I tried produced notably crappier results, which I am attaching for science:
And one more render using the OpenPose ControlNet with Zovya's RPGArtist model:

a_12510_.png

What I would love to know is if anyone has a ComfyUI workflow for these kinds of .

I picked these from here:



(all the way at the end)

I can't figure out a way to map these into ComfyUI. Naturally, I mean this in a certain context: that there are tile-like workarounds that would let me trade limited VRAM for time and still upscale all the way into 10k territory.
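For what it's worth, outside ComfyUI the same "trade VRAM for time" idea can be sketched with the diffusers library. The tiling/offloading switches below are real diffusers calls, but the checkpoint, prompt and target size are just placeholders, and I have no idea how close this is to what the tile nodes in CUI do internally; it's only meant to show which knobs exist:

Python:
# Rough sketch only: re-denoise an enlarged image while keeping VRAM use low.
# Checkpoint name, prompt and the 1536x2304 target are placeholders.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()   # keep only the active module on the GPU (slower, far less VRAM)
pipe.enable_attention_slicing()   # compute attention in slices instead of one big matmul
pipe.enable_vae_tiling()          # decode the final image in tiles so the VAE doesn't OOM

source = load_image("a_12500_.png").resize((1536, 2304))  # enlarge the pixels first
result = pipe(
    prompt="same prompt as the base render",  # placeholder
    image=source,
    strength=0.5,                 # how much of the enlarged image gets re-denoised
    num_inference_steps=20,
).images[0]
result.save("upscaled.png")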
 
  • Like
Reactions: Mr-Fox

me3

Member
Dec 31, 2016
316
708
SDXL is meant to be much better at prompting, such as needing fewer or no negative prompts and being better at understanding shorter tagging, etc.
It's a bit hard for me to tell if that's the case, since everything I used was new to me and the quickest way to deal with problems was to throw weighting and negatives at them. It's also version 0.9, so it's probably safe to assume there are some issues.
The smallest size you can use it at is 1024x and up, and even at that it seemed like there was some face crushing going on, but that could also be something I was/wasn't doing.
I did notice that it was mainly drawing full-body shots; usually with square images other models zoom in to fill the width with as much of the character as possible, but this actually filled in quite a bit of background on its own. I might have been "lucky" with seeds.

Even in the little time I spent, there are clearly issues with SDXL; the size is obviously one, both in GB on disk and in VRAM needs. Still has a way to go to catch some LLM models :p
Fingers seem to be an issue, same with distorted faces at "range", but since it's meant to be very well suited for finetuning, this should be fixable by those who do that awesome work and create models. It's also not going to have SD2's issues with NSFW, since they claim it should just be a simple matter of finetuning. It almost sounded like they'd done it but couldn't/wouldn't release it, just to avoid backlash.

I'm sure there will be a bunch of fixes and optimising done, as there has been with other base models/systems; time will tell.
There are more than enough things still needing optimising in the older versions, but we've all seen the improvements so far.
As an example, even on my bad 6 GB card I can batch 1024x images at 8 (with a bit of luck), meaning 8 images generated at once, but I can't use highres.fix to "upscale" to 1024 as anything other than a single image. So some "features" have some way to go in the older stuff too.
 

Sepheyer

Well-Known Member
Dec 21, 2020
1,523
3,589
Upscaler Tips

So, I was pondering. A latent with 100 steps seems to be markedly larger and take more memory than a latent with 20 steps. Maybe I am incorrectly attributing it to memory, but those refinement steps are not free: you keep paying for them even after you have run them and they are sitting inside your latent. When you manipulate a latent that has more steps, you keep paying for those extra steps.

I arrived at this empirically while reducing the refinement in an upscale workflow. The first latent had 18 iterations; the upscale latent was denoised at 0.5 and ran 7 more iterations.

It turns out the workflow executes dramatically faster when the first latent has fewer refinement steps. Hmmm.

refinement.png

So, naturally, each "refinement step" is probably a big-ass vector/matrix that the GPU adds to an already large collection of big-ass vectors.

That made me retry a resolution I never had enough memory for: 1536 x 2304. This time I lowered the steps and it worked.

A 1536 x 2304 image on a 6GB card, 13/6 steps, 17 minutes to render:

a_12715_.png

The point of the exercise: I never knew that extra steps limit one's ability to upscale an image.
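If it helps to put rough numbers on that (purely back-of-envelope, assuming each sampler step costs roughly in proportion to the latent area; the resolutions and step counts are the ones from the workflow above):

Python:
# Back-of-envelope only: one "unit" = the cost of one step at 512x768.
def pass_cost(width, height, steps, base=512 * 768):
    return steps * (width * height) / base

before = pass_cost(512, 768, 18) + pass_cost(1536, 2304, 7)  # 18 + 63 = 81 units
after  = pass_cost(512, 768, 13) + pass_cost(1536, 2304, 6)  # 13 + 54 = 67 units
print(before, after)

On this admittedly crude model the saving comes almost entirely from the high-resolution pass; whether the step count also affects peak memory is something I haven't been able to verify.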
 

devilkkw

Member
Mar 17, 2021
284
965
Interesting, I need to try this.

Guys, during my testing on my , I found a trick that seems to work really well.
Sometimes we need to weight it, but some weights give bad results.
After some testing I found that weighting it and then re-adding it without a weight works like a charm.
Adding (kkw-new-neg-v1.4:1.8) kkw-new-neg-v1.4 works better than adding only the weighted one.
Adding it multiple times without a weight also works.
Has anyone done tests like these on textual inversions?
Testing it on the positive as well gave the same result: adding a textual inversion weighted, followed by it non-weighted, seems really useful.


I hope it works with other textual inversions.
 

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
Upscaler Tips

So, I was pondering. A latent with 100 steps seems to be markedly larger and take more memory than a latent with 20 steps. Maybe I am incorrectly attributing it to memory, but those refinement steps are not free: you keep paying for them even after you have run them and they are sitting inside your latent. When you manipulate a latent that has more steps, you keep paying for those extra steps.

I arrived at this empirically while reducing the refinement in an upscale workflow. The first latent had 18 iterations; the upscale latent was denoised at 0.5 and ran 7 more iterations.

It turns out the workflow executes dramatically faster when the first latent has fewer refinement steps. Hmmm.

View attachment 2764860

So, naturally, each "refinement step" is probably a big-ass vector/matrix that the GPU adds to an already large collection of big-ass vectors.

That made me retry a resolution I never had enough memory for: 1536 x 2304. This time I lowered the steps and it worked.

A 1536 x 2304 image on a 6GB card, 13/6 steps, 17 minutes to render:

View attachment 2764848

The point of the exercise: I never knew that extra steps limit one's ability to upscale an image.
I could not replicate this with hiresfix. Just to be clear, were you talking about "normal" upscalers? I have an overclocked GTX 1070 with 8 GB of VRAM and I'm stuck at 1280x1920. I can crank up the sampling steps and hires steps, it only takes ages, but even with a very low number of steps I can't get over that resolution without getting a CUDA memory error.
 
  • Like
Reactions: Sepheyer

me3

Member
Dec 31, 2016
316
708
Comfy seems to do a few things differently, including how it loads models. I.e. it can load the ~12 GB SDXL base model in less than 6 GB of VRAM, while A1111 and the SD.next fork can't even load the pruned 7 GB version without an OOM.
Looking at the operations, I'm guessing one way to describe what gets done is that the first steps generate one image, then that image is used in an img2img way and the final steps are applied to it.
So to replicate it in A1111 you'd probably need to pass the image on to img2img and apply the "finishing touches" there. I've not really used that, so I don't know how, or if, it would work.
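For reference, that "first steps here, last steps there" handoff can be written out explicitly with the diffusers library. This is just a sketch of the same split, not what ComfyUI or A1111 do internally; the public SDXL 1.0 repos, the prompt and the 80/20 split point are stand-ins (and two SDXL pipelines obviously won't fit on a 6 GB card without offloading):

Python:
# Sketch of the base/refiner handoff: the base runs the first part of the
# denoising schedule and hands over a latent, the refiner finishes it img2img-style.
# Model ids, prompt and the 0.8 split point are placeholders.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

prompt = "portrait photo of a knight, detailed background"  # placeholder

latent = base(
    prompt=prompt, num_inference_steps=30,
    denoising_end=0.8, output_type="latent",   # stop at 80% and pass the latent on
).images
image = refiner(
    prompt=prompt, num_inference_steps=30,
    denoising_start=0.8, image=latent,         # pick up the last 20% on that latent
).images[0]
image.save("base_plus_refiner.png")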
 
  • Like
Reactions: Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
Comfy seems to do a few things differently, including how it loads models. I.e. it can load the ~12 GB SDXL base model in less than 6 GB of VRAM, while A1111 and the SD.next fork can't even load the pruned 7 GB version without an OOM.
Looking at the operations, I'm guessing one way to describe what gets done is that the first steps generate one image, then that image is used in an img2img way and the final steps are applied to it.
So to replicate it in A1111 you'd probably need to pass the image on to img2img and apply the "finishing touches" there. I've not really used that, so I don't know how, or if, it would work.
AFAIK, upscaling in img2img doesn't work like hiresfix. Hiresfix is part of the generative process and "creates" new pixels, and thus improves image quality, while "normal" upscaling can't "invent" pixels that aren't already there, so it only makes the image larger without the same bump in quality. So in my opinion, and many others', hiresfix is superior. This is why I wanted to try to replicate what Seph had discovered, but with hiresfix. I'm sticking with A1111 for now; I never got used to the node system in any of the many pieces of software I have messed with: Blender, 4d wrap, etc.
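To make the distinction concrete, the two-pass idea behind hiresfix (generate small, enlarge, then let the sampler re-denoise the enlarged image so the extra pixels are actually generated rather than interpolated) can be sketched roughly like this with diffusers. The checkpoint, prompt, sizes and 0.5 denoise below are placeholders, and A1111's own hiresfix may differ in the details:

Python:
# Rough hiresfix-style two-pass sketch; not A1111's exact implementation.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

MODEL = "runwayml/stable-diffusion-v1-5"  # placeholder checkpoint
txt2img = StableDiffusionPipeline.from_pretrained(MODEL, torch_dtype=torch.float16).to("cuda")
img2img = StableDiffusionImg2ImgPipeline(**txt2img.components)  # reuse the already-loaded weights

prompt = "photo of a woman standing on a beach"  # placeholder

# pass 1: ordinary low-res generation
lowres = txt2img(prompt=prompt, width=512, height=768, num_inference_steps=25).images[0]

# pass 2: enlarge the pixels, then re-denoise so the model fills in real detail
enlarged = lowres.resize((1024, 1536))
final = img2img(
    prompt=prompt, image=enlarged,
    strength=0.5,              # roughly the "denoising strength" slider in hiresfix
    num_inference_steps=25,
).images[0]
final.save("hires_style.png")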
 
  • Like
Reactions: Sepheyer

Mimic22

Newbie
Jul 2, 2018
25
10
Hi, I have a question: if I take several pictures of a real-life person, do you think I should be able to create a model of that person?
 

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
Hi, I have a question: if I take several pictures of a real-life person, do you think I should be able to create a model of that person?
Yes, this is what a LoRA or Textual Inversion is. If you are proficient with SD, this would be the next step. If you search this thread you can find a lot of information and links about it. If you decide to try it, I recommend reading the awesome guide by Schlongborn, which is linked on the first page.
 
  • Like
Reactions: Sepheyer

Sepheyer

Well-Known Member
Dec 21, 2020
1,523
3,589
I could not replicate this with hiresfix. Just to be clear, were you talking about "normal" upscalers? I have an overclocked GTX 1070 with 8 GB of VRAM and I'm stuck at 1280x1920. I can crank up the sampling steps and hires steps, it only takes ages, but even with a very low number of steps I can't get over that resolution without getting a CUDA memory error.
I could easily have drawn incorrect conclusions from my observations. Maybe steps don't matter at all; I won't be surprised if I eventually reach that conclusion.

So, here is the proof that back in the day I couldn't render 1536x2304 using ComfyUI:

"Not enough memory to render 1536x2304 on a 6gb GPU."
https://f95zone.to/threads/ai-art-show-us-your-ai-skill.138575/post-10795995

Naturally, the issue there might have been a different workflow: instead of upscaling 512x768 > 1536x2304 directly, I upscaled the original four times in 1.5x increments. Maybe the pipeline itself was consuming too much memory by storing a different latent for each upscaler.
 
  • Like
Reactions: Mr-Fox

me3

Member
Dec 31, 2016
316
708
When I tested how large an image I could get with SDXL, and when it would start making duplicate subjects, I noticed that ComfyUI automatically kicked in tiled diffusion when it hit an OOM. Not sure what version I was using compared to yours, but it's possible that's what was involved, or tiled VAE. That drastically affects speed, but it was shown in my console output, so unless there's a version difference or some suppressed output it should have been mentioned for you as well.
 
  • Like
Reactions: Mr-Fox

Sepheyer

Well-Known Member
Dec 21, 2020
1,523
3,589
When I tested how large an image I could get with SDXL, and when it would start making duplicate subjects, I noticed that ComfyUI automatically kicked in tiled diffusion when it hit an OOM. Not sure what version I was using compared to yours, but it's possible that's what was involved, or tiled VAE. That drastically affects speed, but it was shown in my console output, so unless there's a version difference or some suppressed output it should have been mentioned for you as well.
Yes, I saw the VAE switch to tiled VAE during my ControlNet experiments, but that was a different thing. In the post above about that "out of memory" error, the error was generated not at the VAE level but when the 1536x2304 latent was passed into the respective sampler.

It literally kept failing at the sampler level. I lowered the latent to some odd value but gave up after a few tries, as I wasn't finding the "break-even" point.
 
Last edited:
  • Like
Reactions: Mr-Fox

me3

Member
Dec 31, 2016
316
708
ComfyUI inside A1111 is released, check .
Not sure that's worth using compared to just setting it up separately. If it still uses A1111 to load and run models, you won't get any of those advantages.
If you want to use that type of UI, it'd be much easier to just install Comfy standalone: it's a single download-and-launch for Windows (might be for other OSes too), and you edit one line in a config file (extra_model_paths.yaml) to make it read your models/loras/embeddings from A1111 or other installs, so you don't need symlinks or copies of those.
 
  • Like
Reactions: Mr-Fox

devilkkw

Member
Mar 17, 2021
284
965
I'm testing it now; it launches its own server and runs by itself. The tab is only a frame around the ComfyUI page. The good part is that, with the extension, everything is already configured and ready to use with all the models you have.
I'm able to use one model in A1111 and another in ComfyUI.
I think it's useful for quickly comparing prompts and results.
Performance actually doesn't seem any different, but I'm a total noob at Comfy; I'm just curious how it works, and for me A1111 is the standard.
 
  • Like
Reactions: Mr-Fox and Sepheyer

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
I'm testing it now; it launches its own server and runs by itself. The tab is only a frame around the ComfyUI page. The good part is that, with the extension, everything is already configured and ready to use with all the models you have.
I'm able to use one model in A1111 and another in ComfyUI.
I think it's useful for quickly comparing prompts and results.
Performance actually doesn't seem any different, but I'm a total noob at Comfy; I'm just curious how it works, and for me A1111 is the standard.
I would be curious to try it for the benefit of making widescreen images. Can the extension in A1111 do widescreen? And with hiresfix? With the subject standing? This would be the ideal.
 

me3

Member
Dec 31, 2016
316
708
I would be curious to try it for the benefit of making widescreen images. Can the extension in A1111 do widescreen? And with hiresfix? With the subject standing? This would be the ideal.
Do you have an example image of the kind of thing you're after? It obviously doesn't need to be anything generated, just something to ballpark it with.