[Stable Diffusion] Prompt Sharing and Learning Thread

Sepheyer · Jul 11, 2023

me3 said:
First of all i've never used Comfyui before so probably a lot of basics done horribly wrong, even more than usual.
Second, never used SDXL so no idea how prompting differs.
But it was the only thing i could get the model to even load in without OOM so needs must...
So with the ideal situation of using multiple unknowns i don't really no if the base model is working correctly, the UI setup is even remotely behaving well, nor if the refiner being applied in any way close to what it's meant to.

So here some test images, base and refiner "pairs"...
View attachment 2760431 View attachment 2760432

View attachment 2760455 View attachment 2760456

just a base image to show that there still seem to be an issue with multiple subjects (didn't try to fix it with just prompts) the rest of the image didn't seem too bad thought.
View attachment 2760451

Nice! I am struggling to see the purpose for the SDXL. I watched a few videos, but still am in WTF mode. Hear me out. I think what the SDXL actually under the hood is is an upscaler workflow. But the user gets stuck with a single model. Naturally, if there are other applications, I am merely uninformed.

But if one picks up SDXL for upscales, then ComfyUI already has a bunch of approaches that one can mix and match based on one's own machine capabilities, desired rendering time, and model preference. These approaches are posted on Civitai, sometimes with "ComfyUI" tag or somesuch. So, I think the solutions are already there and none require a cool 20gb download and get you stuck on a single model.

I attached my go-to upscale CUI approach using your prompt but Zovya's RPGArtist model:

If you pop this into CUI you'll see that you can add upscale blocks up until your VRAM faints. Again, granted I heard about SDXL entire long minute ago, I'd say using any other but SDXL upscale methods where you have flexibility of changing the actual model is more valuable (at least to me). Tho surprisingly three other models I tried produce notably crappier results which I am attaching for science:

You don't have permission to view the spoiler content. Log in or register now.

And one more render using OpenPose ControlNet with the Zovya's RPGArtist model:

What I would love to know if anyone has a ComfyUI workflow for these kind of

You must be registered to see the links

.

I picked these from here:

You must be registered to see the links

(all the way in the end)

I can't figure out a way to map these into ComfyUI. Naturally, I mean this in a certain context - that there are tile-like workarounds that would let me trade limited VRAM for time yet upscale a fukton into 10k.

felldude · Jul 11, 2023

You must be registered to see the links

can be used in Stable-Diffusion Webui

If you download the pretrained 4x model you can rename it to ERGAN4x (Not 4x+) and use it in stable diffusion

me3 · Jul 11, 2023

SDXL is meant to be much better at prompting, like needing less/no negative prompts and being better at understanding shorter tagging etc.
Bit hard for me to tell if that's the case since everything i used were new to me and quickest way to deal with it was to throw weighting and negatives, it's also version 0.9 so it's probably safe to assume there's some issues.
Smallest size you can use it at is >1024x and even at that it seemed like there was some face crushing going on, but that could also be something i was/wasn't doing.
I did notice that it was mainly drawing full body shots, usually when you have square images other models zoom in to fill the width with as much of the character as possible and it actually filled in quite a bit of background on its own. I might have been "lucky" with seeds.

Even in the little time i spent there is clearly issues with SDXL, the size is obviously one, both in GB and vram needs. Still have a way to go to catch some LLM models

Finger seems to be an issue, same with the distorted faces at "range", but since it's meant to be very well suited for finetuning this should be fixable for those that do that awesome work and create models. It's also not gonna have sd2's issues with nsfw since they claim it should just be a simple matter of finetuning. Almost sounded like they'd done it, but couldn't/wouldn't release it just to avoid backlash.

I'm sure there will be a bunch of fixes and optimising done as there's been with other base models/systems time will tell.
There's more than enough things still needing optimising in older version, but we've all seen the improvements so far.
As an example, even on my bad 6gb card i can batch 1024x images at 8 (with a bit of luck), meaning 8 images generated at once, but i can't use highres.fix to "upscale" to 1024 in anything else than just single image. So some "features" have some way to go in older stuff too

me3 · Jul 11, 2023

Since i mentioned SDXL still having issues with a basic concept as "fingers" i guess this illustrates it

No idea what's really going on but does seem like "something" is getting cooked...

Changed element and did some more tests, some turned out usable

Sepheyer · Jul 12, 2023

Upscaler Tips

So, I was pondering. A latent with 100 steps is markedly larger, takes more memory than a latent with 20 steps. May be I incorrectly attribute it to memory, but those refinement steps are not free, you keep paying for them even after you ran them and they are sitting inside your latent. When you manipulate the latent that has more steps you keep paying for those extra steps.

I emperically arrived at it when reducing the refinement in an upscale workflow. The first latent had 18 iterations, the upscale latent denoised 0.5 and ran 7 iterations more.

Turns out the workflow executes exponentially faster when the first latent has less refinement steps. Hmmm.

So, naturally, each "refinement step" is probably a big ass vector/matrix that the GPU adds to the previous already large collection of big ass vectors to start with.

Which made me re-try a resolution I never had enough memory for: 1536 x 2304. This time I lowered the steps and it worked.

A 1536 x 2304 image on a 6GB card, 13/6 steps, 17 minutes to render:

The point of the exercise was that I never knew that the extra steps do limit one's ability to upscale an image.

devilkkw · Jul 12, 2023

interesting, need to try this.

Guy's, during my testing on my

You must be registered to see the links

, i found a trick these seem really work great.
Sometimes we need to weight it, but some weight put bad result.
After some test i found weight it then re-add it without weight work like a charm.
Adding (kkw-new-neg-v1.4:1.8) kkw-new-neg-v1.4 work's better then adding only weighted.
Also work adding it multiple time without weight.
Did someone have made some test on textual inversion like these?
Testing it also on positive, had same result, adding textual inversion in weighted followed with non weighted seem really useful.

I hope it work in other Textual inverison.

Mr-Fox · Jul 12, 2023

Sepheyer said:
Upscaler Tips

So, I was pondering. A latent with 100 steps is markedly larger, takes more memory than a latent with 20 steps. May be I incorrectly attribute it to memory, but those refinement steps are not free, you keep paying for them even after you ran them and they are sitting inside your latent. When you manipulate the latent that has more steps you keep paying for those extra steps.

I emperically arrived at it when reducing the refinement in an upscale workflow. The first latent had 18 iterations, the upscale latent denoised 0.5 and ran 7 iterations more.

Turns out the workflow executes exponentially faster when the first latent has less refinement steps. Hmmm.

View attachment 2764860

So, naturally, each "refinement step" is probably a big ass vector/matrix that the GPU adds to the previous already large collection of big ass vectors to start with.

Which made me re-try a resolution I never had enough memory for: 1536 x 2304. This time I lowered the steps and it worked.

A 1536 x 2304 image on a 6GB card, 13/6 steps, 17 minutes to render:

View attachment 2764848

The point of the exercise was that I never knew that the extra steps do limit one's ability to upscale an image.

I could not replicate this with hiresfix. Just to be clear. Did you talk about "normal" upscalers? I have an overclocked GTX1070 with 8Gb vram and I'm stuck with 1280x1920. I can crank up the sampling steps and hires steps, it only takes ages but with very low amount of steps I can't get over that resolution without getting cuda memory error.

me3 · Jul 12, 2023

Comfy seems to do a few things differently including how it loads models. IE it can load the ~12gb SDXL base model in less than 6gb vram, while a1111 and the SD.next fork can't even load the pruned 7gb version without OOM.
Looking at the operations i'm guessing a way to describe what gets done is that the first steps generate one image, then that image is used in a img2img way and the final steps are applied to it.
So to replicate it in A1111 you'd probably need to pass on the image to that and and apply the "finishing touches". Not really used that so don't know how or if it would work

Mr-Fox · Jul 13, 2023

me3 said:
Comfy seems to do a few things differently including how it loads models. IE it can load the ~12gb SDXL base model in less than 6gb vram, while a1111 and the SD.next fork can't even load the pruned 7gb version without OOM.
Looking at the operations i'm guessing a way to describe what gets done is that the first steps generate one image, then that image is used in a img2img way and the final steps are applied to it.
So to replicate it in A1111 you'd probably need to pass on the image to that and and apply the "finishing touches". Not really used that so don't know how or if it would work

AFAIK, upscaling in img2img doesn't work like hiresfix. Hiresfix is part of the generative process and "creates" new pixels and thus improves the image quality while "normal" upscaling can't "invent" pixels that aren't already there, so it only makes the image larger without the same bump in quality. So in my opinion and many other's, hiresfix is superior. This is why I wanted to try to replicate what seph had discovered but with hiresfix. I'm sticking to A1111 for now, I never got used to the node system in any of the many softwares I have messed with. Blender, 4d wrap etc.

Mimic22 · Jul 13, 2023

Hi, i have a question, if i take several pictures of a real life person, do you think i should be able to create a model of that person ?

Mr-Fox · Jul 13, 2023

Mimic22 said:
Hi, i have a question, if i take several pictures of a real life person, do you think i should be able to create a model of that person ?

Yes, this is what a Lora or Textual Inversion is. If you are proficient with SD, this would be the next step. If you search in this thread you can find a lot of information and links about this. I recommend to read the awesome guide by Schlongborn that you can find the link to on the first page if you decide to try it out.

Sepheyer · Jul 13, 2023

Mr-Fox said:
I could not replicate this with hiresfix. Just to be clear. Did you talk about "normal" upscalers? I have an overclocked GTX1070 with 8Gb vram and I'm stuck with 1280x1920. I can crank up the sampling steps and hires steps, it only takes ages but with very low amount of steps I can't get over that resolution without getting cuda memory error.

I could easily draw incorrect conclusion from my observations. May be steps don't matter at all. I won't be surprised to eventually reach that conclusion.

So, here is the proof that back in the day I couldn't render 1536x2304 using ComfyUI:

"Not enough memory to render 1536x2304 on a 6gb GPU."
https://f95zone.to/threads/ai-art-show-us-your-ai-skill.138575/post-10795995

Naturally, the issue there might be a different workflow where instead of upscaling 512x768 > 1536x2304 I actually upscaled the original 4 times with 1.5 zoom increments. May be the pipeline itself was consuming too much memory storing four different latents for each upscaler.

me3 · Jul 13, 2023

When i tested how large images i could get with SDXL and when it would start making duplicate subjects, i noticed that Comfyui automatically kicked in tiled diffusion when it got an OOM. Not sure what version i was using compared to yours, but could be possible that's what was involved, or tiled vae. That drastically affects speed, but it was shown in my console output so unless there's a version difference or some suppressed output it should have been mentioned for you as well

Mr-Fox · Jul 13, 2023

Rub.. rub.. scrub.. scrub.. Jiggle.. Jiggle..

Sepheyer · Jul 13, 2023

me3 said:
When i tested how large images i could get with SDXL and when it would start making duplicate subjects, i noticed that Comfyui automatically kicked in tiled diffusion when it got an OOM. Not sure what version i was using compared to yours, but could be possible that's what was involved, or tiled vae. That drastically affects speed, but it was shown in my console output so unless there's a version difference or some suppressed output it should have been mentioned for you as well

Yes, I saw VAE switched to tile VAE during my ControlNet experiments, but that was a different thing. The post above about that "out of memory error" -- the error was generated not at the VAE level but when the 1536x2304 latent was passed into respective sampler.

It literally kept failing at the sampler level. I lowered the latent to some odd value but gave up after a few tries as I wasn't finding the "breakeven".

devilkkw · Jul 14, 2023

Comfy UI inside a1111 is released. check

You must be registered to see the links

.

me3 · Jul 14, 2023

devilkkw said:
Comfy UI inside a1111 is released. check
You must be registered to see the links
.

Not sure that's worth using compared to just setting it up separate. If it still uses a1111 to load and use models you won't get any of those advantages.
If you want to use that type of UI it'd be much easier to just install comfy, single download and launch for windows (might be for other os too) and you edit one line in a config file to make it read your models/loras/embeddings from a1111 or other installs so you don't need symlinks or copies of that.

devilkkw · Jul 14, 2023

i'm testing it now, it launch itself server and run as self. tab is only a frame page of comfyUi, good is with extension is all configured and ready to use with all model you have.
I'm able to use one model in a1111 and another in comfyUI.
I think is useful for fast comparing prompt and result.
Actually seem not different in perfomance, but i'm totally noob on comfy, i'm just curius on how it work and for me a1111 is my standard.

Mr-Fox · Jul 14, 2023

devilkkw said:
i'm testing it now, it launch itself server and run as self. tab is only a frame page of comfyUi, good is with extension is all configured and ready to use with all model you have.
I'm able to use one model in a1111 and another in comfyUI.
I think is useful for fast comparing prompt and result.
Actually seem not different in perfomance, but i'm totally noob on comfy, i'm just curius on how it work and for me a1111 is my standard.

I would be curious to try it for the benefit of making widescreen images. Can the extension in A1111 do widescreen? And with hiresfix? With the subject standing? This would be the ideal.

me3 · Jul 14, 2023

Mr-Fox said:
I would be curious to try it for the benefit of making widescreen images. Can the extension in A1111 do widescreen? And with hiresfix? With the subject standing? This would be the ideal.

do you have an example image of the type of thing you're after? obviously doesn't need to be anything generated, just to have something to ballpark it

[Stable Diffusion] Prompt Sharing and Learning Thread

Well-Known Member

Member

Member

Member

Well-Known Member

Member

Well-Known Member

Member

Well-Known Member

Newbie

Well-Known Member

Well-Known Member

Member

Well-Known Member

Well-Known Member

Member

Member

Member

Well-Known Member

Member