[Stable Diffusion] Prompt Sharing and Learning Thread

hkennereth

Member
Mar 3, 2019
Quick question (another one, I know): I'm struggling to get some decent "photo realistic" pictures with ComfyUI, and I think I'm probably still missing something in my workflow. When I look at some "models" and "LoRAs" posted on , I am able to see all the details they (apparently) used to create said picture: positive prompt, negative prompt, cfg and seed values, as well as the sampler/scheduler used.

But when I try to reproduce some of them in ComfyUI, my results are way off from what is posted.

Example: I've been trying to reproduce something like , but no matter what workflow I'm trying to create (using the same checkpoints and LoRAs as posted in the link), the images generated are blurry (especially when upscaled) and don't look realistic at all.

Are these images posted on post-processed through something like Photoshop?
What additional (essential) nodes do I need to add to my workflow to make the results more "crisp" and realistic?

This is what my current (basic) workflow looks like:

There's a couple of bypassed nodes which I was experimenting with, but without success.

This is what I want to achieve (all credit for this image goes to on CivitAI!):

but this is what I get (using the exact same checkpoint, LoRA, prompts, cfg and seed values as well as the same sampler/scheduler and a square image):

Any help would be much appreciated!
Try to remove the negative prompt to begin with. There are a lot of terms there that are absolutely unnecessary and that more often than not cause issues. Negative prompting stuff like "acne" or "overexposure" doesn't work. Especially when using SDXL, do not add anything to the negative unless it's an actual element (like an object or person) that you don't want to see in the image. The idea that you can tell it "don't make it look bad" is a fallacy; training images are not tagged with their defects, so SD has no idea what you're talking about.
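For what it's worth, a lean setup really is this short. A rough sketch in Python with the diffusers library (not ComfyUI, and the model name, prompts and settings below are only placeholders), just to show how little belongs in the negative:

import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder base model; swap in whatever checkpoint you actually use.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="photo of a woman on a beach, natural light, 85mm, shallow depth of field",
    negative_prompt="cartoon, drawing, watermark",  # concrete elements only, no "bad quality" spam
    guidance_scale=6.0,
    num_inference_steps=30,
).images[0]
image.save("test.png")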
 

theMickey_

Engaged Member
Mar 19, 2020
Try to remove the negative prompt to begin with.
Thank you for the reply!

I did read a lot about negative prompts, and although a lot of people tend to add a whole bunch of them, most of them are unnecessary, I agree! With a negative prompt the AI will try to create the opposite of what you've prompted, and this might lead to unwanted results in the first place. So my guess is "less is more" when it comes to negative prompting.

So this was the first thing I've tried, but I still wasn't able to achieve what I was looking for unfortunately :cautious:. The pictures I get still look kinda "fake" and blurry... (but to be fair, they do look better than the example I've posted above!)
 

me3

Member
Dec 31, 2016
Quick question (another one, I know): I'm struggling to get some decent "photo realistic" pictures with ComfyUI, and I think I'm probably still missing something in my workflow. When I look at some "models" and "LoRAs" posted on , I am able to see all the details they (apparently) used to create said picture: positive prompt, negative prompt, cfg and seed values, as well as the sampler/scheduler used.

But when I try to reproduce some of them in ComfyUI, my results are way off from what is posted.

Example: I've been trying to reproduce something like , but no matter what workflow I'm trying to create (using the same checkpoints and LoRAs as posted in the link), the images generated are blurry (especially when upscaled) and don't look realistic at all.

Are these images posted on post-processed through something like Photoshop?
What additional (essential) nodes do I need to add to my workflow to make the results more "crisp" and realistic?

This is what my current (basic) workflow looks like:

There's a couple of bypassed nodes which I was experimenting with, but without success.

This is what I want to achieve (all credit for this image goes to on !):

but this is what I get (using the exact same checkpoint, LoRA, prompts, cfg and seed values as well as the same sampler/scheduler and a square image):

Any help would be much appreciated!
XL has 2 text encoders, L and G. While it can work just fine passing the same prompt to both, there are times you can get some "interesting" differences. Just a general note, though; it might not make any difference in this case.
Not sure how that specific node works, but if "clip scale" is a rewording of or linked to "clip skip", note that in ComfyUI the clip skip values are negative. So if an image is from something like A1111 and has a clip skip of 2, in ComfyUI that would be -2.
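A tiny helper to make the mapping explicit (assuming that node really does correspond to ComfyUI's "CLIP Set Last Layer" value; that part is a guess):

def a1111_to_comfy_clip_skip(a1111_clip_skip: int) -> int:
    # A1111 counts skipped CLIP layers as a positive number ("Clip skip: 2"),
    # ComfyUI's CLIP Set Last Layer node uses a negative index (stop_at_clip_layer = -2).
    return -a1111_clip_skip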

Considering there seems to be no prompt data in the image, it's rather hard to recreate it exactly.
Going by your posted workflow image though: remove the <lora....> bit from the prompt and adjust the "strength" in your Load LoRA node to 0.75. Your step count is off; the site says 30, which probably shouldn't matter, but better safe. Your sampler is also wrong, and that can make a huge difference.
The posted image is also at a different width/height, probably upscaled/highres fix; if it's generated at that width from the start, it will look different from yours at 1024x1024.
Prompts are handled slightly differently in ComfyUI compared to A1111, so you can try changing the prompt parser to use the A1111 style; there's a node for it.
There might be more differences, but you've got somewhere to start. I doubt you'll get it 100%, considering there might be important details missing since the full prompt/generation data isn't included in the image.
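If it helps, here is that checklist written out as a rough Python/diffusers sketch rather than Comfy nodes. The checkpoint/LoRA filenames, prompts, seed, cfg and resolution are placeholders for whatever the posted image actually used, and DPM++ 2M Karras is only my assumption for the sampler:

import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

# Placeholder checkpoint and LoRA files.
pipe = StableDiffusionXLPipeline.from_single_file(
    "photoreal_checkpoint.safetensors", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True  # assumed sampler: DPM++ 2M Karras
)
pipe.load_lora_weights("some_lora.safetensors")
pipe.fuse_lora(lora_scale=0.75)  # same role as strength 0.75 in the Load LoRA node

generator = torch.Generator("cuda").manual_seed(123456789)  # placeholder seed
image = pipe(
    prompt="...",                 # positive prompt from the post, without the <lora:...> tag
    negative_prompt="...",
    num_inference_steps=30,       # step count shown on the site
    guidance_scale=7.0,           # placeholder cfg
    width=832, height=1216,       # match the posted aspect ratio instead of 1024x1024
    generator=generator,
).images[0]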
 

hkennereth

Member
Mar 3, 2019
Thank you for the reply!

I did read a lot about negative prompts, and although a lot of people tend to add a whole bunch of them, most of them are unnecessary, I agree! With a negative prompt the AI will try to create the opposite of what you've prompted, and this might lead to unwanted results in the first place. So my guess is "less is more" when it comes to negative prompting.

So this was the first thing I've tried, but I still wasn't able to achieve what I was looking for unfortunately :cautious:. The pictures I get still look kinda "fake" and blurry... (but to be fair, they do look better than the example I've posted above!)
The concept of "less is more" is also valid for the positive prompt. Looking at it I also see a whole bunch of things that don't help at all (for example, lora:TWBabe... is how you load LoRAs in A1111 and doesn't work in Comfy, so it can be interpreted as anything), are repeated (best quality, ultra highres, which are also pretty useless), are contradictory (Mandalorian armor covers? On a swimsuit picture?), or are not meant to be used with SDXL (1girl is a tag used on SD1.5 models trained for anime pictures).
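If you want to keep copying prompts from A1111-style posts, a small helper like this (my own regex, not part of either tool) strips those tags so only the Load LoRA node decides what gets loaded:

import re

def strip_a1111_lora_tags(prompt: str) -> str:
    # Remove inline <lora:name:weight> tags, which ComfyUI treats as plain text.
    return re.sub(r"<lora:[^>]+>", "", prompt).strip()

print(strip_a1111_lora_tags("photo of a woman, <lora:TWBabe:0.75>, beach"))
# -> "photo of a woman, , beach"  (tidy the leftover comma by hand)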

I also don't know what many of the nodes are meant to do, and a simpler Comfy workflow would probably help make images better by not adding stuff to the process that isn't being used. May I assume that this is a workflow you got from someone else? I think it would be beneficial to start with something simpler and add more nodes yourself as you better understand your needs and exactly what each node adds to the process. I can recommend , one of the team members at Stability AI responsible for creating SDXL, where he talks about how to create a node setup for SDXL from scratch.
 

Sepheyer

Well-Known Member
Dec 21, 2020
So, Matteo is probably the IPAdapter expert, given he converted/implemented the IPA for ComfyUI.

He's got a new video out from an hour ago about the repeatability of characters:

I'll be testing his ideas in due course. But I gotta say, after being on the IPA kick for a while I eventually moved away from it towards ControlNet's tile model. I think it was the IPA's 512x512 (?) requirement that eventually made me say fuck this, imma tile my image-to-image workflows from now on rather than IPA them.
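For the record, the tile route written out in Python/diffusers terms looks roughly like this; the SD1.5 tile ControlNet and the numbers below are placeholders, not my exact workflow:

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

source = load_image("input.png")       # placeholder source image
image = pipe(
    prompt="photo of a woman in a park",
    image=source,                      # img2img source
    control_image=source,              # the tile ControlNet reads the same image
    strength=0.6,                      # how far the result may drift from the source
    controlnet_conditioning_scale=0.8,
    num_inference_steps=30,
).images[0]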

Yeaa. Oh, yea, and the mandatory post pic:

a_03125_.png
 

me3

Member
Dec 31, 2016
So sticking to form, this time the shirt decides to completely freak out and she has some hairstyle issues... and some smaller stuff, but it's getting closer. No character LoRA used.
A webp would be far too big to post, so hopefully mp4 works; at least it let me upload and attach it. Hopefully the forum hasn't screwed with the file too much; for reference it should be 488x800 at 60fps.

View attachment 1.mp4
 

Sepheyer

Well-Known Member
Dec 21, 2020
So sticking to form, this time the shirt decides to completely freak out and she has some hairstyle issues... and some smaller stuff, but it's getting closer. No character LoRA used.
A webp would be far too big to post, so hopefully mp4 works; at least it let me upload and attach it. Hopefully the forum hasn't screwed with the file too much; for reference it should be 488x800 at 60fps.
Is this an image-to-image rendering under the hood?
 

me3

Member
Dec 31, 2016
Is this an image-to-image rendering under the hood?
It's much closer to it than I'd like, at least. The background is a single image added as the "back" layer to all frames. To try and keep the movement of everything else there's a combination of a low-weight lineart and a pose skeleton.
Affecting colors works, and in some ways shape/size too, but you can clearly see there's a fight going on, especially with the shirt's neckline.
It restricts things far too much for my intention and what I'd like. If you look at many of the clips/videos people are posting, you start to notice that a huge amount of them are just "reskinned" videos processed in an img2img way, which locks you into not just the movement but also the general "shape" of whatever is in the clip originally.
I mainly wanted to see how it worked with a single background image, and it wasn't really meant to run on the full 660 frames, but by the time I'd gotten back and noticed the folder path was wrong there wasn't much point in not finishing the whole run.
I'm hoping to find a way to have a simple "skeleton" that you can wrap any character around without being locked into the look of whatever it's from. This isn't it, and I suspected it wouldn't be, but I've tried quite a few other ways that don't work either. Any ControlNet I've found (besides "pose") locks you in too much, including temporal, which for me at least seems to fuck up colors too.
I'm considering some method of sampling > masking > unsampling > resampling... etc. It works with small things like expressions, but I fear it'll be a lot of stuff to keep track of with whole-body movements. Not sure if there's a simple way to track/detect differences. Anyway, long road, but you learn something along the way...
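Written out in Python/diffusers terms instead of Comfy nodes, the low-weight lineart + pose skeleton combination is roughly this per frame (model IDs, weights and file names are placeholder guesses, not the exact setup):

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

lineart = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
openpose = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=[lineart, openpose],
    torch_dtype=torch.float16,
).to("cuda")

frame = pipe(
    prompt="woman in a white shirt dancing, plain background",
    image=[load_image("lineart_0001.png"), load_image("pose_0001.png")],
    controlnet_conditioning_scale=[0.35, 1.0],  # lineart kept weak, pose at full strength
    num_inference_steps=25,
).images[0]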

Edit:
Adding 2 group images so you can see some of the images involved and the "stages". They're not from the generation of the video, but from the same setup with a slightly altered prompt and a different model.
comb_0001.png comb_0002.png
 

Sepheyer

Well-Known Member
Dec 21, 2020
It's much closer to it than I'd like, at least. The background is a single image added as the "back" layer to all frames. To try and keep the movement of everything else there's a combination of a low-weight lineart and a pose skeleton.
Affecting colors works, and in some ways shape/size too, but you can clearly see there's a fight going on, especially with the shirt's neckline.
It restricts things far too much for my intention and what I'd like. If you look at many of the clips/videos people are posting, you start to notice that a huge amount of them are just "reskinned" videos processed in an img2img way, which locks you into not just the movement but also the general "shape" of whatever is in the clip originally.
I mainly wanted to see how it worked with a single background image, and it wasn't really meant to run on the full 660 frames, but by the time I'd gotten back and noticed the folder path was wrong there wasn't much point in not finishing the whole run.
I'm hoping to find a way to have a simple "skeleton" that you can wrap any character around without being locked into the look of whatever it's from. This isn't it, and I suspected it wouldn't be, but I've tried quite a few other ways that don't work either. Any ControlNet I've found (besides "pose") locks you in too much, including temporal, which for me at least seems to fuck up colors too.
I'm considering some method of sampling > masking > unsampling > resampling... etc. It works with small things like expressions, but I fear it'll be a lot of stuff to keep track of with whole-body movements. Not sure if there's a simple way to track/detect differences. Anyway, long road, but you learn something along the way...
Indeed, I have the same experience when attempting i2i frames using ControlNets; the end results are nothing but acid trips.
 

me3

Member
Dec 31, 2016
Indeed, I have the same experience when attempting i2i frames using controlnets, the end results are nothing but acid trips.
I put them in a spoiler just in case it might be annoying/"trippy" for someone to look at, so view at your own risk.
 

me3

Member
Dec 31, 2016
So I did some tests to see how it would work with weighting prompts to deal with the ControlNet restrictions, basically trying to brute force it to allow for "change". I had to get right at the limit of causing very bad distortions for some of it to even work; this is with slightly lowered weights.
In the first clip you can see how the "long dress" is glitching into some frames; it seems fairly stable as a dress, but that's it.
In the second clip I tried to adjust the body type/size... and it has some issues...
So purely prompting your way out of this seems unlikely; the hunt continues...
View attachment w1.mp4
View attachment w2.mp4
 

Sepheyer

Well-Known Member
Dec 21, 2020
So I did some tests to see how it would work with weighting prompts to deal with the ControlNet restrictions, basically trying to brute force it to allow for "change". I had to get right at the limit of causing very bad distortions for some of it to even work; this is with slightly lowered weights.
In the first clip you can see how the "long dress" is glitching into some frames; it seems fairly stable as a dress, but that's it.
In the second clip I tried to adjust the body type/size... and it has some issues...
So purely prompting your way out of this seems unlikely; the hunt continues...
View attachment 3163295
View attachment 3163294
In this case the IPAdapter is your tool. You show it what dress you want, then inject its output into the sampler. It is a "more better" approach than using prompt weights. In fact it becomes a micromodel that should improve consistency.

If you need a snippet of the workflow let me know.
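In the meantime, the rough shape of the idea in Python/diffusers terms (the actual Comfy graph uses the IPAdapter nodes; model names, scale and the reference image here are placeholders):

import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.7)   # how strongly the reference image steers the sampler

dress_ref = load_image("long_dress_reference.png")  # "show it what dress you want"
image = pipe(
    prompt="woman dancing, long dress",
    ip_adapter_image=dress_ref,
    num_inference_steps=25,
).images[0]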
 

hkennereth

Member
Mar 3, 2019
So I did some tests to see how it would work with weighting prompts to deal with the ControlNet restrictions, basically trying to brute force it to allow for "change". I had to get right at the limit of causing very bad distortions for some of it to even work; this is with slightly lowered weights.
In the first clip you can see how the "long dress" is glitching into some frames; it seems fairly stable as a dress, but that's it.
In the second clip I tried to adjust the body type/size... and it has some issues...
So purely prompting your way out of this seems unlikely; the hunt continues...
You should check out this video here for an alternative method for video generation that is perhaps better at temporal coherence:
 

Mr-Fox

Well-Known Member
Jan 24, 2020
You should check out this video here for an alternative method for video generation that is perhaps better at temporal coherence:
This is AI-enhanced animation rather than AI-created animation. I think that most would prefer not to create an animation in a 3D editor such as Blender, but rather rely mostly on SD instead. SD is getting better and better, so with time it will probably become possible and more straightforward than it currently is. As far as I have seen, AnimateDiff is the only tool or method that uses only AI to create the animation. All other options are some form of AI enhancement or hybrid video. Live caption to SD video is also something that is getting more and more possible with such extensions as LCM Live.
Though with all that waffle being waffled, there is obviously nothing wrong with creative ways of using AI to enhance animations or videos. There are new tools, extensions, models etc. coming very soon that will expand on what is currently possible with AI-created animations and videos, such as "Loose Control" and "Motion Control".
Watch this video to learn about it: .
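For context, the AnimateDiff route in its simplest form looks like this in Python/diffusers (the model IDs below are commonly used ones, not necessarily what the video shows):

import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, beta_schedule="linear")

frames = pipe(
    prompt="a woman walking on a beach at sunset",
    num_frames=16,            # the motion module generates the whole clip at once
    num_inference_steps=25,
    guidance_scale=7.5,
).frames[0]
export_to_gif(frames, "animation.gif")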
 

rogue_69

Newbie
Nov 9, 2021
This is AI-enhanced animation rather than AI-created animation. I think that most would prefer not to create an animation in a 3D editor such as Blender, but rather rely mostly on SD instead.
I think there will be a big demand for both. Personally, I like animating in Daz, Blender, and Unity; but I'm not happy with the "polish" of those renderers. That's why I like the advancements in Video to Video. SD can really make a plastic looking animation look cinematic. It just needs to get more consistent.
 

DreamingAway

Member
Aug 24, 2022
I think there will be a big demand for both. Personally, I like animating in Daz, Blender, and Unity; but I'm not happy with the "polish" of those renderers. That's why I like the advancements in Video to Video. SD can really make a plastic looking animation look cinematic. It just needs to get more consistent.
Before Stable Diffusion, I always wanted my 3D models to have a painted/artistic aesthetic to them, as if hand-drawn.
I would have expected a solution to this to be available through custom shaders, but I was always disappointed with the results.

AI + rotoscoping feels like the more likely technology to get there. Imagine extremely detailed hand-drawn art, animated and rendered at the speed of 3D; if that can be achieved, it's almost the best of both worlds.

--

Before SD this was impossible.

Animation in old Disney movies always had extremely detailed backgrounds but simple, flat-shaded characters/animation, because they had to draw them frame by frame. If a single frame takes an artist 3 weeks, you can't possibly achieve 24 frames per second, and the likelihood of consistency falls dramatically as well.

This would be something AI could do (hopefully) that is essentially impossible today.
 

rogue_69

Newbie
Nov 9, 2021
Before Stable Diffusion, I always wanted my 3D models to have a painted/artistic aesthetic to them, as if hand-drawn.
I would have expected a solution to this to be available through custom shaders, but I was always disappointed with the results.

AI + rotoscoping feels like the more likely technology to get there. Imagine extremely detailed hand-drawn art, animated and rendered at the speed of 3D; if that can be achieved, it's almost the best of both worlds.
The beauty is that you'll be able to apply whatever style you want. I go back and forth between wanting to do realistic and cartoonish animation.
 

DreamingAway

Member
Aug 24, 2022
The beauty is that you'll be able to apply whatever style you want. I go back and forth between wanting to do realistic and cartoonish animation.
I just wish more work had been done in the industry to support "hand drawn styles" in 3D. Genshin did a pretty good job emulating "anime" and Arcane seems to be pushing the boundary on that "hand drawn" style but it's still not there for me.

When I see concept art for certain video games or MMOs, I'm usually disappointed in how it translated into the 3D world. I hope AI can make those a reality.
 