[Stable Diffusion] Prompt Sharing and Learning Thread

ririmudev

Member
Dec 15, 2018
304
308
Sounds like the prompt I used when testing different UIs and SDXL, so here are a couple of pretty clean runs of your prompt. I've had far worse horrors from things you'd expect to be pretty safe. The missing/skipped images were left out more for being "boring" than for any horror.
(attached images #1-5)

Models:
#1-2:
#3-4:
#5:
Those are a pretty fair representation; in my mind I was going for a little more cutesy, more humanoid, but to be fair, I didn't specify that.
Ok fine... here are a couple of images that I got (hope I don't get banned, though the pics are just unpleasant, nothing rule-breaking):
(images in spoiler)
A few others were just as bad, maybe slightly worse, but I'll leave it at this.
One was bad, but pretty abstract, and almost kind of cool (but I'll still put it in a spoiler):
(image in spoiler)
<End transmission>
 
  • Wow
Reactions: Mr-Fox and Dagg0th

me3

Member
Dec 31, 2016
316
708
Just to potentially up the "learning" part of this thread again.
As I've been doing a lot of repeated training lately, I thought I'd share some minor things that might help others do things a bit faster.
Since I have a slow, old card with just 6GB of VRAM, reducing memory needs has always been one of my priorities, which means that any sort of "speed" goes out the window.

When I started this endlessly long repeated training... sigh... it was running at 6.8 to 8.4 s/it... yes, that's the right way around... seconds per iteration...
which means a 10k-step training run would easily be >20 hours. This was without dropping into shared memory usage etc, purely running at 95-98% VRAM. The training I'm currently running, still on the same dataset etc, is running at 1.6 to 1.8 s/it. It's still "the slow way", but that's a very noticeable improvement. Now, if all those with 30xx and 40xx cards are done laughing, this might potentially help you as well. Obviously it's a bit hard for me to test, unless I'm somehow gifted a massive new computer or win some kind of lottery.
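For a sense of scale, the time difference is just arithmetic (numbers from above, nothing measured beyond them):

```python
# Wall-clock estimate for a 10k-step run at the speeds mentioned above.
steps = 10_000
for sec_per_it in (8.4, 6.8, 1.8, 1.6):
    print(f"{sec_per_it} s/it -> {steps * sec_per_it / 3600:.1f} h")
# 8.4 -> 23.3 h, 6.8 -> 18.9 h, 1.8 -> 5.0 h, 1.6 -> 4.4 h
```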

While these things mainly reduce memory needs, they can work for those with much more VRAM too, since the savings might let you add another batch or two, which speeds things up greatly.

First off, many guides complain that buckets are horrible without giving any reason why, or with some crappy excuse suggesting they don't even know how bucketing works. The point here is that buckets let you reduce the number of pixels in most or all of your images, because you can crop a lot of pointless background out of tall/wide images instead of forcing everything into a square. 100 images at 512 or 768 squares are a lot more to work through than the same images with half of the "empty space" cropped off. Just remember to do the settings for it so you don't get extra cropping or weird bucket assignments.
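To put some rough numbers on the cropping point (made-up but realistic resolutions; with buckets enabled and the "don't upscale" option on, a cropped image really does train at fewer pixels instead of being padded back out to a square):

```python
# Pixel budget per training image: square vs. the same subject with the
# pointless background cropped away and bucketed at its own aspect ratio.
full_square  = 768 * 768   # everything forced into a 768x768 square
cropped_tall = 448 * 768   # subject kept, empty background cropped off
print(full_square, cropped_tall,
      f"{1 - cropped_tall / full_square:.0%} fewer pixels per step")
# 589824 344064 42% fewer pixels per step
```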

Second, the optimizer. If you're on low/lower specs you're probably already familiar with AdamW8bit; however, there's a more "optimized" version called PagedAdamW8bit.
For me, with AdamW8bit I had to run training with both "gradient checkpointing" and "memory efficient attention", and training ran at a constant 5.8GB. When doing sample images it would then offload to generate the image, which caused a noticeable lag; not major, but it was there.
Using the paged version I can drop "memory efficient attention"; training runs at 4.4-4.6GB and only spikes for sample images, without offloading or "reorganizing" any memory to do it. It keeps everything in VRAM, which speeds up sampling too, not that sampling is something you need. But because I no longer need ME attention and still have VRAM to spare, everything is faster.
(I haven't tested whether ME attention allows +1 batch and whether that would be faster overall; I doubt it for me though.)
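For anyone wondering what the name actually refers to: kohya just hands the optimizer name through to the bitsandbytes package (the "Optimizer" dropdown in the GUI, or --optimizer_type with sd-scripts on the command line). A minimal sketch of the two variants outside of kohya; the Linear layer is only a stand-in for the real network:

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(768, 768)  # stand-in for the network being trained

# The usual low-VRAM choice:
opt = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)

# The paged variant keeps the optimizer state in paged memory that can be
# evicted to system RAM under pressure, so peak VRAM during a step is lower.
opt = bnb.optim.PagedAdamW8bit(model.parameters(), lr=1e-4)
```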

Third small thing: cache latents. It's generally checked by default in the kohya_ss GUI, but for some reason I've seen guides/training files turn it off. It might be because it keeps all the latents in memory and people haven't really thought about what that means. As a simple explanation, you keep a "version" of each image in memory instead of reading and re-encoding it every time; "every time" in this case means every repetition and epoch. Unlike most of the other "keep data in memory" options, this one is fairly small, mostly less than 100KB per image, so unless you're going way overboard with the number of images you use, this should not be the reason you OOM, and it does speed things up. Your mileage will obviously vary with your system. Don't use "cache to disk" though.
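The "less than 100KB per image" figure is easy to sanity-check: the SD1.x VAE encodes to 4 channels at 1/8 of the pixel resolution, so a cached latent is tiny compared to the source image. Rough numbers, assuming fp16 storage:

```python
# Approximate size of one cached latent (4 channels, 1/8 resolution, 2 bytes/value).
for w, h in [(512, 512), (512, 768), (768, 768)]:
    size_kib = 4 * (w // 8) * (h // 8) * 2 / 1024
    print(f"{w}x{h} image -> {size_kib:.0f} KiB latent")
# 512x512 -> 32 KiB, 512x768 -> 48 KiB, 768x768 -> 72 KiB
```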

As a final note, I know these things worked very well for me on my low-spec old computer. In theory they should work for others as well, but the effect/impact will obviously depend on the hardware running it.
If you're already running at full speed they're unlikely to do much, but as I mentioned earlier, if they mean you can increase your batch size by 1 or more, they can make things faster even on those systems.

And if anyone bothered reading all this, Hi...
As a follow-up to this, it turns out that it's just barely possible for me to disable gradient checkpointing, which increases speed further.
Now I've gotten down to 1.14 to 1.16 s/it; the VRAM spike from sampling pushes it up to 1.2 to 1.24 s/it, but sampling can be disabled if needed/wanted.
This is probably going to depend on model size etc, as it's very borderline, but it cuts off 1/3 of the time and I'm closing in on seeing it/s.
Any further improvements are probably going to depend on code or driver changes, and tbh I don't think Nvidia's focus is on improving cards as old as mine :p
 
  • Like
Reactions: Mr-Fox and Sepheyer

me3

Member
Dec 31, 2016
316
708
Follow up #2
Part of testing/science/learning is to make mistakes, be wrong, etc, and learn from it.
So it seems I was wrong in my previous post: you don't need a code or driver update to speed things up further :p
I'm currently running at a very stable 1.08 s/it; for a short while it even ran at a speed where the readout kept flipping back and forth between s/it and it/s.
The only thing I changed was network rank and alpha. I don't know if it's related to both or just one of them, but testing is ongoing.
I know this is probably uninteresting to most people, but I'm basically running training at ~1-second iterations on an almost 7.5-year-old 6GB card, and it's currently using just 5.4GB, which includes whatever the OS etc is still using.
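For context on why that helped: alpha is just a scale factor on the LoRA output, so if the change is memory/speed related it's most likely the rank (network_dim). The LoRA weights, their gradients and the optimizer state all grow linearly with the rank. A back-of-the-envelope sketch (the 768x768 layer shape is made up but typical of SD1.x attention blocks):

```python
# One LoRA-adapted linear layer = down projection (d_in x r) + up projection (r x d_out),
# so its parameter count scales linearly with the rank r.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    return d_in * rank + rank * d_out

for rank in (128, 64, 32, 8):
    print(f"rank {rank:>3}: {lora_params(768, 768, rank):>7,} params per 768x768 layer")
# rank 128: 196,608 ... rank 8: 12,288 (gradients and AdamW state scale along with it)
```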
 
  • Like
Reactions: VanMortis

sharlotte

Member
Jan 10, 2019
299
1,591
Been away for a bit, and a couple of days ago I started using ComfyUI as much as possible. Still testing a lot of flows out there, or creating my own, making sure I understand what the various steps and settings actually do. I find it great so far at creating objects, nature... but really awful at creating faces. I haven't read the thread for a while, so I will be going (slowly) over the past few (dozens of) pages.
Meanwhile, here is some of the stuff I generated (for anyone wondering, I've been playing BG3 lately); as usual, the flow is inside.
ComfyUI_00003_.png ComfyUI_00004_.png ComfyUI_00005_.png ComfyUI_00010_.png ComfyUI_00012_.png ComfyUI_00013_.png ComfyUI_00019_.png ComfyUI_00018_.png
 

me3

Member
Dec 31, 2016
316
708
(quoting sharlotte's ComfyUI post above)
I think I've figured out why you've got problems with faces; you've forgotten something: the skin ;)
 

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
(quoting me3's "Follow up #2" post above)
Try using a little token merging in the optimization settings. 0.2 is fine for "Token merging ratio", and around 0.08 is fine for "Negative Guidance minimum sigma" in my experience. You can of course experiment and try higher settings.
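For background: the "Token merging ratio" setting in A1111 is backed by the tomesd package (ToMe for SD), which merges similar tokens before the attention layers so there's simply less to compute; "Negative Guidance minimum sigma" is a separate A1111 trick that skips the negative-prompt pass on low-noise steps. Outside the UI, token merging looks roughly like this (a sketch only; assumes you already have a diffusers pipeline loaded as pipe):

```python
import tomesd

# Merge ~20% of similar tokens before attention, matching a 0.2 UI setting.
# Higher ratios are faster but start to soften fine detail.
tomesd.apply_patch(pipe.unet, ratio=0.2)

# ...generate as usual; tomesd.remove_patch(pipe.unet) undoes it.
```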
 
  • Like
Reactions: Sepheyer

me3

Member
Dec 31, 2016
316
708
(quoting Mr-Fox's token merging suggestion above)
It didn't really have an effect on my generation speed, unfortunately; maybe it's more apparent at high resolutions, with upscaling, or with ControlNet involved. Also, the effect it had on the images I was generating at the time was a bit "unfortunate".

As a side note, since I can't use xformers atm because there's something wrong in the code/setup, I'm forced to use SDP, so token merging might work better with xformers.
 
  • Like
Reactions: Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
(quoting me3's reply above)
It has made a big difference for me, not only when using hires fix but for "normal" generations as well. I don't have any data to show right now, but I know it has cut down my generation times significantly.
 

Sharinel

Active Member
Dec 23, 2018
598
2,509
This might be of interest to some people.
TestMerge.jpg

The above pic shows the same prompt/seed combination using two different checkpoints.
The left-hand pic uses Dreamshaper 8, while the right-hand one uses EpicRealism.
The one in the middle uses both: it starts off with Dreamshaper and then uses the Refiner tool in Automatic1111 to morph into EpicRealism. You can get some interesting outcomes depending on how you do the merging.
1695491277070.png

Prompt is "beautiful female standing next to desk wearing __CC_female_clothing_set_business__****, deep cleavage, photorealistic, wide hips, closeup, textured skin, skin pores, looking down at camera, thicc thighs, gigapixel, 8k, cinematic, fov 60 photo of perfecteyes eyes, perfecteyes eyes, <lora:more_details:1> <lora:GoodHands-beta2:1>

****This is a wildcard; it came out as trousers and a boat-neck top.
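The same trick can be reproduced by hand outside A1111: render the image with the first checkpoint, then hand it to the second checkpoint as a light img2img pass. A rough diffusers sketch (checkpoint file names are placeholders, and this only approximates what the A1111 Refiner does; the strength value plays the role of the switch point):

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

prompt = "beautiful female standing next to desk, photorealistic, textured skin"

# Stage 1: compose the image with the first checkpoint.
base = StableDiffusionPipeline.from_single_file(
    "dreamshaper_8.safetensors", torch_dtype=torch.float16).to("cuda")
image = base(prompt, num_inference_steps=30).images[0]

# Stage 2: finish with the second checkpoint. Lower strength keeps more of the
# first model's composition; higher strength hands more over to the second model.
refine = StableDiffusionImg2ImgPipeline.from_single_file(
    "epicrealism.safetensors", torch_dtype=torch.float16).to("cuda")
image = refine(prompt, image=image, strength=0.45, num_inference_steps=30).images[0]
image.save("dreamshaper_to_epicrealism.png")
```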
 

Dagg0th

Member
Jan 20, 2022
279
2,746
(quoting Sharinel's Dreamshaper/EpicRealism refiner post above)
I do something similar, but instead of using the refiner, I do the checkpoint switch on highres.fix. I'll do a comparison of which one works better, stay tuned.
 
  • Like
Reactions: Sharinel and Mr-Fox

felldude

Active Member
Aug 26, 2017
572
1,694
I've installed both the and the for . Normally I like building my own pipelines, but if anyone has a good SEGS setup or can link to one, that would help.

For those not familiar, it almost turns SD into DALL-E with the way it handles drawing objects; less turning clouds into hair.

Here is a by someone with way more experience using it than I currently have; he also created a UI manager, but I haven't gone that path.

Here is a showing a form of automatic mask generation for inpainting. I have seen combinations of SEGS with auto-masking and auto-prompting from image detection, but I have never found a shared workflow.
 
  • Like
Reactions: Mr-Fox

me3

Member
Dec 31, 2016
316
708
Is there any other way of getting the same angle/level and "head to thigh/knee" framing as "cowboy shot", without the obvious issues that term causes in prompts?
Pure prompt only: NO lora, TI, controlnet, openpose, easy-whatevershit...
 

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
(quoting me3's "cowboy shot" question above)
Here's an image with some useful photography terms that describe the composition. Instead of "cowboy shot" you can try "medium full shot". I guess you could also try to describe what should be included in the image:
"knees, thighs, torso and head in image" or similar phrasing.
Alternatively, simply use DeepBooru-style tags: knees, thighs, torso, head (face).

1695918801769.png
 

me3

Member
Dec 31, 2016
316
708
Hmmm, forgot about "medium full"... That should fix all the cowboy dress-up issues; the question is whether the AI will know it, and whether it'll split the phrase up in many cases.
So far it seems to be treating it as just a medium shot, with the occasional full.
This could obviously be down to model and/or seed.
 
  • Like
Reactions: Mr-Fox

felldude

Active Member
Aug 26, 2017
572
1,694

Trained for native 1k generation and upscaling to 2k (no hires fix).

I thought, with BF16 and the ability to train beyond 1k, why not try to teach SD to put details in the right spots at high resolutions (even at 0.5, normal sampler, img2img at 2k).

Nude female image training, non pornographic

ComfyUI_00440_.png
 

alij8000

New Member
May 6, 2018
4
20
Anyone have a navel penetration LoRA or prompt? I have managed to create the scenes with ControlNet, but it takes too much time doing it manually, since Stable Diffusion really doesn't understand the concept, understandably xD.
 

felldude

Active Member
Aug 26, 2017
572
1,694
How do I put two separate character LoRAs in one image?
Putting both in at the same time will combine the two; you can inpaint the face or body using masking or ADetailer.

Or simply run the image twice, once with each LoRA, and combine the images in GIMP.
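A rough sketch of the "run it twice / inpaint" idea with diffusers (model ids, LoRA file names and the mask are all placeholders; in practice the mask would come from hand-painting or a detector such as ADetailer):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, StableDiffusionInpaintPipeline

prompt = "two women sitting at a cafe table, photorealistic"

# Pass 1: generate the whole scene with character A's LoRA active.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("character_a_lora.safetensors")
scene = pipe(prompt, num_inference_steps=30).images[0]

# Pass 2: inpaint only the second figure with character B's LoRA.
# The mask is white where character B should appear, black elsewhere.
mask = Image.open("character_b_mask.png")
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16).to("cuda")
inpaint.load_lora_weights("character_b_lora.safetensors")
result = inpaint(prompt, image=scene, mask_image=mask, num_inference_steps=30).images[0]
result.save("two_characters.png")
```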
 
  • Like
Reactions: Mr-Fox

felldude

Active Member
Aug 26, 2017
572
1,694
Testing how the works with other LoRAs.
In this case


Native Generation in SD at 768x1024
ComfyUI_00658_.png


2k Upscale

ComfyUI_00659_.png

While it loses some of the body detail, considering 2048 is more than double the expected render resolution, I would be curious to see if you could train XL to generate at 2k and upscale natively to 4k. I don't have the PC specs for that though.
 
  • Like
Reactions: devilkkw and Mr-Fox