[Stable Diffusion] Prompt Sharing and Learning Thread

me3

Member
Dec 31, 2016
316
708
You should be able to do both 4K and 8K natively just fine, both with and without upscaling, at very little to no loss.
I can't remember what it is for XL, as that's far beyond what I could even hope to work with, so I haven't been digging for info about it.
I'm also interested in seeing how Würstchen will perform.

On a side note, that LoRA seems not to care about likeness, but considering they don't seem to have cropped the text out of the training data, it's probably too big an ask to make it look like the actual person.
 
  • Like
Reactions: Mr-Fox

felldude

Active Member
Aug 26, 2017
572
1,694
You should be able to do both 4K and 8K natively just fine, both with and without upscaling, at very little to no loss.
I can't remember what it is for XL, as that's far beyond what I could even hope to work with, so I haven't been digging for info about it.
I'm also interested in seeing how Würstchen will perform.

On a side note, that LoRA seems not to care about likeness, but considering they don't seem to have cropped the text out of the training data, it's probably too big an ask to make it look like the actual person.
XL was trained at 1024x1024, and it appears Würstchen v2.0 was as well. According to their paper, they have generated 1024x2048 images.
 
  • Like
Reactions: Mr-Fox

me3

Member
Dec 31, 2016
316
708
I know what they were trained at; that's not what I meant.
Anyway, with Würstchen I was thinking more about their claim of being much faster than XL, and what that means for memory usage and compatibility with older systems.
Same with AITemplate, I guess, but I can't set that up because people don't bother building things well, unfortunately.
 
  • Like
Reactions: Mr-Fox

felldude

Active Member
Aug 26, 2017
572
1,694
I know what they were trained at; that's not what I meant.
Anyway, with Würstchen I was thinking more about their claim of being much faster than XL, and what that means for memory usage and compatibility with older systems.
Same with AITemplate, I guess, but I can't set that up because people don't bother building things well, unfortunately.
Read both papers and draw your own conclusions. Both models are limited to about 2x their trained latent size for image generation, even when fine-tuned.

Almost all of these models self-evaluate, or are evaluated by another AI, on how well they recreate the 30k-image COCO set used for FID scoring.
 
Last edited:
  • Like
Reactions: Mr-Fox

hkennereth

Member
Mar 3, 2019
237
775
Thought I could share a tip here for anyone looking to create consistent-looking characters without having to rely on LoRAs, using just an openly available checkpoint and prompting: if you ask for a mix of a few known celebrities, SD will create a person that merges the facial features of all of them into a "new" person, and this way you can pretty consistently output that same person.

For example, in every image below I had a mix of Sarah Shahi, Vanessa Hudgens, and Nina Dobrev as part of the prompt, with a few differences in the rest of the prompt for each picture to describe clothing, visual style, etc. Hope this helps (and that it wasn't already explored in the previous 100-something pages).
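Schematically, the prompt is just the names plus the scene description, along these lines (illustrative wording, not the exact original prompt):

Code:
photo of a woman, mix of Sarah Shahi and Vanessa Hudgens and Nina Dobrev,
sitting at a bar, cinematic lighting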

[attached images]
 

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
Thought I could share a tip here for anyone looking to create consistent-looking characters without having to rely on LoRAs, using just an openly available checkpoint and prompting: if you ask for a mix of a few known celebrities, SD will create a person that merges the facial features of all of them into a "new" person, and this way you can pretty consistently output that same person.

For example, in every image below I had a mix of Sarah Shahi, Vanessa Hudgens, and Nina Dobrev as part of the prompt, with a few differences in the rest of the prompt for each picture to describe clothing, visual style, etc. Hope this helps (and that it wasn't already explored in the previous 100-something pages).

[attached images]
You can also do this with either tag mixing or keyword weighting.

Tag mixing:

[attached screenshot]
As you can see, it's essentially like using a "refiner": the first person is the base and the second is the "refiner". The number sets the point where SD switches from the first person to the second, so if you want the "refiner" to have more impact, lower the number; if you want the first person to have more influence, raise it. Don't use "[ ]" for the weighting syntax below; use normal parentheses "( )" instead, as SD tends to give you an error with "[ ]".
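The syntax in the screenshot is A1111's built-in prompt editing, which looks like this (the names and the 0.5 are illustrative):

Code:
[Emma Watson:Ana de Armas:0.5]

SD renders Emma Watson for the first 50% of the sampling steps, then switches to Ana de Armas for the rest; moving the number moves that switch point.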

Keyword weighting:

This can be used for blending more than 2 faces.
(Emma Watson:0.5), (Tara Reid:0.9), (Ana de Armas:1.2)
[attached screenshot]
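For reference, (text:number) in A1111 scales the attention given to those tokens: 1.0 is neutral, lower weakens, higher strengthens. So in the example above, Ana de Armas dominates the blend and Emma Watson contributes the least.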

Source:
 
Last edited:

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
Bonus tips..

You can generate a portrait with either SD1.5 or XL and let SD focus only on the face, regardless of whether it's a mix or not. This will give the face more detail. Then you simply use this face with the "roop" extension when you are generating your character. You can also use this method with img2img for "photobashing", meaning taking a photo and using roop on top of it to create fakes.
When using img2img, roop will not be part of the generative process; it will only paste the face on top of the existing bone structure. This obviously won't give a good result in every scenario. For best results, use this method with txt2img when generating a new image.
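For the face step, an illustrative portrait-only prompt (any mixing method works here; the names and wording are just an example) could be:

Code:
close-up portrait photo of a woman, (Emma Watson:0.5), (Ana de Armas:1.2),
detailed skin, studio lighting, plain background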

Face:
[attached image]


Character using roop:
[attached image]

Using a photoshopped photo of Angelina with blonde hair in roop:
[attached image]

Without roop:
[attached image]
 

hkennereth

Member
Mar 3, 2019
237
775
You can also do this with either tag mixing or keyword weighting.

Tag mixing:

[attached screenshot]
As you can see, it's essentially like using a "refiner": the first person is the base and the second is the "refiner". The number sets the point where SD switches from the first person to the second, so if you want the "refiner" to have more impact, lower the number; if you want the first person to have more influence, raise it. Don't use "[ ]" for the weighting syntax below; use normal parentheses "( )" instead, as SD tends to give you an error with "[ ]".

Keyword weighting:

This can be used for blending more than 2 faces.
(Emma Watson:0.5), (Tara Reid:0.9), (Ana de Armas:1.2)
[attached screenshot]

Source:
The tag mixing shown in your first image does work; however, as far as I know it's a feature of Automatic1111, and it's either not supported or works differently in other UIs, since it changes how the diffusion process works on that image. ComfyUI, my app of choice, doesn't really support that, and I didn't find that changing weights works as reliably as I'd like; the results are not as consistent across a wide range of images, which is the point of my original post.

The method I suggested is more flexible and works in any image generation app, since it's just basic prompting. It doesn't allow the same level of control, of course, but in my experience it's better for getting a consistent "new" person across many images, even when changing styles or checkpoints. Just something to keep in mind.
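To make the "just basic prompting" point concrete, here's a minimal sketch with the diffusers library; the checkpoint name and prompt wording are placeholders, not anything specific from this thread:

Code:
# Minimal sketch: the celebrity-mix trick is plain text prompting,
# so it works in any frontend, including raw diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any SD1.5 checkpoint should work
    torch_dtype=torch.float16,
).to("cuda")

prompt = ("photo of a woman, mix of Sarah Shahi and Vanessa Hudgens "
          "and Nina Dobrev, sitting at a bar, detailed face")

# Same prompt, different seeds: the same blended person in new scenes.
for seed in range(4):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, num_inference_steps=30, generator=generator).images[0]
    image.save(f"mix_{seed}.png")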
 
  • Like
Reactions: Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
The tag mixing shown in your first image does work; however, as far as I know it's a feature of Automatic1111, and it's either not supported or works differently in other UIs, since it changes how the diffusion process works on that image. ComfyUI, my app of choice, doesn't really support that, and I didn't find that changing weights works as reliably as I'd like; the results are not as consistent across a wide range of images, which is the point of my original post.

The method I suggested is more flexible and works in any image generation app, since it's just basic prompting. It doesn't allow the same level of control, of course, but in my experience it's better for getting a consistent "new" person across many images, even when changing styles or checkpoints. Just something to keep in mind.
You get the most consistent results with something like roop or the ControlNet models, since it's the same input every time. I don't know if those are available for ComfyUI, though.
 
  • Like
Reactions: devilkkw

hkennereth

Member
Mar 3, 2019
237
775
You get the most consistent results with something like roop or the ControlNet models, since it's the same input every time. I don't know if those are available for ComfyUI, though.
I don't think ControlNet is really a good solution for this particular problem, as it allows following a pre-existing composition, but not creating a new image by prompt alone, with the flexibility that gives: a completely different result each time you run the prompt. For example, those images of the girl at the bar in my original example were the result of the exact same prompt; I just asked it to generate X images, without needing to prepare any source images. That said, if you do want a specific composition, you can certainly use ControlNet in addition to the technique above, and yes, ControlNet is available in ComfyUI.

The best alternative is really using a LoRA or Dreambooth, so you can train a model to create images of that specific person, but that is better suited to reproducing a pre-existing person, not a new fictional one. So if you want to make a game with Angelina Jolie as your main character, training a LoRA or Dreambooth model of her would be the best solution for sure. But that does require a lot of work. My suggestion, or yours of using tag mixing, is better when you want to create a new character "from scratch" and you're just providing some "DNA" to help the prompt make that same character consistently.

I know nothing about "roop", so I can't really speak to it as a good alternative here, but from a very quick look at its GitHub page it also seems better for cases where you're trying to reproduce an existing real person, not create one from scratch. Please correct me if I'm wrong.
 
  • Like
Reactions: Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,802
I don't think ControlNet is really a good solution for this particular problem, as it allows following a pre-existing composition, but not creating a new image by prompt alone, with the flexibility that gives: a completely different result each time you run the prompt. For example, those images of the girl at the bar in my original example were the result of the exact same prompt; I just asked it to generate X images, without needing to prepare any source images. That said, if you do want a specific composition, you can certainly use ControlNet in addition to the technique above, and yes, ControlNet is available in ComfyUI.

The best alternative is really using a LoRA or Dreambooth, so you can train a model to create images of that specific person, but that is better suited to reproducing a pre-existing person, not a new fictional one. So if you want to make a game with Angelina Jolie as your main character, training a LoRA or Dreambooth model of her would be the best solution for sure. But that does require a lot of work. My suggestion, or yours of using tag mixing, is better when you want to create a new character "from scratch" and you're just providing some "DNA" to help the prompt make that same character consistently.

I know nothing about "roop", so I can't really speak to it as a good alternative here, but from a very quick look at its GitHub page it also seems better for cases where you're trying to reproduce an existing real person, not create one from scratch. Please correct me if I'm wrong.
If you haven't tried the tools or techniques I'm talking about, then you can't know their potential. I would suggest trying them before writing them off or assuming the best use case scenario. Just to be clear, I'm not yelling at you, I'm only stating my thoughts and opinions. :)

Roop is very useful for creating a character from scratch as well, not only for making fakes of real people. I am suggesting that if you generate a face only, SD will have more resources to give it detail and quality; then you can use roop with this generated face to generate the entire character, body included. In this scenario you get much nicer faces with nice-looking bodies as well. Of course, you can use tag mixing or any other method that works for you when you generate the face portrait.

The new ControlNet has a model named IP-Adapter. With it you can take a character you have created and simply change the composition or pose; this makes it very consistent, and you can build a dataset for a LoRA or checkpoint this way. I think it is available for Comfy as well. OpenPose has a face-only model; with it you only give SD the bare bones, so to speak, and SD will still generate different results with every seed. Or you can of course use the full OpenPose model and get the same, but with the body as well.

These tools can be used in many ways. They will not limit you; it's only about how creative you are in using them and how imaginative you are.

Here's a demo/tutorial video for IP-Adapter:
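For anyone who would rather wire this up in code than in a UI, here's a rough diffusers sketch of the IP-Adapter workflow; the checkpoint, file names, and scale value are my assumptions, not something taken from the video:

Code:
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load SD1.5 IP-Adapter weights (here from the h94/IP-Adapter repo).
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # how strongly the reference face steers the image

face = load_image("my_character_face.png")  # the portrait generated earlier

# Same character, new pose/composition driven by the text prompt.
image = pipe(prompt="full body photo of a woman walking on a beach",
             ip_adapter_image=face,
             num_inference_steps=30).images[0]
image.save("character_beach.png")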
 

hkennereth

Member
Mar 3, 2019
237
775
If you haven't tried the tools or techniques I'm talking about, then you can't know their potential. I would suggest trying them before writing them off or assuming the best use case scenario. Just to be clear, I'm not yelling at you, I'm only stating my thoughts and opinions. :)

Roop is very useful for creating a character from scratch as well, not only for making fakes of real people. I am suggesting that if you generate a face only, SD will have more resources to give it detail and quality; then you can use roop with this generated face to generate the entire character, body included. In this scenario you get much nicer faces with nice-looking bodies as well. Of course, you can use tag mixing or any other method that works for you when you generate the face portrait.

The new ControlNet has a model named IP-Adapter. With it you can take a character you have created and simply change the composition or pose; this makes it very consistent, and you can build a dataset for a LoRA or checkpoint this way. I think it is available for Comfy as well. OpenPose has a face-only model; with it you only give SD the bare bones, so to speak, and SD will still generate different results with every seed. Or you can of course use the full OpenPose model and get the same, but with the body as well.

These tools can be used in many ways. They will not limit you; it's only about how creative you are in using them and how imaginative you are.

Here's a demo/tutorial video for IP-Adapter:
Of course. Out of those, roop is the only one I'm not familiar with, mostly because I don't really use A1111 anymore, and I don't think it's available for ComfyUI. But I am familiar with IP-Adapter, and while I haven't been making a ton of art lately, I have had the chance to play around with it and found that it has a ton of potential. The images below were made with it and an SD1.5 model.
[attached images]
 
  • Love
Reactions: Mr-Fox

Fuchsschweif

Well-Known Member
Sep 24, 2019
1,143
1,954
Do I have to pay for a Stable Diffusion membership in order to do all of that, or is downloading the GitHub stuff alone enough?
 

me3

Member
Dec 31, 2016
316
708
It's free.
Not sure which UI/system you're planning to use, but both ComfyUI and Automatic1111 are pretty easy to get running and using, A1111 being the simpler of the two.
There are people who sell models and LoRAs, but that really isn't worth considering for 99% of usage. You can get very good models on sites like Civitai, and the same goes for LoRAs. There are obviously badly made ones too, but you can usually spot those by the images, comments, and download counts.
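If you go the A1111 route, the short version of the setup (roughly what the project's GitHub README says) is:

Code:
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
# Windows: run webui-user.bat  |  Linux/macOS: ./webui.sh
# Downloaded checkpoints (.safetensors files) go in models/Stable-diffusion/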
 

Fuchsschweif

Well-Known Member
Sep 24, 2019
1,143
1,954
It's free.
Not sure which UI/system you're planning to use, but both ComfyUI and Automatic1111 are pretty easy to get running and using.
There are people who sell models and LoRAs, but that really isn't worth considering for 99% of usage. You can get very good models on sites like Civitai, and the same goes for LoRAs. There are obviously badly made ones too, but you can usually spot those by the images, comments, and download counts.
So basically it's free because I have to "train" it by myself first? Or are models some kind of pre-trained modules?
 

Jimwalrus

Well-Known Member
Sep 15, 2021
1,045
3,994
So basically it's free because I have to "train" it by myself first? Or are models some kind of pre-trained modules?
SD is completely free because Stability AI released it that way!
I know, right?
The only possible expense* with using Civitai is that some creators set their models to 'Early Release', which you have to be a paid member to access for the first few days.
No training is required to use SD, including any of the models on Civitai or elsewhere (they are, as you put it, "pre-trained modules"), but you're free to follow tutorials and do some training if you wish. It's also possible you'll want something that no one else has trained yet.


*Electricity bills aside - they shouldn't be too much unless you have a crazy multi-GPU setup.
 

Fuchsschweif

Well-Known Member
Sep 24, 2019
1,143
1,954
SD is completely free because Stability AI released it that way!
I know, right?
The only possible expense* with using Civitai is that some creators set their models to 'Early Release', which you have to be a paid member to access for the first few days.
No training is required to use SD, including any of the models on Civitai or elsewhere (they are, as you put it, "pre-trained modules"), but you're free to follow tutorials and do some training if you wish. It's also possible you'll want something that no one else has trained yet.


*Electricity bills aside - they shouldn't be too much unless you have a crazy multi-GPU setup.
I somehow thought SD was DALL-E 2, and since they charge a premium, that there was no free version. Or is DALL-E just another big model that does the rendering externally for you, making use of SD, and therefore charges you?

I've got a GTX 1070; does it make sense to generate pictures with that, or will it take an eternity?

Thanks for all the info :)
 

me3

Member
Dec 31, 2016
316
708
I somehow thought SD was DALL-E 2, and since they charge a premium, that there was no free version. Or is DALL-E just another big model that does the rendering externally for you, making use of SD, and therefore charges you?

I've got a GTX 1070; does it make sense to generate pictures with that, or will it take an eternity?

Thanks for all the info :)
Considering I'm using a 1060 (and a 1050 in some cases), you should do pretty fine on a 1070.
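If VRAM does get tight on those cards, A1111 has launch flags for that; in webui-user.bat, something like:

Code:
set COMMANDLINE_ARGS=--medvram --xformers

--medvram trades some speed for lower memory use, and --lowvram goes even further if needed.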
 
  • Like
Reactions: Fuchsschweif