[Stable Diffusion] Prompt Sharing and Learning Thread

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793



Cast a net and see what you catch. Hold it in your hand, the stars of the universe...

00031-1900657289.png 00034-1900657289.png 00033-1900657289.png

I know, corny af, but it was what came to mind when I got these images in sequence.
 

Jimwalrus

Active Member
Sep 15, 2021
853
3,179
Just follow this video and install Automatic1111. They're only different user interfaces, so you don't lose anything by choosing this over the other. In fact, just forget about any other UI for now.
In the video he shows you how to install and use a basic version of Stable Diffusion 1.5.
Stable Diffusion 1.5 or 2.1 are the actual "models"; you then go to civitai.com and get a trained or merged checkpoint.
These are Stable Diffusion 1.5 models that have been trained for specific genres and niches. This is what you use as a "generator".
Then you can add other things like LoRAs for a more "controlled" result. Stable Diffusion 2.1 doesn't do NSFW content for now, as far as I know, so it's not interesting to us perverts.. :D
Exactly this - A1111 is really just an automatic way of installing and starting SD Webui. Don't worry about the tiny differences, A1111 will probably take over as it's the automated option!


Please note, we're not being dismissive of Easy Diffusion or, for that matter, you. We're not experts in this - very few people are. This technology is just over six months old; the only real experts are those who developed it in the first place! Everyone else is on a steep learning curve. Having an "easy" version now might seem like a great idea, but it's not. It will get left in the dust and you won't know how to use the complex version everyone will be using in six months' time. It's like Photoshop - those who have used it for decades forget it used to be reasonably simple and mostly intuitive. I never used it, and now it's moved on so much it's absolutely impenetrable.

TL;DR - surfing a big wave is easier when it starts as a small one and you're carried along with it as it grows.


Quick glossary:

(N.B. Some of the 'definitions' below may be laughably incorrect to those with deep technical knowledge - those people are not the intended audience. These are broad analogies to allow people to understand the concepts!)

- Model. The big Checkpoint file, 2-8GB in size. You load one of these in to get a 'world' or genre, e.g. different ones for anime, photorealism, oil paintings or landscapes. These have a .ckpt or .safetensors file extension.
- LoRA. Low-Rank Adaptation. 9-400MB in size. An advanced type of embedding that used to require crazy amounts of vRAM on your GPU, now works OK with as little as 8GB. If you want to generate a very specific type of thing such as a specific model of car or a celebrity, you might use a LoRA. Advantage is they can be trained on a small number of images (as little as 3!). Disadvantage is that they often take over and don't always play well with other LoRAs. They have .ckpt, .pt or .safetensors file extensions. Make sure you don't put them in the \models\stable-diffusion folder, they go in the models\Lora folder.
- Textual Inversion (or TI). Tiny files, a few kB. The older, slower way of training an embedding. Requires a lot of images (min 20). Effectively recreates in miniature how SD was trained originally - by showing it pictures of a thing and hitting it repeatedly over the head with lots of maths until it associates a word with the essence of those pictures. These have a .pt or .bin extension. They go in the \embeddings folder.
- Hypernetwork. Another form of training. Conceptually they sit in a similar space to LoRAs and are a similar size, although they tend to work as if they were a small collection of LoRAs. Separate tab in WebUI. Tend to 'play well' with other embeddings. They go in the models\hypernetworks folder.
- Token. Each word of a prompt (or part of a word, if SD doesn't recognise the whole word and has to split it up to work it out) is a separate token. Prompts used to be limited to 75 tokens; that was expanded long ago to effectively unlimited, processed in blocks of 75.
- Vector. Effectively 'concepts' for SD to remember for a given word (i.e. token). When training a TI you will be asked how many 'Vectors per token' to use. If training a very simple concept (metallic paint for instance), 1 vector per token is all you need. If training a TI on a person's face, it's best to have at least 2. I've had best results with 10-12 for a person, especially if training on whole body images where the person has an unusual body e.g. very slim. The usual rule of thumb is at least 5 images per vector, with the usual minimum of 20 images for TIs.
- txt2img. The classic way to operate SD, by entering text strings as tokens for it to compare against its training as it carves images out of the static. There are plenty of explanations as to how SD works elsewhere on the Internet, I'm sticking with "carves images out of static"!
- img2img. Takes a starting image and applies SD to it, instead of almost random static. Very good way of getting what you actually want, particularly poses.
- ControlNet. A development of img2img that allows a user to pose one or more stick figures, then run SD against that. Works very well most of the time and is pretty intuitive.
- Seed. A usually numerical code introduced to the initial random static for two reasons. Firstly it gives SD some variance to coalesce an image from ("Shapes in the smoke"), secondly it allows for repeatability, which wouldn't be possible with purely random static. What a seed does not do is provide consistency of face, clothing, accessories, background or anything really between prompts or models. The same seed & prompts will usually give recognisably related images across different models - but there will still be very considerable variation (a house in the background in one model becomes a car in another; a bandana in one becomes a hat in another, a pathway in one becomes a fashion show runway etc.).
- Steps. How many times SD runs its prompts against the static. Less than 10 is unlikely to give anything worthwhile, less than 20 is generally not recommended. Exactly how many is 'best' depends on many, many factors including the depths of your patience / pockets. Play around with more or fewer steps to see what works best for what you're aiming for, but be prepared to increase or decrease them at any time. Very small numbers can result in poor shaping or complete ignoring of tokens in a prompt, seemingly at random (because it pretty much is!). Very large numbers can result in images that look 'overworked', 'airbrushed' effects and/or SD suddenly deciding after 120 steps that Taylor Swift has nipples on her neck! Try X/Y/Z plotter to test a sample image with different numbers of steps.
- Sampler. Unless you're a very, very clever mathematician you don't need (or want) to know what the difference between samplers is in mathematical terms. Just consider them to be words that represent the minutiae of how SD does its thing. Experiment with them, see which ones you prefer for different situations. Can be included in an X/Y/Z plot, so worth playing there. (There's a short code sketch tying these glossary terms together just below.)
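
To see how these terms hang together outside the WebUI, here's a minimal sketch in Python using the diffusers library (not A1111 itself, just the same concepts under their library names). The model ID, prompt, seed and file paths are placeholders for illustration, not a recipe:

import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

# Model: the big checkpoint, here pulled from Hugging Face rather than a local .safetensors file
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Sampler: diffusers calls them schedulers (this one is "Euler a" in A1111 terms)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

# Optional extras - uncomment if you have the files (these paths are made up):
# pipe.load_lora_weights("models/Lora", weight_name="some_lora.safetensors")   # LoRA
# pipe.load_textual_inversion("embeddings/some_embedding.pt")                  # Textual Inversion

# Seed: a fixed number makes the run repeatable; leaving the generator out is the UI's -1 (random)
generator = torch.Generator("cuda").manual_seed(1900657289)

image = pipe(
    prompt="photo of a woman casting a fishing net under a starry sky",  # each word becomes one or more tokens
    negative_prompt="blurry, lowres, bad anatomy",
    num_inference_steps=25,     # Steps
    guidance_scale=7.0,         # CFG scale
    width=512, height=512,      # image size changes the result too, not just the framing
    generator=generator,
).images[0]
image.save("00001-1900657289.png")

Same seed, same model, same settings and you get the same image back; change any one of them and the result drifts.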
 
Last edited:

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
OK, BUT!
I need the other one too! I'm like Sheldon, I need to find it at all cost, or I will feel anxious.
It hits my EGO mega search engine.
Even if I won't use it at all
Then you will be more busy looking for all the different UIs instead of actually making awesome images.
It doesn't look to me like you would even know what to do with them.
You need to start somewhere, and Automatic1111 is the best place, period.
Now get going with it. :LOL:
 

Jimwalrus

Active Member
Sep 15, 2021
853
3,179
OK, BUT!
I need the other one too! I'm like Sheldon, I need to find it at all cost, or I will feel anxious.
It hits my EGO mega search engine.
Even if I won't use it at all
If you really want a starter version that's pretty much as powerful as A1111, try NMKD. It's a lot more powerful than Easy Diffusion seems to be. But I'd recommend moving to the main version as soon as you're comfortable with it.
 
  • Like
Reactions: Sepheyer and Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
Exactly this - A1111 is really just an automatic way of installing and starting SD Webui. Don't worry about the tiny differences, A1111 will probably take over as it's the automated option!


Please note, we're not being dismissive of Easy Diffusion or, for that matter, you. We're not experts in this - very few people are. This technology is just over six months old; the only real experts are those who developed it in the first place! Everyone else is on a steep learning curve. Having an "easy" version now might seem like a great idea, but it's not. It will get left in the dust and you won't know how to use the complex version everyone will be using in six months' time. It's like Photoshop - those who have used it for decades forget it used to be reasonably simple and mostly intuitive. I never used it, and now it's moved on so much it's absolutely impenetrable.

TL;DR - surfing a big wave is easier when it starts as a small one and you're carried along with it as it grows.


Quick glossary:
- Model. The big Checkpoint file, 2-8GB in size. You load one of these in to get a 'world' or genre, e.g. different ones for anime, photorealism, oil paintings or landscapes. .ckpt or .safetensors file extension.
- LoRA. Low-Rank Adaptation. 9-400MB in size. An advanced type of embedding that used to require crazy amounts of vRAM on your GPU, now works OK with as little as 8GB. If you want to generate a very specific type of thing such as a specific model of car or a celebrity, you might use a LoRA. Advantage is they can be trained on a small number of images (as little as 3!). Disadvantage is that they often take over and don't always play well with other LoRAs. .ckpt, .pt or .safetensors file extensions. Make sure you don't put them in the \models\stable-diffusion folder, they go in the models\Lora folder.
- Textual Inversion (or TI). Tiny files, a few kB. The older, slower way of training an embedding. Requires a lot of images (min 20). Effectively recreates in miniature how SD was trained originally - by showing it pictures of a thing and hitting it repeatedly over the head with lots of maths until it associates a word with the essence of those pictures. .pt or .bin extension. They go in the \embeddings folder.
Thank you for all this awesome information. I think it's not only complete beginners that are learning.. :giggle:
What about Hypernetworks? You skipped over that part.:)
I know how to use them but don't know much about them. :sneaky:
 
Last edited:

Jimwalrus

Active Member
Sep 15, 2021
853
3,179
Thank you for all this awesome information. I think it's not only complete beginners that are learning.. :giggle:
What about Hypernetworks? You skipped over that part.:)
I know how to use them but don't know much about them. :sneaky:
Yeah, same here! That's probably why it didn't occur to me to include them in the glossary. I just treat them as another type of embedding.
 
  • Like
Reactions: Mr-Fox

Synalon

Member
Jan 31, 2022
191
617
Is it not better that you learn to do it yourself? What happens next time you want to fix an image?


This video shows the basics of inpainting; you can use the same method for the nipples.
I've been trying to learn and gave up on it so I put it on here just in case.

I've loaded it into img2img and spent a few hours trying to refine it to fix the eyes, and I tried adjusting the sliders a bit at a time in Extras, in case upscaling it with different filters would help.
 
Last edited:
  • Like
Reactions: Sepheyer and Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
I've been trying to learn and gave up on it so I put it on here just in case.
Well, it would help if you posted the generated image that has the metadata, instead of a photo editor copy...
I mean, if you expect any of us to take a stab at it..;)
 
  • Like
Reactions: Sepheyer

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
It's only added to the .png files that are generated by SD. If you take that image and upscale or edit it in a photo editor, the metadata is lost.
Something tells me that you have simply missed an important part: you can't use a static seed when you try to fix something with inpaint. So switch it to -1 and it should work much better for you.
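
For anyone curious what that looks like outside the UI, here's a rough Python sketch of the same idea with the diffusers inpainting pipeline (the file names and prompt are just placeholders, not the exact settings used here):

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("original.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = the area to repaint

# No fixed generator here - the equivalent of setting the seed to -1 in the UI,
# so every attempt at the masked area comes out a little different.
result = pipe(
    prompt="detailed eyes, sharp focus",
    image=init_image,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
result.save("inpainted.png")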
 
Last edited:

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
It's getting late here, so I had to do it in a faster and a little "dirty" way. I used your prompt to generate an image that I then used for "parts", edited the image in Photoshop, and then ran it through img2img to get the metadata back on the file - meaning I used img2img with 0 denoising strength. If I had more time I know I could make it even better. I hope that you'll be happy with the result.

00155-3120258005.png

*Completely forgot to remove the text yesterday. Now it's gone.
 
Last edited:

Synalon

Member
Jan 31, 2022
191
617
It's getting late here, so I had to do it in a faster and a little "dirty" way. I used your prompt to generate an image that I then used for "parts", edited the image in Photoshop, and then ran it through img2img to get the metadata back on the file - meaning I used img2img with 0 denoising strength. If I had more time I know I could make it even better. I hope that you'll be happy with the result.

View attachment 2447569
It's great, thank you.
 
  • Hey there
Reactions: Mr-Fox

Nano999

Member
Jun 4, 2022
152
68
Can you kindly tell what model this is? I tried mine, based on this image and a simple prompt
japanese girl, standing, blue frilled bra
but got nothing similar to this

 

Jimwalrus

Active Member
Sep 15, 2021
853
3,179
Can you kindly tell what model this is? I tried mine, based on this image and a simple prompt
japanese girl, standing, blue frilled bra
but got nothing similar to this

Without a seed number it's basically impossible to reproduce. Obviously it's one of the "realistic" models, the most common for NSFW being URPM (which, stylistically, is also a good candidate - you kind of get a feel for different models).
There are lots of different realism-focused models though.
There's also the possibility this is a 'home brew': any SD user has the facility to merge more than one model at whatever percentages they wish. URPM is itself a merge of many, many different models by one particular user.
 
  • Like
Reactions: Mr-Fox and Sepheyer

Nano999

Member
Jun 4, 2022
152
68
Without a seed number it's basically impossible to reproduce. Obviously it's one of the "realistic" models, the most common for NSFW being URPM (which, stylistically, is also a good candidate - you kind of get a feel for different models).
There are lots of different realism-focused models though.
There's also the possibility this is a 'home brew': any SD user has the facility to merge more than one model at whatever percentages they wish. URPM is itself a merge of many, many different models by one particular user.
The seed is something that refers to an image? Like a model was trained based on 10 000 images, and seed 5 467 will refer to that exact image?


Also a quick question.
What is Custom VAE?
I have "vae-ft-mse-840000-ema-pruned" by default
Is it a thingy to improve the eyes specifically?
And should I look for other type at civitai?
 
Last edited:
  • Like
Reactions: Jimwalrus

Jimwalrus

Active Member
Sep 15, 2021
853
3,179
The seed is something that refers to an image? Like a model was trained based on 10 000 images, and seed 5 467 will refer to that exact image?


Also a quick question.
What is Custom VAE?
I have "vae-ft-mse-840000-ema-pruned" by default
Is it a thingy to improve the eyes specifically?
And should I look for other type at civitai?
The seed is nothing to do with training. It's a number that represents an introduced tiny variation in the otherwise random static that the image is carved from - think of it as a very vague "shape in the smoke". It guides SD as it creates the image. A famous sculptor (so famous I've forgotten their name) described sculpting as "Taking a block of marble and removing everything that isn't the subject". The seed is effectively the grain of the marble, imperfections that shape the finished product slightly. A seed means it's reproducible - if it were pure random static nothing could be recreated.
To reproduce an image you need the same:
- Seed
- Model (exact one!)
- VAE
- Prompts, both +ve & -ve
- Steps
- CFG level
- Hi-res Fix steps (if used)
- Denoising strength (if Hi-res used)
- Whether 'Restore Faces' was used, and if so which type and strength
- Size of image (not just the same aspect ratio, although that's a big part of it, the exact width x height in pixels)
- Embeddings, LoRAs etc.

So, very difficult, but it is still possible to recreate an image. Fortunately almost all of this is saved by default in the metadata of a PNG generated by SD. Just enough of this information isn't stored to make it really frustrating! If you're using the full version of SD there is a tab called PNG info. Drop an image in there and the full parameters will be displayed. There's also the option to send to txt2img, img2img etc. to automate it a little.
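
If you'd rather not open the WebUI just to peek at the settings, the same data can be read with a couple of lines of Python - a small sketch, assuming the image is an untouched A1111 PNG (it normally writes everything under a "parameters" text chunk; the file name is just an example):

from PIL import Image

img = Image.open("00155-3120258005.png")
# The whole generation line (prompt, negative prompt, seed, steps, sampler, CFG, model hash...)
# lives in a PNG text chunk; editing or re-saving the image elsewhere usually strips it.
print(img.info.get("parameters", "no generation data found - metadata has been stripped"))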

A VAE is a mathematical overlay on a model. Not quite sure how they work, but custom ones are available. Most people, if a model doesn't have a VAE 'baked in' (and that is the term used!), use vae-ft-mse-840000-ema-pruned. It's generally a "set it and forget it" thing: SD will use the baked-in VAE if the model has one, otherwise it falls back to whatever you've set as your default.
VAEs do provide substantial improvement for eyes, faces and other details - I'd certainly never go without one. But, as I say, everyone uses either the baked-in one or vae-ft-mse-840000-ema-pruned.
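
For what it's worth, swapping that VAE in is also a one-liner if you ever play with the Python diffusers library - a sketch only, assuming "stabilityai/sd-vae-ft-mse" (the repackaged vae-ft-mse-840000-ema-pruned weights) and an example model ID:

import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL

# Load the standalone VAE, then hand it to the pipeline in place of the baked-in one
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")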
 

Nano999

Member
Jun 4, 2022
152
68
Is it true that prompts (negative and positive) are limited in tokens? Like I have a 1000-word negative prompt xD