[Stable Diffusion] Prompt Sharing and Learning Thread

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793



Cast a net and see what you catch. Hold it in your hand, the stars of the universe...

00031-1900657289.png 00034-1900657289.png 00033-1900657289.png

I know, corny af, but it was what came to mind when I got these images in sequence.
 

Jimwalrus

Active Member
Sep 15, 2021
853
3,179
Just follow this video and install Automatic1111. They're only different user interfaces, so you don't lose anything by choosing this over the other. In fact, just forget about any other UI for now.
In the video he shows you how to install and use a basic version of Stable Diffusion 1.5.
Stable Diffusion 1.5 or 2.1 are the actual "models"; you then go to civitai.com and get a trained or merged checkpoint.
These are Stable Diffusion 1.5 models that have been trained for specific genres and niches. This is what you use as a "generator".
Then you can add other things like LoRAs for a more "controlled" result. Stable Diffusion 2.1 doesn't do NSFW content for now, as far as I know, so it's not interesting to us perverts.. :D
Exactly this - A1111 is really just an automatic way of installing and starting SD Webui. Don't worry about the tiny differences, A1111 will probably take over as it's the automated option!


Please note, we're not being dismissive of Easy Diffusion or, for that matter, you. We're not experts in this - very few people are. This technology is just over six months old; the only real experts are those who developed it in the first place! Everyone else is on a steep learning curve. Having an "easy" version now might seem like a great idea, but it's not. It will get left in the dust and you won't know how to use the complex version everyone will be using in six months' time. It's like Photoshop - those who have used it for decades forget it used to be reasonably simple and mostly intuitive. I never used it, and now it's moved on so much it's absolutely impenetrable.

TL;DR - surfing a big wave is easier when it starts as a small one and you're carried along with it as it grows.


Quick glossary:

(N.B. Some of the 'definitions' below may be laughably incorrect to those with deep technical knowledge - those people are not the intended audience. These are broad analogies to allow people to understand the concepts!)

- Model. The big Checkpoint file, 2-8GB in size. You load one of these in to get a 'world' or genre, e.g. different ones for anime, photorealism, oil paintings or landscapes. These have a .ckpt or .safetensors file extension.
- LoRA. Low-Rank Adaptation. 9-400MB in size. An advanced type of embedding that used to require crazy amounts of vRAM on your GPU, now works OK with as little as 8GB. If you want to generate a very specific type of thing such as a specific model of car or a celebrity, you might use a LoRA. Advantage is they can be trained on a small number of images (as little as 3!). Disadvantage is that they often take over and don't always play well with other LoRAs. They have .ckpt, .pt or .safetensors file extensions. Make sure you don't put them in the \models\stable-diffusion folder, they go in the models\Lora folder.
- Textual Inversion (or TI). Tiny files, a few kB. The older, slower way of training an embedding. Requires a lot of images (min 20). Effectively recreates in miniature how SD was trained originally - by showing it pictures of a thing and hitting it repeatedly over the head with lots of maths until it associates a word with the essence of those pictures. These have a .pt or .bin extension. They go in the \embeddings folder.
- Hypernetwork. Another form of training. Conceptually they sit in a similar space to LoRAs and are a similar size, although they tend to work as if they were a small collection of LoRAs. Separate tab in WebUI. Tend to 'play well' with other embeddings. They go in the models\hypernetworks folder.
- Token. Each word of a prompt (or part of a word, if SD doesn't recognise the whole word and has to split it up to work it out) is a separate token. Prompts used to be limited to 75 tokens; that was expanded long ago to effectively unlimited, processed in blocks of 75.
- Vector. Effectively 'concepts' for SD to remember for a given word (i.e. token). When training a TI you will be asked how many 'Vectors per token' to use. If training a very simple concept (metallic paint for instance), 1 vector per token is all you need. If training a TI on a person's face, it's best to have at least 2. I've had best results with 10-12 for a person, especially if training on whole body images where the person has an unusual body e.g. very slim. The usual rule of thumb is at least 5 images per vector, with the usual minimum of 20 images for TIs.
- txt2img. The classic way to operate SD, by entering text strings as tokens for it to compare against its training as it carves images out of the static. There are plenty of explanations as to how SD works elsewhere on the Internet, I'm sticking with "carves images out of static"!
- img2img. Takes a starting image and applies SD to it, instead of almost random static. Very good way of getting what you actually want, particularly poses.
- ControlNet. A development of img2img that allows a user to pose one or more stick figures, then run SD against that. Works very well most of the time and is pretty intuitive.
- Seed. A usually numerical code introduced to the initial random static for two reasons. Firstly it gives SD some variance to coalesce an image from ("Shapes in the smoke"), secondly it allows for repeatability, which wouldn't be possible with purely random static. What a seed does not do is provide consistency of face, clothing, accessories, background or anything really between prompts or models. The same seed & prompts will usually give recognisably related images across different models - but there will still be very considerable variation (a house in the background in one model becomes a car in another; a bandana in one becomes a hat in another, a pathway in one becomes a fashion show runway etc.).
- Steps. How many times SD runs its prompts against the static. Less than 10 is unlikely to give anything worthwhile, less than 20 is generally not recommended. Exactly how many is 'best' depends on many, many factors including the depths of your patience / pockets. Play around with more or fewer steps to see what works best for what you're aiming for, but be prepared to increase or decrease them at any time. Very small numbers can result in poor shaping or complete ignoring of tokens in a prompt, seemingly at random (because it pretty much is!). Very large numbers can result in images that look 'overworked', 'airbrushed' effects and/or SD suddenly deciding after 120 steps that Taylor Swift has nipples on her neck! Try X/Y/Z plotter to test a sample image with different numbers of steps.
- Sampler. Unless you're a very, very clever mathematician you don't need (or want) to know what the difference between samplers is in mathematical terms. Just consider them to be words that represent the minutiae of how SD does its thing. Experiment with them, see which ones you prefer for different situations. Can be included in an X/Y/Z plot, so worth playing there. (There's a short code sketch tying these glossary terms together just below.)
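
To see how these terms hang together outside the WebUI, here's a minimal sketch in Python using the diffusers library (not A1111 itself, just the same concepts under their library names). The model ID, prompt, seed and file paths are placeholders for illustration, not a recipe:

import torch
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

# Model: the big checkpoint, here pulled from Hugging Face rather than a local .safetensors file
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Sampler: diffusers calls them schedulers (this one is "Euler a" in A1111 terms)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

# Optional extras - uncomment if you have the files (these paths are made up):
# pipe.load_lora_weights("models/Lora", weight_name="some_lora.safetensors")   # LoRA
# pipe.load_textual_inversion("embeddings/some_embedding.pt")                  # Textual Inversion

# Seed: a fixed number makes the run repeatable; leaving the generator out is the UI's -1 (random)
generator = torch.Generator("cuda").manual_seed(1900657289)

image = pipe(
    prompt="photo of a woman casting a fishing net under a starry sky",  # each word becomes one or more tokens
    negative_prompt="blurry, lowres, bad anatomy",
    num_inference_steps=25,     # Steps
    guidance_scale=7.0,         # CFG scale
    width=512, height=512,      # image size changes the result too, not just the framing
    generator=generator,
).images[0]
image.save("00001-1900657289.png")

Same seed, same model, same settings and you get the same image back; change any one of them and the result drifts.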
 
Last edited:

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
OK, BUT!
I need the other one too! I'm like Sheldon, I need to find it at all cost, or I will feel anxious.
It hits my EGO mega search engine.
Even if I won't use it at all
Then you will be more busy looking for all the different UIs instead of actually making awesome images.
It doesn't look to me like you would even know what to do with them.
You need to start somewhere, and Automatic1111 is the best place, period.
Now get going with it. :LOL:
 

Jimwalrus

Active Member
Sep 15, 2021
853
3,179
OK, BUT!
I need the other one too! I'm like Sheldon, I need to find it at all cost, or I will feel anxious.
It hits my EGO mega search engine.
Even if I won't use it at all
If you really want a starter version that's pretty much as powerful as A1111, try NMKD. It's a lot more powerful than Easy Diffusion seems to be. But I'd recommend moving to the main version as soon as you're comfortable with it.
 
  • Like
Reactions: Sepheyer and Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
Exactly this - A1111 is really just an automatic way of installing and starting SD Webui. Don't worry about the tiny differences, A1111 will probably take over as it's the automated option!


Please note, we're not being dismissive of Easy Diffusion or, for that matter, you. We're not experts in this - very few people are. This technology is just over six months old; the only real experts are those who developed it in the first place! Everyone else is on a steep learning curve. Having an "easy" version now might seem like a great idea, but it's not. It will get left in the dust and you won't know how to use the complex version everyone will be using in six months' time. It's like Photoshop - those who have used it for decades forget it used to be reasonably simple and mostly intuitive. I never used it, and now it's moved on so much it's absolutely impenetrable.

TL;DR - surfing a big wave is easier when it starts as a small one and you're carried along with it as it grows.


Quick glossary:
- Model. The big Checkpoint file, 2-8GB in size. You load one of these in to get a 'world' or genre, e.g. different ones for anime, photorealism, oil paintings or landscapes. .ckpt or .safetensors file extension.
- LoRA. Low-Rank Adaptation. 9-400MB in size. An advanced type of embedding that used to require crazy amounts of vRAM on your GPU, now works OK with as little as 8GB. If you want to generate a very specific type of thing such as a specific model of car or a celebrity, you might use a LoRA. Advantage is they can be trained on a small number of images (as little as 3!). Disadvantage is that they often take over and don't always play well with other LoRAs. .ckpt, .pt or .safetensors file extensions. Make sure you don't put them in the \models\stable-diffusion folder, they go in the models\Lora folder.
- Textual Inversion (or TI). Tiny files, a few kB. The older, slower way of training an embedding. Requires a lot of images (min 20). Effectively recreates in miniature how SD was trained originally - by showing it pictures of a thing and hitting it repeatedly over the head with lots of maths until it associates a word with the essence of those pictures. .pt or .bin extension. They go in the \embeddings folder.
Thank you for all this awesome information. I think it's not only complete beginners that are learning.. :giggle:
What about Hypernetworks? You skipped over that part.:)
I know how to use them but don't know much about them. :sneaky:
 
Last edited:

Jimwalrus

Active Member
Sep 15, 2021
853
3,179
Thank you for all this awesome information. I think it's not only complete beginners that are learning.. :giggle:
What about Hypernetworks? You skipped over that part.:)
I know how to use them but don't know much about them. :sneaky:
Yeah, same here! That's probably why it didn't occur to me to include them in the glossary. I just treat them as another type of embedding.
 
  • Like
Reactions: Mr-Fox

Synalon

Member
Jan 31, 2022
191
617
Is it not better that you learn to do it yourself? What happens next time you want to fix an image?


This video shows the basics of inpainting; you can use the same method for the nipples.
I've been trying to learn and gave up on it so I put it on here just in case.

I've loaded it into img2img and spent a few hours trying to refine it to fix the eyes, and I tried adjusting the sliders a bit at a time in Extras, in case upscaling it with different filters would help.
 
Last edited:
  • Like
Reactions: Sepheyer and Mr-Fox

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
I've been trying to learn and gave up on it so I put it on here just in case.
Well, it would help if you posted the generated image that has the metadata, instead of a photo editor copy...
I mean, if you expect any of us to take a stab at it..;)
 
  • Like
Reactions: Sepheyer

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
It's only added to the .png files that are generated by SD. If you take that image and upscale or edit it in a photo editor, the metadata is lost.
Something tells me that you have simply missed an important part: you can't use a static seed when you try to fix something with inpaint. So switch it to -1 and it should work much better for you.
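
For anyone curious what that looks like outside the UI, here's a rough Python sketch of the same idea with the diffusers inpainting pipeline (the file names and prompt are just placeholders, not the exact settings used here):

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("original.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = the area to repaint

# No fixed generator here - the equivalent of setting the seed to -1 in the UI,
# so every attempt at the masked area comes out a little different.
result = pipe(
    prompt="detailed eyes, sharp focus",
    image=init_image,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
result.save("inpainted.png")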
 
Last edited:

Mr-Fox

Well-Known Member
Jan 24, 2020
1,401
3,793
It's getting late here, so I had to do it in a faster and a little "dirty" way. I used your prompt to generate an image that I then used for "parts", edited the image in Photoshop, and then ran it through img2img to get the metadata back on the file - meaning I used img2img with 0 denoising strength. If I had more time I know I could make it even better. I hope that you'll be happy with the result.

00155-3120258005.png

*Completely forgot to remove the text yesterday. Now it's gone.
 
Last edited:

Synalon

Member
Jan 31, 2022
191
617
It's getting late here, so I had to do it in a faster and a little "dirty" way. I used your prompt to generate an image that I then used for "parts", edited the image in Photoshop, and then ran it through img2img to get the metadata back on the file - meaning I used img2img with 0 denoising strength. If I had more time I know I could make it even better. I hope that you'll be happy with the result.

View attachment 2447569
It's great, thank you.
 
  • Hey there
Reactions: Mr-Fox

Nano999

Member
Jun 4, 2022
152
68
Can you kindly tell what model this is? I tried mine, based on this image and a simple prompt
japanese girl, standing, blue frilled bra
but got nothing similar to this

 

Jimwalrus

Active Member
Sep 15, 2021
853
3,179
Can you kindly tell what model this is? I tried mine, based on this image and a simple prompt
japanese girl, standing, blue frilled bra
but got nothing similar to this

Without a seed number it's basically impossible to reproduce. Obviously it's one of the "realistic" models, the most common for NSFW being URPM (which, stylistically, is also a good candidate - you kind of get a feel for different models).
There are lots of different realism-focused models though.
There's also the possibility this is a 'home brew': any SD user has the facility to merge more than one model at whatever percentages they wish. URPM is itself a merge of many, many different models by one particular user.
 
  • Like
Reactions: Mr-Fox and Sepheyer

Nano999

Member
Jun 4, 2022
152
68
Without a seed number it's basically impossible to reproduce. Obviously it's one of the "realistic" models, the most common for NSFW being URPM (which, stylistically, is also a good candidate - you kind of get a feel for different models).
There are lots of different realism-focused models though.
There's also the possibility this is a 'home brew': any SD user has the facility to merge more than one model at whatever percentages they wish. URPM is itself a merge of many, many different models by one particular user.
The seed is something that refers to an image? Like a model was trained based on 10 000 images, and seed 5 467 will refer to that exact image?


Also a quick question.
What is Custom VAE?
I have "vae-ft-mse-840000-ema-pruned" by default
Is it a thingy to improve the eyes specifically?
And should I look for other type at civitai?
 
Last edited:
  • Like
Reactions: Jimwalrus

Jimwalrus

Active Member
Sep 15, 2021
853
3,179
The seed is something that refers to an image? Like a model was trained based on 10 000 images, and seed 5 467 will refer to that exact image?


Also a quick question.
What is Custom VAE?
I have "vae-ft-mse-840000-ema-pruned" by default
Is it a thingy to improve the eyes specifically?
And should I look for other type at civitai?
The seed is nothing to do with training. It's a number that represents an introduced tiny variation in the otherwise random static that the image is carved from - think of it as a very vague "shape in the smoke". It guides SD as it creates the image. A famous sculptor (so famous I've forgotten their name) described sculpting as "Taking a block of marble and removing everything that isn't the subject". The seed is effectively the grain of the marble, imperfections that shape the finished product slightly. A seed means it's reproducible - if it were pure random static nothing could be recreated.
To reproduce an image you need the same:
- Seed
- Model (exact one!)
- VAE
- Prompts, both +ve & -ve
- Steps
- CFG level
- Hi-res Fix steps (if used)
- Denoising strength (if Hi-res used)
- Whether 'Restore Faces' was used, and if so which type and strength
- Size of image (not just the same aspect ratio, although that's a big part of it, the exact width x height in pixels)
- Embeddings, LoRAs etc.

So, very difficult, but it is still possible to recreate an image. Fortunately almost all of this is saved by default in the metadata of a PNG generated by SD. Just enough of this information isn't stored to make it really frustrating! If you're using the full version of SD there is a tab called PNG info. Drop an image in there and the full parameters will be displayed. There's also the option to send to txt2img, img2img etc. to automate it a little.
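
If you'd rather not open the WebUI just to peek at the settings, the same data can be read with a couple of lines of Python - a small sketch, assuming the image is an untouched A1111 PNG (it normally writes everything under a "parameters" text chunk; the file name is just an example):

from PIL import Image

img = Image.open("00155-3120258005.png")
# The whole generation line (prompt, negative prompt, seed, steps, sampler, CFG, model hash...)
# lives in a PNG text chunk; editing or re-saving the image elsewhere usually strips it.
print(img.info.get("parameters", "no generation data found - metadata has been stripped"))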

A VAE is a mathematical overlay on a model. Not quite sure how they work, but custom ones are available. Most people, if a model doesn't have a VAE 'baked in' (and that is the term used!), use vae-ft-mse-840000-ema-pruned. It's generally a "set it and forget it" thing: SD will use the baked-in VAE if the model has one, otherwise it falls back to whatever you've set as your default.
VAEs do provide substantial improvement for eyes, faces and other details - I'd certainly never go without one. But, as I say, everyone uses either the baked-in one or vae-ft-mse-840000-ema-pruned.
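
For what it's worth, swapping that VAE in is also a one-liner if you ever play with the Python diffusers library - a sketch only, assuming "stabilityai/sd-vae-ft-mse" (the repackaged vae-ft-mse-840000-ema-pruned weights) and an example model ID:

import torch
from diffusers import StableDiffusionPipeline, AutoencoderKL

# Load the standalone VAE, then hand it to the pipeline in place of the baked-in one
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae, torch_dtype=torch.float16
).to("cuda")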
 

Nano999

Member
Jun 4, 2022
152
68
Is it true that prompts (negative and positive) are limited in tokens? Like I have a 1000-word negative prompt xD