I see people using florence2 in here.
Just wanted to quickly chime in with some info.
Whether a prompt from florence will work or not depends highly on which model you are using.
For classic Stable Diffusion based models (SD1.5, SDXL and the like) you should use booru tags, i.e. "1girl, blonde, big breasts, naked, standing, next to a tree, blah, blah, blah". In other words: TAGS.
Florence obviously does not output prompts that look like that.
Models like FLUX, and even the video generation models, accept and often work better with full descriptions like what florence gives you (though they also understand tags).
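If you want to generate that kind of florence description yourself, outside of whatever node or extension you normally use, a minimal sketch with the public microsoft/Florence-2-large release looks roughly like this (the image path and generation settings are just example values, so treat the details as an assumption rather than gospel):

```python
# Minimal sketch: get a Florence-2 style caption for a still image with transformers.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large", torch_dtype=dtype, trust_remote_code=True
).to(device)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large", trust_remote_code=True
)

image = Image.open("source.png").convert("RGB")   # placeholder path to your image
task = "<MORE_DETAILED_CAPTION>"                  # Florence-2 task token for a long caption

inputs = processor(text=task, images=image, return_tensors="pt").to(device, dtype)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=512,
    num_beams=3,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
caption = processor.post_process_generation(
    raw, task=task, image_size=(image.width, image.height)
)
print(caption[task])  # the full-sentence description you can paste into your prompt
```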
Pony models, on the other hand, work better with tags (you can also write florence-style descriptions, but tags are the way to get the model to play nice with you). You also need those "score" prompts to specify the quality of the images the model should pull from its training data (that only applies to Pony V6 based models).
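To make the tag + score style concrete, here is a minimal sketch of feeding such a prompt to a Pony V6 based SDXL checkpoint through diffusers. The checkpoint filename is a placeholder, and the score tags follow the commonly used Pony V6 convention, so adjust to whatever the model card actually says:

```python
# Hedged sketch: tag-style prompting for a Pony V6 based SDXL checkpoint via diffusers.
import torch
from diffusers import StableDiffusionXLPipeline

# Placeholder filename; point this at whichever Pony V6 based checkpoint you use.
pipe = StableDiffusionXLPipeline.from_single_file(
    "ponyDiffusionV6XL.safetensors", torch_dtype=torch.float16
).to("cuda")

prompt = (
    "score_9, score_8_up, score_7_up, "                      # Pony V6 quality tags
    "1girl, blonde hair, standing, next to a tree, outdoors"  # booru-style content tags
)
image = pipe(prompt=prompt, num_inference_steps=25).images[0]
image.save("tag_prompt_example.png")
```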
I have not used A1111 in AGES, but in Forge you have two buttons you can press below the "Generate" button, one for CLIP and one for booru (DeepBooru). Press them and see what they spit out; they are kind of like "florence for tags".
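If you want that interrogation step as a script instead of a button, the clip-interrogator package (pip install clip-interrogator) does roughly the same job as the CLIP button. A sketch, under the assumption that you are targeting SD1.5-era checkpoints (hence the ViT-L CLIP model):

```python
# Rough "image in, prompt text out" sketch with the clip-interrogator package.
from PIL import Image
from clip_interrogator import Config, Interrogator

# ViT-L-14/openai matches SD1.5-style models; SDXL-oriented setups often use a bigger CLIP.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

image = Image.open("reference.png").convert("RGB")  # placeholder image path
print(ci.interrogate(image))  # caption plus appended style/"flavor" tags
```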
That can be good to know if the model is not producing the results you want. Read about the model and find out how you should prompt for it.
Edit:
Since the thread is about img2vid:
I have recently started to do a real deep dive into local video generation.
The new WAN models are quite frankly insane, so I have started using a few of them in Comfy. It's pretty complicated, but the results sometimes blow even my mind, and I have been generating images locally for years.
If you want to generate videos, forget about Hunyuan and go for WAN. Maybe start out with WAN2.1, and after learning that, move on to WAN2.2.
WAN2.2 is so new that LoRAs and tutorials for it can be a bit tricky to find, while WAN2.1 has tons of tutorials and LoRAs for you to play around with.
Just be mindful of what size of model you use; they are pretty darn big if you want good quality.
I have a GPU with 24 GB of memory, so I can for example use Q8 GGUF models (put simply, GGUF models are quantized, so they are smaller than the full safetensors and run faster for me). Start with Q4 and work your way up until the quality satisfies you; a rough idea of the sizes is sketched below.
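As a very rough back-of-envelope of why the quant level matters so much for VRAM (assuming the 14B WAN variant and typical GGUF bits-per-weight figures; real files differ a bit because some layers are kept at higher precision):

```python
# Approximate weight size for different quantization levels of a ~14B parameter model.
PARAMS = 14e9  # roughly 14 billion weights (the larger WAN variant)
bits_per_weight = {
    "fp16 safetensors": 16.0,
    "Q8_0 gguf": 8.5,
    "Q5_K_M gguf": 5.5,
    "Q4_K_M gguf": 4.8,
}

for name, bits in bits_per_weight.items():
    gigabytes = PARAMS * bits / 8 / 1e9
    # Weights only; the text encoder, VAE, latents and activations come on top of this.
    print(f"{name:>17}: ~{gigabytes:.0f} GB")
```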
Also, be prepared to wait. A 5 s video at 960x540 (the resolution I usually run, because it gives me the details and quality I want) using Q8 on my 3090 takes around 5-10 minutes (yes, the generation time CAN vary that much). So if the result is not "good enough", you have to tweak the prompt or settings and rerun the whole generation and wait again.
Patience is a required trait with video generation, unless you can afford a 5090 or an Axxxx card, of course.
You can then upscale the video by 2x to get 1080p, but the upscaled video will never be better than the original, so keep that in mind.
Hence "I use Q8 gguf in resolution 960x540". If I use lower Q models, or lower resolution the output becomes "sloppy" in my eyes.
More GPU RAM: better quality (bigger) models and a higher base resolution.
Newer GPU: faster generation.
All of the above is about I2V; I have not actually played around with T2V at all. I prefer to create an image in Forge and then generate videos from those.
I would say a 3090 with 24 GB beats a 50xx card with less memory, simply because I CAN use better/bigger models, even if it takes more time.
With a 50xx card that has, let's say, only half the memory, generation will be fast, but I will never be able to use the big quality models, so what good does the speed do me then?
Also, do not forget about power consumption on some of the higher-end cards. 350 W might not sound like much, but have it running for 8 hours and suddenly it starts to cost a bit of money in electricity; a quick back-of-envelope is below.
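The math is simple enough to sanity check yourself (the price per kWh below is just a made-up placeholder, plug in your own rate):

```python
# Back-of-envelope electricity cost for long generation sessions.
watts = 350            # rough draw of the GPU under load
hours = 8              # one long generation session
price_per_kwh = 0.30   # placeholder rate, use whatever your provider charges

kwh = watts / 1000 * hours
per_session = kwh * price_per_kwh
print(f"{kwh:.1f} kWh per session, ~{per_session:.2f} per session, "
      f"~{per_session * 30:.0f} per month if you do this daily")
```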
The AI spat out this video when I was playing around with it. I wanted the video to be static, but I forgot to nail that down in the prompt. Look at how it handles distances with the camera movement; it blew my mind that it could do this (I was using florence2 to extend the prompt, which is probably why it managed to segment the image so well).
[Video attachment: WAN-UmeAiRT-gguf-speed_2025-11-02-1411_OG_00001.mp4]