
Generating AI images with StableDiffusion - Beginner Guide | and with img2vid

5.00 star(s) 2 Votes

giqui

Conversation Conqueror
Compressor
Nov 9, 2019
6,977
47,793
883
I've managed to get it to do some pretty explicit stuff, but it takes some word manipulation to do it. This seems to be a field where literary artistry is more useful than graphic.
Well, I used “Florence 2” to try to detect the atypical prompt used to generate image 4, but I couldn't see anything in the description that made it so pornographic. I'm curious about the prompt used. Congratulations on your creativity.

Florence 2:

The image shows a man and a woman in a dungeon-like setting. The woman is sitting on a swing with her legs spread apart and her hands tied up with chains. The man is standing behind her, with his arms around her waist and his mouth open in a scream. He is wearing a black leather collar and cuffs, and his face is covered in red paint. The background is dark and there is smoke coming out of the chains, suggesting that the scene is taking place in a dimly lit room. The overall mood of the image is tense and intense.

image.png
 
  • Like
Reactions: Another John Doe
Mar 3, 2025
64
93
27
Interesting. The text I wrote didn't look anything like that.
 
  • Like
Reactions: giqui

giqui

“Florence 2” is a bit poetic, and JPG files leave no traces. Write a prompt so I can use it. I want to see if I can find the part that sets it apart from my prompt and evolve by analyzing how it differs from mine (y)
 
Mar 3, 2025
64
93
27
Here is a deep throat one I literally just made: A stunningly beautiful girl's face with moist emerald blue eyes and honey blond hair is near the dick of a man, her skin is milky white, her face is blushing red, her eyebrows are raised, her eyes are almond, her brow is furrowed, her eyes are rolled back, her eyes are slightly narrowed, her forehead is creased, she is crying, she is choking, her eyes are watering heavilly, she is facing the camera, man is only visible from below his chest, the man's thighs are widely parted, the man's dick is deep in the girl's mouth, the man has black pubic hair, the man is holding the girl by her hair with both hands, seed:::215106013, , high-angle shot download (1).jpeg
 
  • Like
Reactions: giqui

giqui

You'll notice it ignored several prompts. It certainly isn't consistent.


:(

Your safety settings have blocked this image.

This generated image may contain themes that are not appropriate for all audience, such as graphic violence, or sexual content. This can happen due to the randomness and biases of the AI model, or specific words in your prompt.
 
Mar 3, 2025
64
93
27


Yeah, I got that a couple of days ago when I first started playing with it. Just change the settings.
 
  • Like
Reactions: giqui

giqui

I used your prompt in ForgeSD with the virtualDiffusion_v20.safetensors model. This was the result. Very good! (y)

00001-1387246195.png
 
  • Haha
Reactions: Another John Doe

giqui

I think it translated honey blond hair as honey blond sperm.
The face of an incredibly beautiful girl, , ebony, with freckles on the face and curly blonde hair, is close to a man's penis. Her skin is milky white, her face is flushed, her eyebrows are raised, her eyes are almond-shaped, her forehead is wrinkled, her eyes are rolled back and slightly narrowed, her forehead is wrinkled, she is crying and choking, her eyes are watering profusely, and she is facing the camera. The man is only visible below the chest and his thighs are wide open. His penis is deep in the girl's mouth, spilling semen. He has black pubic hair and is holding the girl by the hair with both hands.

 
  • Like
Reactions: Another John Doe

leerlauf

Newbie
Dec 13, 2019
42
15
85
I ran a test on the website , typing: “A woman walking naked on the beach.” As expected, the guidelines blocked it.
The guidelines didn't block this - your personal settings blocked it (that's why it says personal safety settings in your image). You can change those to allow nsfw content.
 

its_not_real

Member
Game Developer
May 14, 2023
110
308
179
I see people using florence2 in here.
Just wanted to quickly chime in with some info.

Whether a prompt from Florence will work depends heavily on which model you use.
For diffusion models you should use booru tags, ie "1girl, blonde, big breasts, naked, standing, next to a tree, blah, blah, blah", ie TAGS.
Florence does not work like that as you can see.

Models like FLUX, and even video generation models, accept full descriptions like the ones Florence gives you and even work better with them (but they also understand tags).

But Pony models, for example, work better with tags. (You can also write descriptions like Florence does, but tags are the way to get the model to play nice with you.) You also need those "score" prompts to specify the quality of the images the model should pull from the dataset (only Pony V6 based models).
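
To make the tags-versus-description point concrete, here is a minimal sketch of how a tag prompt could be put together. The helper name `build_tag_prompt` is hypothetical, and the score tags shown are the prefix commonly used with Pony V6 based checkpoints; check the model card of your checkpoint, since this is an assumption and not something the model enforces.

```python
# Hypothetical helper: assembles a booru-style tag prompt.
# Assumption: these score tags are the quality prefix for Pony V6 based models.
PONY_SCORE_TAGS = ["score_9", "score_8_up", "score_7_up"]


def build_tag_prompt(tags, pony=False):
    """Join tags with commas; optionally put the score tags in front."""
    # Normalize: trim whitespace, lowercase, drop empty entries.
    cleaned = [t.strip().lower() for t in tags if t.strip()]
    prefix = PONY_SCORE_TAGS if pony else []
    return ", ".join(prefix + cleaned)


prompt = build_tag_prompt(
    ["1girl", "blonde hair", "standing", "next to a tree"], pony=True
)
print(prompt)
```

For FLUX or a video model you would skip this and paste the full Florence description instead.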

I have not used A1111 in AGES, but in Forge you have two buttons below the "Generate" button, one for CLIP and the other for booru. Press them and see what they spit out. They are kind of like "Florence for tags".

This could be good to know if the model is not producing the results you want. Read about the model and find out how you should prompt for it.

Edit
Since the thread is about img2vid:
I have recently started to really deep dive into video generation locally.
The new WAN models are quite frankly insane, so I have started using a few of them in Comfy. It's pretty complicated, but the results sometimes blow even my mind, and I have been generating images locally for years.

If you want to generate videos, forget about Hunyuan and go for WAN. Maybe start out with WAN2.1, and after learning that, move on to WAN2.2.
WAN2.2 is pretty new, so LoRAs and tutorials can be a bit tricky to find, while WAN2.1 has tons of tutorials and LoRAs for you to play around with.
Just be mindful with what size of model you use, they are pretty darn big if you want good quality.
I have a GPU with 24 GB of memory, so I can, for example, use Q8 GGUF models (simply put, GGUF files are quantized, so they take less memory than the full safetensors). Start with Q4, work your way up, and see what works to your satisfaction.
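
As a rough sketch of why the quant level matters for memory, the snippet below estimates the weight size from the parameter count. The bits-per-weight figures are for the basic Q4_0 and Q8_0 block formats; the 14B parameter count is an assumption about which WAN variant you run, and real files differ somewhat because some tensors are stored at higher precision. The text encoder, VAE, and working memory come on top of this.

```python
# Approximate bits per weight for basic GGUF block formats and for FP16.
BITS_PER_WEIGHT = {"Q4_0": 4.5, "Q8_0": 8.5, "FP16": 16.0}


def estimate_size_gb(n_params, quant):
    """Rough size of the weights alone, in GB (10^9 bytes)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


N_PARAMS = 14e9  # assumption: a 14B-parameter video model

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimate_size_gb(N_PARAMS, quant):.1f} GB")
```

On a 24 GB card this is why Q8 can still fit while the full FP16 weights would not, and why Q4 is the safer starting point on smaller cards.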

Also, be prepared to wait. A 5s video in 960x540 (the resolution I usually run in because it gives me the details and quality I want) using Q8 with my 3090 takes around 5-10 minutes (yes, the generation time CAN vary that much). So if it's not "good enough" you have to tweak (if needed) and rerun the whole generation and wait again.
Patience is a required trait with video generation, unless you can afford a 5090 or an Axxxx model ofc.
You can then upscale the video by 2x to get 1080p, but the upscaled video will never be better than the original, so keep that in mind.
Hence "I use Q8 gguf in resolution 960x540". If I use lower Q models, or lower resolution the output becomes "sloppy" in my eyes.
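
A quick sketch of the arithmetic behind that, using only the numbers above (the function name `upscale` is just for illustration): doubling 960x540 gives 1080p, but it also means four times as many pixels, so three out of four pixels in the final video are filled in by the upscaler and not by the model.

```python
def upscale(width, height, factor):
    """Return the resolution after upscaling both sides by `factor`."""
    return width * factor, height * factor


base = (960, 540)
target = upscale(*base, 2)

# How many times more pixels the upscaled frame has than the generated one.
pixel_ratio = (target[0] * target[1]) / (base[0] * base[1])

print(target, pixel_ratio)
```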

More GPU ram: better quality models (bigger size models) and higher base resolution
Newer GPU: faster generation

Everything above is about I2V; I have not actually played around with T2V at all. I prefer to create an image in Forge and then generate videos from it.

I would say a 3090 with 24 GB beats a 50xx model with less memory, simply because I CAN use better/bigger models, even though it takes time.
With a 50xx card with, let's say, only half the amount of memory, it will go fast, but I will never be able to use the big quality models, so what good does the speed do me then? :)

Also, do not forget about power consumption on some of the high-end cards. 350 W might not sound like much, but have it running for 8 hours and suddenly it starts to cost a bit of money in electricity.
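
To put rough numbers on that, here is a small sketch. The 350 W, the 8 hours, and the 5-10 minutes per video come from this post; the price per kWh is a made-up placeholder, so plug in your own rate. It also only counts the GPU at full draw, not the rest of the PC.

```python
def energy_kwh(watts, hours):
    """Energy used at a constant power draw, in kWh."""
    return watts * hours / 1000


def videos_per_session(hours, minutes_per_video):
    """How many whole videos fit into a session of the given length."""
    return int(hours * 60 // minutes_per_video)


PRICE_PER_KWH = 0.30  # hypothetical price, in your local currency

kwh = energy_kwh(350, 8)
cost = kwh * PRICE_PER_KWH
slow = videos_per_session(8, 10)  # every video takes 10 minutes
fast = videos_per_session(8, 5)   # every video takes 5 minutes

print(f"{kwh:.1f} kWh, about {cost:.2f} per session, {slow}-{fast} videos")
```

Divide the session cost by the number of videos you actually keep and the price per usable clip gets noticeable quickly.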

The AI spat out this video when I was playing around with it. I wanted the video to be static, but I forgot to nail that down in the prompt. Look at how it handles distances with camera movement; it blew my mind that it could do this (I was using florence2 to add to the prompt, which is probably why it managed to segment the image so well).
View attachment WAN-UmeAiRT-gguf-speed_2025-11-02-1411_OG_00001.mp4
 
  • Like
Reactions: giqui