[Stable Diffusion] Prompt Sharing and Learning Thread

felldude · Jun 28, 2024

crnisl said:
You definitely can. Look, clip interrogation with blip2-2.7b, with sdxl prompt mode.

And you can use your own dictionary for questioning, it can help.

I did see BLIP-2, but I am guessing that almost no CLIP model will fit into my VRAM alongside it. It looks like an interesting tool that I wasn't aware of, so I might try it later. (EDIT: I could probably run it in FP8, assuming some of the 15GB FP32 model is offloaded when not in use.)

Time-wise, I can run BLIP-2 and append WD-14 tags on 332 2k images in a few hours.
If my times are the same as yours, then it would take 11 hours to run them through that program.

EDIT: I'd be curious to see what the image attached is tagged as using that program if you wanted to test run it:

BLIP-2 with WD-14 (ONNX) appended, -1girl

a woman sitting on a bench in a white tank top and red shorts with her hands on her hips, solo, long hair, breasts, looking at viewer, skirt, jewelry, medium breasts, sitting, purple hair, outdoors, pussy, choker, day, spread legs, miniskirt, clothes lift, mole, lips, uncensored, no panties, red skirt, skirt lift, sunglasses, tank top, lifted by self, building, mole on breast, watch, wristwatch, white choker
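The caption format above (one BLIP-2 sentence followed by WD-14 tags, with `1girl` dropped) can be sketched in a few lines. This is a minimal illustration, not the tool actually used in the thread: the function name `build_caption` and the `drop` parameter are hypothetical, and the example tags are shortened.

```python
def build_caption(sentence, tags, drop=("1girl",)):
    """Join a natural-language caption with tagger output, skipping unwanted tags."""
    dropped = {t.lower() for t in drop}
    seen = set()
    kept = []
    for tag in tags:
        tag = tag.strip()
        key = tag.lower()
        # Skip empty strings, excluded tags, and repeats.
        if not key or key in dropped or key in seen:
            continue
        seen.add(key)
        kept.append(tag)
    return ", ".join([sentence.strip()] + kept)


example = build_caption(
    "a woman sitting on a bench",
    ["1girl", "solo", "long hair", "outdoors", "solo"],
)
print(example)
```

Keeping the sentence first and the tags after it matches the layout of the caption quoted above; the de-duplication step is an extra assumption that only matters if the tagger repeats itself.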

crnisl · Jun 28, 2024

felldude said:
EDIT: I'd be curious to see what the image attached is tagged as using that program if you wanted to test run it:

Heh, I don't know why violet tanktop.
With this prompt and your aspect ratio, I got this from realvisxl, so it catches some vibe.

felldude · Jun 28, 2024

crnisl said:

Heh, I don't know why violet tanktop.
With this prompt and your aspect ratio, I got this from realvisxl, so it catches some vibe.

Thanks for the test, my thoughts:

I am fighting anchoring bias here, but I am also one of those people who pulls the beta or nightly builds and then spends hours rebuilding the version I just came from... (I really should stick to venvs.)

Your description included "Spain" and "full round face", which I felt was accurate.

But I felt the natural language was lacking if you compare the two:

a woman sitting on a bench in a white tank top and red shorts with her hands on her hips
a woman sitting on a bench with her legs bare

The shorts part was wrong, so it could be a wash, but WD14 caught the clothes lift, no panties, skirt lift, and lifted by self.

Of course this is just one image, and it would be poor to judge based on just one. I know I had some badly described images in the mix, including one or two that were completely wrong.

I think CLIP interrogation is more beneficial for trying to reproduce an image that a model has been trained on, rather than for providing data to train a model on.

crnisl · Jun 28, 2024

felldude said:
But I felt the natural language was lacking if you compare the two:

The first sentence is just the output of the first BLIP-2 pass, before the interrogation loop. My build has other default parameters, to make it faster or something, and I think they can be changed to match your output.

Anyway, I had more success training character LoRAs when captioning with the interrogator's output (though not the full output) than with BLIP + tagger. But the specifics differ, I guess. If you want to work with poses, fetishes, etc. rather than making the most similar faces, then the tagger is better, or maybe you need to experiment with custom dictionaries for the interrogator.

felldude · Jun 28, 2024

crnisl said:
The first sentence is just the output of the first BLIP-2 pass, before the interrogation loop. My build has other default parameters, to make it faster or something, and I think they can be changed to match your output.

Yeah, I didn't mess with the number of beams and all that. I think the default settings were tested over a large group of images and then scaled so as not to over-describe or produce repeating terms. (Based on one article I read; I haven't played around with it at all.)

I'm not sure what precision I am running at. I would guess FP8, or at best FP16, as my card cannot fit 15GB. With bitsandbytes it might actually be running at FP32, though, as my understanding is that it doesn't need to load the whole model unless training.
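For a rough sense of scale, the weight footprint at each precision can be estimated from the 15GB FP32 figure mentioned above. This is back-of-the-envelope arithmetic only: it assumes size scales linearly with bytes per parameter and ignores activations, caches, and framework overhead. The helper name `weight_size_gb` is made up for illustration.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_size_gb(fp32_size_gb, precision):
    """Scale an FP32 weight size to another precision (weights only)."""
    return fp32_size_gb * BYTES_PER_PARAM[precision] / BYTES_PER_PARAM["fp32"]


for precision in ("fp32", "fp16", "fp8"):
    print(precision, weight_size_gb(15.0, precision), "GB")
```

Under these assumptions the same model needs roughly half the space at FP16 and a quarter at FP8, which is why a lower precision may fit on a card that cannot hold the FP32 weights.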

crnisl · Jun 28, 2024

Sepheyer said:
We had a scorching debate about likeness not six months ago, I think. The eventual consensus was that the likeness, face-wise, is NOT retained per se. To keep the face one needs to borderline "copy" it over. So, depending on your threshold for likeness, this can work either very well or not at all:
https://f95zone.to/threads/stable-diffusion-prompt-sharing-and-learning-thread.146036/post-12670868
https://f95zone.to/threads/stable-diffusion-prompt-sharing-and-learning-thread.146036/post-12750138
https://f95zone.to/threads/stable-diffusion-prompt-sharing-and-learning-thread.146036/post-12669146

Well...

Synalon · Jun 28, 2024

felldude said:
I have been playing around with PONY for a while, but I honestly might switch back to XL.

I did a 2k training on 332 images of females flashing the camera in public places. (It was 600 images; I pruned the ones I thought wouldn't train well.)

It took over an hour just to caption them with BLIP-2. I was impressed with the BLIP-2 captioning on high-quality, highly complex images. I would not use it on a white-background image, however.

Here are some test images from 1024 up to 2k on the XXXL-V3 model


What program did you use for the blip 2 captioning?
I used Kohya to caption 768 images and it took about 3 minutes, since yours took so long was it more detailed captions or something?

felldude · Jun 28, 2024

Synalon said:
What program did you use for the blip 2 captioning?
I used Kohya to caption 768 images and it took about 3 minutes, since yours took so long was it more detailed captions or something?

You have an amazing computer if you were parsing about four 2k images a second.
(Or the model is oversized for my card and causing major slowdowns.)

I can get those times on the 373MB WD-ViT tagger with ONNX, but not on the 15GB BLIP-2 model.
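The per-second figure comes straight from the numbers quoted above (768 images in about 3 minutes). A small sketch of that arithmetic, with hypothetical helper names, also shows how long 332 images would take at the same rate:

```python
def images_per_second(n_images, minutes):
    """Average captioning throughput over a run."""
    return n_images / (minutes * 60)


def seconds_for(n_images, rate):
    """Time needed to caption n_images at a given images-per-second rate."""
    return n_images / rate


kohya_rate = images_per_second(768, 3)  # about 4.27 images per second
print(f"{kohya_rate:.2f} images/s")
print(f"{seconds_for(332, kohya_rate):.1f} s for 332 images at that rate")
```

This only compares raw throughput; it says nothing about whether the two runs used the same model, resolution, or caption length.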

Synalon · Jun 29, 2024

I'm using an i9-13900KF, an RTX 4080, and 32GB of 6400MHz DDR5. I've not captioned anything over 1024x1024 yet, but I did 768 images pretty quickly when I was trying to train a LoRA.

Is that 15gb BLIP 2 model built into Kohya?

felldude · Jun 30, 2024

Synalon said:
I'm using an i9-13900KF, an RTX 4080, and 32GB of 6400MHz DDR5. I've not captioned anything over 1024x1024 yet, but I did 768 images pretty quickly when I was trying to train a LoRA.

Is that 15gb BLIP 2 model built into Kohya?

It automatically pulls the one off [link hidden]

Compared with other users who can get 10-15 it/s on SDXL, I am at a solid 1 it/s.


Synalon · Jun 30, 2024

felldude said:
It automatically pulls the one off [link hidden]

Compared with other users who can get 10-15 it/s on SDXL, I am at a solid 1 it/s.


If you want to save time in the future, I'll run BLIP for you, as long as we can find a place to upload the images and pass them to each other.

felldude · Jul 2, 2024


Trained to convergence on 332 images with natural-language BLIP-2 captions, about 70% of which were checked and edited by hand.

WD-14 tagging is a custom dictionary with manual appending.

Avoid using "nude" if possible.
Use -Large Breasts when using small breasts or flat chest


Onetrueking · Jul 3, 2024

Hello everyone, I'd like to make art similar to that of a guy on Twitter, Philon (Philonai95). Could someone please make something similar to his works and share the prompts/LoRA/tools used to achieve it? Nothing on Civitai looks the same. Known information: Forge, Pony XL v6, concept art twilight.

crnisl · Jul 3, 2024

Onetrueking said:
Nothing on civitai looks the same.

Well, I clearly see a mix of the styles of Cutesexyrobutts, Sabudenego and Krys Decker.
Start from there.


felldude · Jul 8, 2024


Mrrg · Jul 8, 2024

felldude · Jul 10, 2024

Onetrueking said:
Hello everyone, I'd like to make art similar to that of a guy on Twitter, Philon (Philonai95). Could someone please make something similar to his works and share the prompts/LoRA/tools used to achieve it? Nothing on Civitai looks the same. Known information: Forge, Pony XL v6, concept art twilight.

Based on the distortion in the signature, I would guess that it is a post effect. It looks like a wave distortion and some kind of Gaussian blur.
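To make the guess concrete, here is a minimal pure-Python sketch of a wave distortion followed by a Gaussian blur on a grayscale image stored as nested lists. It only illustrates the two effects named above; it is not the artist's actual workflow, and the function names, parameters, and default values are all assumptions.

```python
import math


def wave_distort(img, amplitude=2.0, wavelength=8.0):
    """Shift each row sideways by a sine of its row index; edges are clamped."""
    width = len(img[0])
    out = []
    for y, row in enumerate(img):
        shift = int(round(amplitude * math.sin(2 * math.pi * y / wavelength)))
        out.append([row[min(max(x - shift, 0), width - 1)] for x in range(width)])
    return out


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights covering about three sigma each side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping at the borders."""
    radius = len(kernel) // 2
    width = len(img[0])
    return [
        [sum(k * row[min(max(x + i - radius, 0), width - 1)]
             for i, k in enumerate(kernel))
         for x in range(width)]
        for row in img
    ]


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma=1.0):
    """Separable blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(img, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


# Tiny demo: a vertical edge becomes wavy and soft.
image = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
result = gaussian_blur(wave_distort(image), sigma=1.0)
```

In practice an image editor or an imaging library would do this far faster; the sketch just shows why straight lines such as a signature end up both wobbly and softened.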

Onetrueking · Jul 10, 2024

felldude said:
Based on the distortion in the signature, I would guess that it is a post effect. It looks like a wave distortion and some kind of Gaussian blur.

It could be a matte painting made in Photoshop, since that's mentioned on his Twitter. I'm also curious about the perfect hands, eyes, and feet in his works: is it ADetailer or something else?

felldude · Jul 10, 2024

Onetrueking said:
It could be a matte painting made in Photoshop, since that's mentioned on his Twitter. I'm also curious about the perfect hands, eyes, and feet in his works: is it ADetailer or something else?

With a 3D-sourced image you can blend elements together: AI does the hair, 3D does the hands and feet as well as the posing.

felldude · Jul 11, 2024

