[Stable Diffusion] Prompt Sharing and Learning Thread

crnisl

Active Member
Dec 23, 2018
752
575
I don't think I could run the clip interrogation with BLIP-2
You definitely can; use Colab or Replicate or whatever. Look: CLIP interrogation with blip2-2.7b, in SDXL prompt mode.

And you can use your own dictionary for the questioning; it can help.
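Roughly like this, if you want to script it yourself. A sketch using the pharmapsychotic clip-interrogator package; the model names are what I'd reach for, not necessarily what your install ships with:

```python
# Sketch: CLIP interrogation with a BLIP-2 captioner in SDXL prompt mode.
# Assumes: pip install clip-interrogator (pharmapsychotic's package).
from PIL import Image
from clip_interrogator import Config, Interrogator

config = Config(
    caption_model_name="blip2-2.7b",  # BLIP-2 OPT-2.7b as the caption model
    clip_model_name="ViT-bigG-14/laion2b_s39b_b160k",  # the CLIP that SDXL uses
)
ci = Interrogator(config)

image = Image.open("input.png").convert("RGB")
print(ci.interrogate(image))  # caption plus best-matching flavor terms
```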
 
Last edited:

felldude

Active Member
Aug 26, 2017
572
1,695
You definitely can. Look: CLIP interrogation with blip2-2.7b, in SDXL prompt mode.

And you can use your own dictionary for the questioning; it can help.
I did see BLIP-2, but I'm guessing that nearly any CLIP model won't fit into my VRAM alongside BLIP-2. It looks like an interesting tool, though; I wasn't aware of it and might try it later. (EDIT: I could probably run it in FP8, assuming some of the 15GB FP32 model gets offloaded when not in use.)

Time-wise, I can run BLIP-2 and append WD-14 tags on 332 2K images in a few hours. If my times match yours, it would take about 11 hours to run them through that program.
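Something like this is what I had in mind for squeezing it in. A sketch with transformers plus bitsandbytes; the quantization flags are assumptions, not my verified setup:

```python
# Sketch: loading BLIP-2 quantized so the ~15GB FP32 checkpoint fits in less VRAM.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    load_in_8bit=True,   # bitsandbytes INT8 quantization
    device_map="auto",   # spill layers to CPU RAM when VRAM runs out
)

image = Image.open("input.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(out[0], skip_special_tokens=True))
```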

EDIT: I'd be curious to see how the attached image gets tagged by that program, if you want to test-run it:

BLIP-2 with WD-14 (ONNX) tags appended, minus "1girl":

a woman sitting on a bench in a white tank top and red shorts with her hands on her hips, solo, long hair, breasts, looking at viewer, skirt, jewelry, medium breasts, sitting, purple hair, outdoors, pussy, choker, day, spread legs, miniskirt, clothes lift, mole, lips, uncensored, no panties, red skirt, skirt lift, sunglasses, tank top, lifted by self, building, mole on breast, watch, wristwatch, white choker
 
Last edited:

crnisl

Active Member
Dec 23, 2018
752
575
EDIT: I'd be curious to see how the attached image gets tagged by that program, if you want to test-run it:
screen.png


Heh, I don't know why it said violet tank top.
With this prompt and your aspect ratio, I got these from RealVisXL, so it does catch some of the vibe.
o1.png o2.png
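The render itself was nothing fancy. A sketch in diffusers; the RealVisXL repo id is an assumption, use whatever checkpoint you have:

```python
# Sketch: rendering the interrogated prompt at a portrait aspect ratio.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "SG161222/RealVisXL_V4.0",  # assumed repo id for a RealVisXL build
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a woman sitting on a bench with her legs bare"  # the interrogator's opening line
image = pipe(prompt, width=832, height=1216).images[0]    # SDXL-friendly portrait size
image.save("o1.png")
```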
 

felldude

Active Member
Aug 26, 2017
572
1,695
View attachment 3777479


Heh, I don't know why it said violet tank top.
With this prompt and your aspect ratio, I got these from RealVisXL, so it does catch some of the vibe.
View attachment 3777443 View attachment 3777444
Thanks for the test, my thoughts:

I'm fighting anchoring bias here, but I'm also one of those people who pulls the beta or nightly builds and then spends hours rebuilding the version I just came from... (I really should stick to venvs.)

Your description had "Spain" and "full round face", which I felt was accurate.

But I felt the natural language was lacking if you compare the two:

a woman sitting on a bench in a white tank top and red shorts with her hands on her hips
a woman sitting on a bench with her legs bare

The shorts part was wrong, so it could be a wash, but WD-14 caught the clothes lift, no panties, skirt lift, and lifted by self.

Of course this is just one image, and it would be poor to judge based on just one. I know I had some badly described images in the mix, including one or two that were completely wrong.

I think CLIP interrogation is more useful for reproducing an image that a model has been trained on than for producing data to train a model on.
 

crnisl

Active Member
Dec 23, 2018
752
575
But I felt the natural language was lacking if you compare the two:
The first sentence is just the output of the first BLIP-2 pass, before the interrogation loop. There are other default parameters in my build, to make it faster or something, and I think it can be changed to match your output.

Anyway, I had more success training character LoRAs on captions from the interrogator (though not its full output) than with BLIP plus a tagger. But the specifics differ, I guess. If you want to work with poses, fetishes, etc., rather than with getting the most similar faces, then the tagger is better; or maybe you need to experiment with custom dictionaries for the interrogator, as sketched below.
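By custom dictionaries I mean something like this. A sketch of clip-interrogator's LabelTable as I remember it; treat the exact API as an assumption and check your version:

```python
# Sketch: rank an image against your own term list instead of the built-in flavors.
from PIL import Image
from clip_interrogator import Config, Interrogator, LabelTable

ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
table = LabelTable(
    ["sitting on a bench", "skirt lift", "flashing", "public street", "full round face"],
    "my_terms",  # cache name for the embedded labels
    ci,
)

image = Image.open("input.png").convert("RGB")
features = ci.image_to_features(image)
print(table.rank(features, top_count=5))  # best-matching custom terms
```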
 

felldude

Active Member
Aug 26, 2017
572
1,695
The first sentence is just the output of the first BLIP-2 pass, before the interrogation loop. There are other default parameters in my build, to make it faster or something, and I think it can be changed to match your output.
Yeah, I didn't mess with the number of beams or any of that. I think the default settings were tested over a large group of images and then scaled so as not to over-describe or produce repeating terms. (Based on one article I read; I haven't played around with it at all.)

I'm not sure what precision I'm running at. I'd guess FP8, or FP16 at best, since my card can't fit 15GB; but with bitsandbytes it might actually be running from the FP32 weights, as my understanding is it doesn't need to load the whole model unless training.
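For what it's worth, the beam setting is just a generate kwarg. Continuing from the BLIP-2 loading sketch earlier in the thread; the values are guesses, not something I've tuned:

```python
# Sketch: the decoding knobs I left alone (model/processor/inputs loaded as in
# the BLIP-2 sketch above).
out = model.generate(
    **inputs,
    num_beams=5,             # beam search; more beams = slower, richer captions
    max_new_tokens=60,
    repetition_penalty=1.5,  # discourages the repeating-terms problem
)
print(processor.decode(out[0], skip_special_tokens=True))
```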
 
Last edited:

crnisl

Active Member
Dec 23, 2018
752
575
We had a scorching debate about likeness not six months ago, I think. The eventual consensus was that likeness, face-wise, is NOT retained per se. To keep the face, one needs to borderline "copy" it over. So, depending on your threshold for likeness, this can work either very well or not at all:
https://f95zone.to/threads/stable-diffusion-prompt-sharing-and-learning-thread.146036/post-12670868
https://f95zone.to/threads/stable-diffusion-prompt-sharing-and-learning-thread.146036/post-12750138
https://f95zone.to/threads/stable-diffusion-prompt-sharing-and-learning-thread.146036/post-12669146
Well...
00002.png 00002.png 00002.png
 

Synalon

Member
Jan 31, 2022
225
663
I have been playing around with Pony for a while, but I honestly might switch back to XL.

I did a 2K training on 332 images of women flashing the camera in public places. (It started as 600 images; I pruned out the ones I thought wouldn't train well.)

It took over an hour just to BLIP-2 caption them. I was impressed with the BLIP-2 captioning on high-quality, highly complex images; I would not use it on a white-background image, however.

Here are some test images from 1024 up to 2K on the XXXL-V3 model:

View attachment 3776302 View attachment 3776303 View attachment 3776304 View attachment 3776306 View attachment 3776307 View attachment 3776308
What program did you use for the BLIP-2 captioning?
I used Kohya to caption 768 images and it took about 3 minutes. Since yours took so long, was it producing more detailed captions or something?
 

felldude

Active Member
Aug 26, 2017
572
1,695
What program did you use for the BLIP-2 captioning?
I used Kohya to caption 768 images and it took about 3 minutes. Since yours took so long, was it producing more detailed captions or something?
You have an amazing computer if you were parsing around 4.5 2K images a second.
(Or the model is oversized for my card and causing major slowdowns.)

I can get those times on the 373MB WD ViT tagger with ONNX, but not on the 15GB BLIP-2 model.
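The ONNX path is lightweight enough to script directly. A sketch assuming a local copy of SmilingWolf's WD ViT tagger (model.onnx plus selected_tags.csv); the 0.35 threshold is a guess:

```python
# Sketch: WD ViT tagging via onnxruntime.
import csv
import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession(
    "model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)
with open("selected_tags.csv", newline="") as f:
    tags = [row["name"] for row in csv.DictReader(f)]

img = Image.open("input.png").convert("RGB").resize((448, 448))
x = np.asarray(img, dtype=np.float32)[:, :, ::-1]  # RGB -> BGR, as the model expects
x = np.ascontiguousarray(np.expand_dims(x, 0))     # NHWC batch of 1

probs = session.run(None, {session.get_inputs()[0].name: x})[0][0]
print([t for t, p in zip(tags, probs) if p > 0.35])  # tags above threshold
```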
1.jpg
 
Last edited:

Synalon

Member
Jan 31, 2022
225
663
I'm using an i9-13900KF, an RTX 4080, and 32GB of 6400MHz DDR5. I haven't captioned anything over 1024x1024 yet, but I did 768 images pretty quickly when I was trying to train a LoRA.

Is that 15GB BLIP-2 model built into Kohya?
 

felldude

Active Member
Aug 26, 2017
572
1,695
I'm using an i9-13900KF, an RTX 4080, and 32GB of 6400MHz DDR5. I haven't captioned anything over 1024x1024 yet, but I did 768 images pretty quickly when I was trying to train a LoRA.

Is that 15GB BLIP-2 model built into Kohya?
It automatically pulls the one off

Going by other users who can get 10-15 it/s on SDXL, I am at a solid 1 it/s.
 

Synalon

Member
Jan 31, 2022
225
663
It automatically pulls the one off

Going by other users who can get 10-15 it/s on SDXL, I am at a solid 1 it/s.
If you want to save time in the future, I'll run BLIP for you, as long as we can find a place to upload the images to pass them to each other.
 
  • Like
Reactions: felldude

felldude

Active Member
Aug 26, 2017
572
1,695

MegaPack.jpg

Trained to convergence on 332 images with natural-language captioning that was checked by hand; about 70% of the BLIP-2 captions were edited.

The WD-14 tagging uses a custom dictionary with manual appending.

  1. Avoid using "nude" if possible.
  2. Use "large breasts" as a negative when using small breasts or flat chest (see the sketch below).
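In diffusers terms, note 2 is just a negative prompt. A sketch reusing the SDXL pipeline from earlier in the thread; ComfyUI users would put the same strings in the positive and negative CLIP Text Encode nodes:

```python
# Sketch of the two notes above: keep "nude" out of the positives and push
# "large breasts" into the negative prompt when asking for the opposite.
image = pipe(
    prompt="1girl, small breasts, sitting on a bench, outdoors",
    negative_prompt="large breasts, nude",
).images[0]
```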
ComfyUI_03021_.png ComfyUI_03020_.png ComfyUI_03019_.png ComfyUI_03015_.png ComfyUI_03006_.png ComfyUI_02974_.png ComfyUI_02969_.png ComfyUI_02957_.png


Cover.jpeg
 
Last edited:
  • Like
Reactions: Sharinel

Onetrueking

Member
Jun 1, 2020
149
539
Hello everyone, I'd like to make art similar to a guy on Twitter, Philon (Philonai95). Can someone please make something similar to his works and share the prompts/LoRAs/tools you used to achieve it? Nothing on Civitai looks the same. Known information: Forge, Pony XL v6, concept art twilight.
 
Last edited:

felldude

Active Member
Aug 26, 2017
572
1,695
Hello everyone, I'd like to make art similar to a guy on Twitter, Philon (Philonai95). Can someone please make something similar to his works and share the prompts/LoRAs/tools you used to achieve it? Nothing on Civitai looks the same. Known information: Forge, Pony XL v6, concept art twilight.
Based on the distortion in the signature, I would guess that is a post effect; it looks like a wave distortion and some kind of Gaussian blur.
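Roughly this kind of thing, approximated with Pillow and NumPy; the amplitude, period, and blur radius are guesses at what would hide a signature:

```python
# Sketch: wave distortion (per-row sine shift) plus a mild Gaussian blur.
import numpy as np
from PIL import Image, ImageFilter

img = Image.open("artwork.png").convert("RGB")
arr = np.asarray(img)
h = arr.shape[0]

out = np.empty_like(arr)
for y in range(h):
    shift = int(6 * np.sin(2 * np.pi * y / 48))  # 6px amplitude, 48px period
    out[y] = np.roll(arr[y], shift, axis=0)      # shift the row horizontally

result = Image.fromarray(out).filter(ImageFilter.GaussianBlur(radius=1.2))
result.save("distorted.png")
```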
 

Onetrueking

Member
Jun 1, 2020
149
539
Based on the distortion in the signature, I would guess that is a post effect; it looks like a wave distortion and some kind of Gaussian blur.
It could be a matte painting made in Photoshop, since that's mentioned on his Twitter. I'm also curious about the perfect hands, eyes, and feet in his works; is it ADetailer or something else?
 

felldude

Active Member
Aug 26, 2017
572
1,695
It could be a matte painting made in Photoshop, since that's mentioned on his Twitter. I'm also curious about the perfect hands, eyes, and feet in his works; is it ADetailer or something else?
With a 3D-sourced image you can blend elements together: AI does the hair, 3D does the hands and feet, as well as the posing.
 
  • Like
Reactions: Onetrueking