Openpose isn't perfect; it has a lot of issues determining front and back legs, etc. Notice that most of the time the person is standing, facing forwards, with the leg positions clearly defined.

Really? When people do it on youtube they always seem to get fantastic results.
View attachment 3770533
Here, they did hit every single pose correctly. Isn't that the whole idea, so that SD can't miss because it follows the exact limb layout? Maybe we're not skilled enough?
High chance of duds, even more so on anything crawling, bending, etc. I used OpenPose heavily early on but eventually moved to other models: depth, canny and tile. Maybe you have experimented with those already, but I would venture that if you haven't, you will end up going with them over OpenPose once you try the alternatives.
This is a simple standing pose and it works great, but for complex poses you need to use some tricks.
Have you tried clip interrogator?
Made a pose in daz3d, did a fast render at the same resolution (not required, but I also try i2i), then load it and apply it.
Do you have any ideas, people, how to turn 3d images into high-realistic photo-like ones - but without losing the similarity/consistency with the original face and colors?
What I use so far is just inpainting the eyes and mouth with realvisxl, then adding film grain.
But maybe you have much more clever ideas, or maybe even some magic comfyui configs?
Something to make the textures of skin/hair/clothes more realistic?
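The film-grain step mentioned above doesn't need any UI; a minimal numpy sketch (the function name and default strength here are my own choices, not anyone's actual workflow):

```python
import numpy as np

def add_film_grain(img: np.ndarray, strength: float = 12.0, seed: int = 0) -> np.ndarray:
    """Overlay zero-mean gaussian noise on an 8-bit RGB image to mimic film grain."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, strength, size=img.shape)
    grained = img.astype(np.float32) + noise
    return np.clip(grained, 0, 255).astype(np.uint8)

frame = np.full((4, 4, 3), 128, dtype=np.uint8)  # tiny flat-grey stand-in for a render
out = add_film_grain(frame)
```

Lower `strength` gives subtler grain; applying it after the inpainting pass keeps the noise uniform across the repainted regions.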
I don't think I could run the clip interrogation with BLIP-2; I can just barely run BLIP-2 by itself. For natural language I found it far superior to BLIP.

You definitely can; use colab or replicate or whatever. Look, clip interrogation with blip2-2.7b, with sdxl prompt mode.
I did see the BLIP-2 one, but I am guessing that nearly any CLIP model won't fit into my VRAM alongside BLIP-2. It looks like an interesting tool, though, and I wasn't aware of it; I might try it later. (EDIT: I could probably run it in FP8, assuming some of the 15GB FP32 model was offloaded when not in use.)
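For rough numbers on that EDIT: FP32 stores four bytes per weight and FP8 one, so a 15GB FP32 checkpoint would drop to about a quarter of that in FP8 (ignoring any layers kept in higher precision):

```python
fp32_gb = 15.0               # model size quoted above, in FP32
bytes_per_weight_fp32 = 4    # 32-bit float
bytes_per_weight_fp8 = 1     # 8-bit float
fp8_gb = fp32_gb * bytes_per_weight_fp8 / bytes_per_weight_fp32  # 3.75 GB
```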
And you can use your own dictionary for the questioning; it can help.
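For intuition only: clip interrogation boils down to ranking candidate phrases from a dictionary by image-text similarity and keeping the best few. A toy sketch with a stand-in scorer (a real run would score with CLIP; all names here are invented for illustration):

```python
def interrogate(score, dictionary, top_k=3):
    """Rank candidate phrases by a similarity score and join the best few into a prompt."""
    ranked = sorted(dictionary, key=score, reverse=True)
    return ", ".join(ranked[:top_k])

# Stand-in scorer: similarity = number of shared keywords (CLIP would compare embeddings).
keywords = {"photo", "woman", "tanktop"}
score = lambda phrase: len(keywords & set(phrase.split()))

candidates = ["a photo of a woman", "oil painting", "violet tanktop", "cartoon"]
prompt = interrogate(score, candidates, top_k=2)  # "a photo of a woman, violet tanktop"
```

Swapping in your own dictionary just changes `candidates`; the ranking loop stays the same.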
Thanks for the test; my thoughts:
View attachment 3777479
Heh, I don't know why violet tanktop.
With this prompt and your aspect ratio, I got this from realvisxl, so it catches some vibe.
View attachment 3777443 View attachment 3777444
But I felt the natural language was lacking if you compare the two.

The first sentence is just the output of the first launch of blip2, before the interrogation loop. There are other default parameters in my build, to make it faster or something, and they can be changed to match your output, I think.
Yeah, I didn't mess with the number of beams or any of that. I think the default settings were tested over a large group of images, then scaled so as not to over-describe or produce repeating terms. (Based on one article I read; I haven't played around with it at all.)
Well... We had a scorching debate about likeness not six months ago, I think. The eventual consensus was that the likeness -- face-wise -- is NOT retained per se. To keep the face, one needs to borderline "copy" it over. So, depending on your threshold for likeness, this can work either very well or not at all:
https://f95zone.to/threads/stable-diffusion-prompt-sharing-and-learning-thread.146036/post-12670868
https://f95zone.to/threads/stable-diffusion-prompt-sharing-and-learning-thread.146036/post-12750138
https://f95zone.to/threads/stable-diffusion-prompt-sharing-and-learning-thread.146036/post-12669146
I have been playing around with PONY for a while, but I honestly might switch back to XL.
I did a 2k training run on 332 images of females flashing the camera in public places. (It was 600 images originally; I pruned out the ones I thought wouldn't train well.)
It took me over an hour just to BLIP-2-caption them. I was impressed with the BLIP-2 captioning on high-quality, highly complex images; I would not use it on a white-background image, however.
Here are some test images from 1024 up to 2k on the XXXL-V3 model
View attachment 3776302 View attachment 3776303 View attachment 3776304 View attachment 3776306 View attachment 3776307 View attachment 3776308
What program did you use for the blip 2 captioning? I used Kohya to caption 768 images and it took about 3 minutes; since yours took so long, was it more detailed captions or something?

You have an amazing computer if you were parsing 4.5-ish 2k images a second.
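For reference, the Kohya figure quoted above works out to roughly the rate being marveled at:

```python
images = 768             # images captioned in Kohya
seconds = 3 * 60         # "about 3 minutes"
rate = images / seconds  # ≈ 4.27 images per second
```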