[Stable Diffusion] Prompt Sharing and Learning Thread

Sharinel

Active Member
Dec 23, 2018
607
2,436
448
I've been using ComfyUI for quite a while now. I've taught myself how to create (simple) workflows to generate images, upscale images, use ControlNet to reproduce existing poses or even animations with pre-defined images, do face replacements and all that stuff, all while using SD(XL) models. And it's pretty amazing what you can do, I love it! I even replaced my NVIDIA RTX 2080 Ti with an NVIDIA RTX 4090 to get around the limited VRAM of the 2080 Ti. But that's when I stopped teaching myself new things. I don't know anything about Pony, Illustrious, Qwen or anything like that; I'm just seeing your posts and I'm... wow!

And now there's Image-2-Video and Text-2-Video using Wan 2.1/2.2, which I'm very interested in, but I'm totally lost! So here I am asking you guys if you can help me out.

First, I started with the official Wan templates like "Wan 2.2 14B Text to Video" or "Wan 2.2 14B Image to Video", which can be found in the "Templates" section of the ComfyUI menu. I've downloaded the models for them, and while those workflows do work, they seem slow and "limited" when it comes to the length and resolution of the video. And because you'll usually want to create a couple of videos with the same prompt to pick the best result, this might take many hours, maybe even a couple of days, to get the video you're looking for. And it's only like 5 seconds long...

Next I was looking for some "optimized" workflows people share online, and first I found a set of workflows on civit.ai, which I've been trying out. I do like the included "WAN 2.2 I2V" workflow, because it seems faster and has more options, but I still feel limited when it comes to the resolution and length of the video because it uses ".safetensors" models, which use a lot of VRAM. I can get 5-second videos at a decent resolution, or a longer video at a poor resolution.

Then I thought I might go for GGUF models instead, because from what I understand they use less VRAM, but they are "compressed" and therefore might take longer. I don't mind waiting a couple of minutes for results if I can use more frames (= longer videos) or a higher resolution than with the "default" workflow. So I found one that is very impressive: it uses GGUF, has a bunch of options, and after downloading all the missing nodes and models (as well as fixing a "bug" in the workflow itself) it produces decent results within a couple of minutes. I've been able to create a few videos of 20+ seconds (at 24 FPS) at a resolution of 480x800, but as soon as I add action prompts for the camera or the subject in the picture (btw: no additional LoRAs are involved), the video gets blurry (like a double or even multiple exposure, in photography terms) or it just doesn't follow the prompt (e.g. if the prompt says "the camera slowly zooms in toward the woman's face", it zooms in for about 3 seconds, then zooms back out and repeats those steps until the end of the clip, even if I add something like "at second 5, the camera stops completely and remains entirely static for the rest of the video. there is no zooming, panning, or movement after this point — the frame stays locked on her face.").

So here are my questions:
  • What's your overall workflow to create a 10-20+ second high-resolution video based on your imagination/prompt?
    • The resulting video should be produced in a couple of minutes (5-15 minutes at most, not hours).
    • What Text-2-Image workflow do you use to create your starting image?
    • What's your Image-2-Video workflow to produce a 10-20+ second video with a decent (720p) resolution?
    • What's your workflow to upscale the video to an HD resolution (1280p or even 1440p)?
  • What prompt (or LoRA) do you use to consistently "control" the camera movements (zoom in, zoom out, staying static at a close-up, etc.)?
Any help is highly appreciated. I would love to end up with like 3-4 workflows in total (1: create a starting/ending image for the video / 2: create an at least 10-20+ second video with "precise" camera movement / 3: upscale the video to at least 1280p).

TL;DR: if you share your workflows to create a 20+ second video with precise camera (and subject) actions, or can point me in the right direction for further research, I will be in your debt forever :)
I'm running a 4090 as well and tbh I don't think it has the VRAM to do 10-20 second videos. At the moment I'm running 5-second videos at 768x768 (or an equivalent ratio) that take 4 minutes or so to produce. I've attached the JSON file I use; on the list of LoRAs only the top one is needed, the rest are NSFW LoRAs that do specific things.
The good thing about this workflow is that you get a final image which you can then use to kick off the next video, and it also uses interpolation to increase the fps/video size. If you've been downloading the models, you probably have these already.
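If you ever need to do that hand-off outside of Comfy, grabbing the last frame of a finished clip to seed the next I2V run only takes a few lines, e.g. with OpenCV. This is a generic sketch with placeholder filenames, not part of the attached workflow:

```python
# Generic sketch (not from the attached workflow): grab the last frame of a clip
# so it can be used as the start image for the next I2V segment.
import cv2

cap = cv2.VideoCapture("segment_01.mp4")  # hypothetical filename
last_frame = None
ok, frame = cap.read()
while ok:
    last_frame = frame
    ok, frame = cap.read()
cap.release()

if last_frame is not None:
    cv2.imwrite("segment_02_start.png", last_frame)  # feed this into the next run
```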

Here is an example of the output

View attachment Deadwood Vibes Video 02.mp4
 

Sepheyer

Well-Known Member
Dec 21, 2020
1,702
4,184
448
TL;DR: if you share your workflows to create a 20+ second video with precise camera (and subject) actions, or can point me in the right direction for further research, I will be in your debt forever :)
Video right now is in the same state as Stable Diffusion 1.0 was a few years ago - even with all the extra tools it is limited in what you can do.

On the other hand, image generation leapfrogged with Qwen Edit 2509 and is probably the area to focus on "today", because you finally start to get solid control over character consistency, to the point where you can easily throw together a visual novel with it. Even if you don't want a VN, the renders can be made to look like stills from a high-end movie or a photoshoot.

So yea, I'd say forget about WAN etc for now and see where QE2509 can take you.

Comfy has templates for Qwen, so do use them. There is about a week-long learning curve where you mostly throw prompts at it and learn how it responds. Generally you want the 8-step Lightning LoRA with Qwen; it offers a great balance between speed and quality. Oh, and you do want to familiarize yourself with Illustrious, just because that's what you use to generate the characters that Qwen will end up composing into the final scene renders.

Yea, so what? The main selling point is that Qwen finally gives folks a full movie set/studio at home. You generate a bunch of actresses/talent with Illustrious and then Qwen puts them into stories. Each of the stills below would take about $5,000 to produce in real life, starting with talent pay, equipment rental, makeup artist, location scout, etc. Like everybody else I did "student films" with friends, and I know that each of these scenes would be well out of my reach in terms of money, time and effort. Yet Qwen finally gives folks like me a tool to "scratch that itch".


 
Last edited:

JhonLui

Well-Known Member
Jan 13, 2020
1,141
1,119
284
Afaik, long video = Framepack Studio.
It uses Hunyuan so it has its downsides, but also a lot of pluses (time control, LoRA suite, start/end-frame setting, frame editing, good upscalers/joiner), and it's extremely fast (compared to the average). It also works with 6 GB cards...
 

theMickey_

Engaged Member
Mar 19, 2020
2,328
3,120
357
Sharinel Sepheyer Thanks for sharing your workflows and your insights, much appreciated! I'll have a closer look at your posts as soon as I find some spare time, and will probably try to (re)create my own multi-step workflow to get a 10+ second video based on your suggestions: start with something like Illustrious to create some high-quality characters, use QE2509 (for character consistency) to compose a starting scene image with those characters, and then use an I2V workflow to create the final video. I guess that's the way forward to get what I want.

And by creating my "own" workflow for each step (even if I end up with basically the exact same nodes/steps as your workflow or any existing templates), I'll hopefully understand how each individual workflow works, which always has to be my goal.

Again, thank you very much for your input, I'll keep you posted (might take a while though)!
 

kaamist

Newbie
May 22, 2023
81
36
141
Any good LoRA that works with SD 1.5 for large decorative gold or diamond ornaments or jewelry? I already found a good LoRA for thick, puffy nipples, and I'm hoping to find one for this too. Any help would be appreciated!
 

Sepheyer

Well-Known Member
Dec 21, 2020
1,702
4,184
448
Oh, OK, so one uses a Python lib to combine multiple ST files into one:

1761062667437.png
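For reference, a minimal sketch of that kind of merge using the safetensors Python library (the shard filenames below are placeholders, not the actual files from the screenshot):

```python
# Minimal sketch: merge sharded .safetensors files into a single file.
# Shard filenames are placeholders; adjust to the actual downloaded files.
from safetensors.torch import load_file, save_file

shards = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
]

merged = {}
for shard in shards:
    # Each shard holds a disjoint subset of the model's tensors,
    # so a plain dict update stitches them back together.
    merged.update(load_file(shard))

save_file(merged, "model.safetensors")
```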

-----------------------------

A question. So HuggingFace sometimes posts models that consist of multiple files, like so:

1761062155525.png



How do you use or combine these files into one file? Thanks!
 
Last edited:

JhonLui

Well-Known Member
Jan 13, 2020
1,141
1,119
284
A question. So HuggingFace sometimes posts models that consist of multiple files, like so:

View attachment 5363241



How do you use or combine these files into one file? Thanks!
You don't; open "model_safetensor_index.json" with a text editor and it will be clear.
You have to install both of them.
They probably split the model into separate files to make each one lighter and faster to download.
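If you're curious what a loader actually does with that index file, here is a rough sketch. It assumes the standard Hugging Face sharded-checkpoint layout, where the index JSON contains a "weight_map" that maps each tensor name to the shard file holding it; the code is illustrative, not what ComfyUI literally runs:

```python
# Rough sketch: resolve a sharded checkpoint via its index JSON.
# Assumes the standard Hugging Face layout: {"weight_map": {tensor_name: shard_file, ...}}.
import json
from safetensors.torch import load_file

with open("model_safetensor_index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]            # tensor name -> shard filename
shard_files = sorted(set(weight_map.values()))

state_dict = {}
for shard in shard_files:
    state_dict.update(load_file(shard))     # load every shard and merge into one dict

print(f"{len(state_dict)} tensors loaded from {len(shard_files)} shards")
```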
 

Sepheyer

Well-Known Member
Dec 21, 2020
1,702
4,184
448
Anyone got a workflow for creating icons?

I tried using Qwen 2509 and OmniGen2, and I am still here asking this question :)
 

ElKaya

Newbie
Nov 4, 2018
15
17
136
How do you create such almost-realistic videos? I tried everything from the beginning of the page, installed everything, and when I try to use it, everything comes out distorted. I'm sure I missed something or didn't look carefully enough at how to do it.
 

Sharinel

Active Member
Dec 23, 2018
607
2,436
448
How do you create such almost-realistic videos? I tried everything from the beginning of the page, installed everything, and when I try to use it, everything comes out distorted. I'm sure I missed something or didn't look carefully enough at how to do it.
Distorted how? Can you put up a screenshot of what you are seeing? Then maybe a screenshot of your workflow?