Tutorial: Generating AI Img2Vid or Txt2Vid - Beginner Guide to "WanGP v8.61 by DeepBeepMeep"

Psan2022

Beginner Guide to "WanGP v8.61 by DeepBeepMeep"

Also see this thread on how to make AI images using Stable Diffusion with Automatic1111: Generating AI images with StableDiffusion - Beginner Guide
I know there are more fine-tuned and maybe faster ways (like using ComfyUI) that give you more freedom in setting things up, but with this method the learning curve is not as steep. Today's plan is to show you how to make a video out of text or out of an existing image using this awesome tool. At the end we will be able to make something like this:
[Example result video]
What you need:

1. A good NVIDIA GPU
-> Even with my 4080 Super, a 5-second video in 480p at 16 fps takes about 20 minutes to generate. Depending on the video model, sometimes even longer.

2. For ease of use we will use the Pinokio application
-> You can download it from the official Pinokio website.
-> Install the application
-> Next, click on Discover and type "Wan 2.1" into the search bar.
-> Click on "Wan 2.1" and install. (As the GitHub API name you can put anything you like or leave it as it is.) This may take a while, so sit back and sip a coffee. Be aware that you will also need quite a lot of free disk space, since these video models take up a lot of it.

-> If you want a manual installation of WanGP, refer to the project's GitHub page.

After the first install:

After the first installation you will be greeted by this window in Pinokio:
[Screenshot: Pinokio window after installation]
Click on Start in the upper left corner and wait a little bit. After a short while you will see the Wan UI in the Pinokio app:
[Screenshot: Wan UI inside Pinokio]
You could in theory start generating in this small window, but I prefer using the Web UI. As you can see, you get a URL after starting Wan2.1. Copy that URL and paste it into your preferred web browser (it is usually a local address). It will look something like this:
[Screenshot: WanGP Web UI in the browser]

Sidenotes (Pinokio App/ Wan2.1):

1. You can use Pinokio to install many different AI-driven tools, LLMs for example.
2. You can access the files of any installed AI tool (Wan2.1 included) from the Pinokio GUI.
3. Remember to start Pinokio and Wan2.1 before accessing the URL, otherwise it won't work.
4. The main folder for Wan is under the Pinokio installation directory. For me it is the path below; it may look a little different on your end, since you might have a different installation path for Pinokio, and I renamed the API to deepbeepmeep while installing Wan2.1. All of the needed and important folders are under the "app" folder.
-> D:\Software\Pinokio\api\deepbeepmeep\app\
5. In the "app" folder there are folders you might recognize from Stable Diffusion or ComfyUI, for example the Lora, model and outputs folders. For what Loras are and what they do, please refer to the Stable Diffusion thread linked at the top of the tutorial.
6. In the WanGP Web UI you have a "Guide" Tab. Here you can see almost all of the included models and what they are good for.
7. After selecting a model, it will be downloaded automatically when you press Generate (this may take a while, since the models are quite large).

Interface of the Web UI:

1. At the top you have some very useful tabs you should take note of:
[Screenshot: Web UI tab bar]
-> The "Video Generator" Tab is the most important one. Here you have all of the settings and options you need to generate your AI video or image.
-> "Guides" is a very useful tab since here are many of the video and image models listed and their usecases.
-> Video Mask Creator can isolate a person or object to create either a black/white mask or to create a greenscreen like effect.
-> The Downloads tab provides a button for downloading a Lora Pack for your videos and images (takes up a little bit of space). You may need a Lora to enable certain actions in your videos/images. Loras are trained "addon" like models that enhance the generation with their trained data. If you need more Info what a Lora is please refer to the Stable Diffusion Link at the Top.
-> The Configuration tab contains general app settings.

Selecting a model and generating a video:
[Screenshot: model selection list]
You have a vast selection of img2vid, text2img, text2vid etc. models. Which model gives you the best results depends on what you want to achieve. For example, with the ControlNet ones you can use a video as a "guide" and generate a new video around it. Or there is Multitalk: you feed it a picture and an audio track of a monologue or dialogue (if two or more people are talking, for example), and that audio track is used to lip-sync the AI-generated video from that image.

Be aware of the number next to the model name. If your graphics card has 12 GB of VRAM or more, you can use the 14B-parameter models. Otherwise I would not recommend it, since the generation will either abort midway or just be stuck forever. Those models are also quite large in size, so keep that in mind too. Even with my 4080 Super I need to wait about 20 minutes to get a 480p video at 16 fps with 80 frames (5 seconds of video). So pick your poison. (A quick way to check your VRAM is sketched below.)
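If you are not sure how much VRAM your card has, a quick PyTorch snippet can tell you. This is a generic sketch, not part of WanGP; it assumes PyTorch with CUDA support is installed (which a working WanGP install implies), and the 12 GB threshold is just the rule of thumb from above.

```python
# Generic sketch (not part of WanGP): report the GPU's VRAM and apply
# the 12 GB rule of thumb for 14B models mentioned above.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"{props.name}: {vram_gb:.1f} GB VRAM")
    print("14B models:", "worth a try" if vram_gb >= 12 else "not recommended")
else:
    print("No CUDA-capable NVIDIA GPU detected.")
```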

Depending on which model you select, the interface will change!

For the sake of more options and smaller models, let's choose Wan2.1. Let's try "Fun InP image2video 1.3B", since it has only 1.3B parameters.
It will look something like this:
[Screenshot: Fun InP image2video interface]
-> Now select "Start Video with Image" and select an image you want to animate.
-> Put in a prompt describing what the subject is going to do. The more precise, the better.
-> Select the size category of the video. Remember: the bigger the video, the longer it will take to generate.
-> Select the resolution budget. Pick one as close to your aspect ratio as possible (though it does not matter much, since the pixels will be reallocated to the ratio of the provided picture). Please note that even if you select an aspect ratio of 16:9, the output will not necessarily be in that aspect ratio; it all depends on the provided picture. (See the sketch after this list for the arithmetic.)
-> The number of frames dictates the length of the video. As you can see, the Fun InP model has a 16 fps cap, so if you want to make a 5-second video you need 80 frames in total. Every model has a frame limit. To circumvent this there is a feature called "Sliding Window", which is enabled by default. You can find this option if you tick the "Advanced Mode" checkbox under inference steps.
-> Number of inference steps: similar to the inference steps in Stable Diffusion, a larger step count usually gives better results, but it varies. Some models have a sweet spot, and past a certain step count it just takes longer to generate without any noticeable benefit.
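To make the frame-count and resolution-budget settings concrete, here is a small back-of-the-envelope sketch. The snap-to-a-multiple-of-16 rounding is my assumption about how the UI reallocates the pixel budget, not confirmed behavior; the exact rule may differ.

```python
import math

# Frames needed for a clip of a given length at the model's fps cap.
def frames_for_duration(seconds: float, fps: int = 16) -> int:
    return round(seconds * fps)   # 5 s at Fun InP's 16 fps cap -> 80 frames

# Redistribute a pixel budget (e.g. 720x720) to the input image's aspect
# ratio, snapping each side to a multiple of 16 (assumed model constraint).
def fit_budget(budget_w, budget_h, aspect_w, aspect_h, multiple=16):
    pixels = budget_w * budget_h
    w = math.sqrt(pixels * aspect_w / aspect_h)
    h = pixels / w
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(w), snap(h)

print(frames_for_duration(5))        # 80
print(fit_budget(720, 720, 16, 9))   # (960, 544): same pixel count, now 16:9
```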

Generating a video from an image:

I chose a portrait of a woman I generated in Stable Diffusion.
The aspect ratio is roughly 1:1.
For faster generation I chose 480p and a 720x720 pixel budget.
I set 80 frames to make a 5-second video.
I bumped the inference steps to 30.
-> Generation time with my RTX 4080 Super: about 5 minutes.
For reference: generating the video from the same picture but with 50 inference steps instead of 30 takes about 9 minutes with my equipment.
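Those two timings suggest the generation time scales roughly linearly with the step count. A tiny interpolation sketch based on my numbers (your hardware will differ):

```python
# Rough linear model from the two timings above (in minutes).
t30, t50 = 5.0, 9.0
per_step = (t50 - t30) / (50 - 30)   # ~0.2 min per step
overhead = t30 - per_step * 30       # ~ -1 min, i.e. effectively zero
estimate = lambda steps: overhead + per_step * steps
print(f"~{estimate(40):.0f} min at 40 steps")   # ~7 min
```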

[Result video: "the woman smiles slightly as she is looking at the..." (seed 41673287)]
If you want to try this example yourself, here is the base image I used:
[Base image]
As for the advanced options: here you can find a lot more things to adjust. The main options you might want to change at some point are the seed and the CFG. Every generation has a specific seed tied to it; if you want to change things only slightly, keep the seed the same (a small illustration follows below). The CFG value controls how strongly the model sticks to your prompt: higher values follow it more closely, lower values give the model more freedom. You can also experiment with different samplers to see how they change the outcome of your generation.
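To illustrate why keeping the seed reproduces a generation: the seed pins down the random noise the diffusion process starts from. A minimal PyTorch illustration, not WanGP's actual code (the seed is the one from the example video above):

```python
import torch

torch.manual_seed(41673287)   # seed from the img2vid example above
a = torch.randn(4)            # stand-in for the initial latent noise
torch.manual_seed(41673287)
b = torch.randn(4)
print(torch.equal(a, b))      # True: same seed -> identical starting noise
```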

In the advanced options you can also find the post-processing tab, which is quite interesting. Here you can turn on temporal and spatial upsampling, meaning you can more or less double the frame rate and upscale your generated video. So if you select a 720x720 video from the 480p category, you can upscale it to 1088x1088 with a 1.5x upscale, and likewise double the frame rate.
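The 1088x1088 figure makes sense if you assume the upscaler snaps dimensions to multiples of 16, which would turn 720 x 1.5 = 1080 into 1088. A small sketch of that assumption:

```python
# Spatial 1.5x + temporal 2x upsampling of the example clip.
# The snap-to-16 rounding is an assumption, not confirmed behavior.
def upsample(w, h, fps, scale=1.5, temporal=2, multiple=16):
    snap = lambda v: round(v * scale / multiple) * multiple
    return snap(w), snap(h), fps * temporal

print(upsample(720, 720, 16))   # (1088, 1088, 32)
```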

Generating a video from text:

Just for the sake of also showing you the text-to-video option: select a text-to-video model and the interface will change accordingly.
The only major difference is that we use just a prompt instead of a base picture.

[Screenshot: text-to-video settings and prompt]
Result:
[Result video: "A beautiful korean woman walks straight towards vi..." (seed 93645115)]
So that's it for now. If you have questions or want something specific, feel free to ask in this thread. Have a good one!
 

Psan2022

What about us with AMD GPUs? :(
Hmm, very good question. I will see what I can find. There must certainly be a way.
In the meantime you could try WanGP via Pinokio anyway and see if and how it works. I don't have an AMD GPU, so I can't test it.

Can a RTX 5060 handle the job?
It should, but you will either have to adjust the resolution of the generated video or have a lot of patience, since it may take significantly longer.
 

JanMaster

I get an error when trying to start Wan. "This app requires NVIDIA GPU" For the record, I have a brand new 5060TI.
 

Psan2022

I get an error when trying to start Wan. "This app requires NVIDIA GPU" For the record, I have a brand new 5060TI.
WanGP was updated a few days ago to the newest version, 9.10. According to a few Reddit posts, there was an issue with NVIDIA 5000-series graphics cards. It should be working now if you update.