Requirements
You'll need the GGUF version of the Wan Video model.
Download the wan2.1-i2v-14b-720p-Q4_K_S model from here: https://huggingface.co/city96/Wan2.1-I2V-14B-720P-gguf/tree/main
You'll also need Git. Install Git using this link: https://git-scm.com/
The guide to installing Sage Attention 1 is included in my previous Hunyuan Video T2V Native workflow post: https://brewni.com/Genai/ULNks9g1?tag=0
You have to follow those steps, otherwise Triton won't work.
If you followed the guide, open a terminal in your ComfyUI folder and type this:
git clone https://github.com/thu-ml/SageAttention.git
If you already installed SageAttention 1, uninstall it first by typing this:
.\python_embeded\python.exe -m pip uninstall sageattention
Open the SageAttention folder, open a terminal inside it, and type this:
..\python_embeded\python.exe -m pip install -e .
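If you want to confirm the install landed in the embedded Python, here is a minimal sketch. It only checks that the modules can be found, not that they actually run on your GPU. The function name check_modules and the file name check_sage.py are my own placeholders, and it assumes the packages import as "triton" and "sageattention".

```python
import importlib.util


def check_modules(names=("triton", "sageattention")):
    """Return {module_name: True/False} for whether each module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Prints one line per module so a missing one is easy to spot.
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Save it as check_sage.py in your ComfyUI folder and run it with .\python_embeded\python.exe check_sage.py so it checks the same Python that ComfyUI uses.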
The '--fast' argument reduces generation time by about 20%, but it requires a nightly build of PyTorch.
Open a terminal in your ComfyUI folder and type this:
.\python_embeded\python.exe -m pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
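To check whether the nightly build really replaced the stable one, you can look at the installed version string. This is a rough sketch that assumes nightly wheels carry a ".dev" date marker and a "+cu..." suffix (for example 2.8.0.dev20250301+cu128, an illustrative value, not one taken from this guide). describe_torch_build is a hypothetical helper name.

```python
from importlib import metadata


def describe_torch_build(version: str) -> dict:
    """Split a PyTorch version string into a nightly flag and a CUDA tag."""
    public, _, local = version.partition("+")
    return {
        "nightly": ".dev" in public,   # nightly builds are versioned as dev releases
        "cuda_tag": local or None,     # e.g. "cu128"; None if there is no suffix
    }


if __name__ == "__main__":
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed in this Python environment")
    else:
        print(installed, describe_torch_build(installed))
```

Run it with the embedded Python (.\python_embeded\python.exe), otherwise it reports whatever torch your system Python has.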
Edit your run_nvidia_gpu.bat with Notepad so the launch line looks like this:
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-sage-attention --fast
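If you are not sure the .bat edit was saved correctly, a small check like the one below can tell you which of the two flags is still missing. It is only a sketch: missing_flags is a made-up helper, and it assumes the launch line is the one that contains "main.py".

```python
from pathlib import Path

REQUIRED_FLAGS = ("--use-sage-attention", "--fast")


def missing_flags(launch_line: str, required=REQUIRED_FLAGS) -> list:
    """Return the required flags that do not appear as separate arguments."""
    tokens = launch_line.split()
    return [flag for flag in required if flag not in tokens]


if __name__ == "__main__":
    bat = Path("run_nvidia_gpu.bat")
    if bat.exists():
        for line in bat.read_text(encoding="utf-8", errors="replace").splitlines():
            if "main.py" in line:
                print("missing flags:", missing_flags(line) or "none")
    else:
        print("run_nvidia_gpu.bat not found in the current folder")
```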
Workflow
Pastebin : https://pastebin.com/P3C0xPU2
Custom Nodes
You have to install these custom nodes:
MultiGPU
(UnetLoaderGGUFDisTorchMultiGPU, DualCLIPLoaderMultiGPU, VAELoaderMultiGPU)
These enable offloading the GGUF, CLIP, and VAE models to system RAM.
Offloading models does not reduce generation time, but it frees VRAM, which allows higher resolutions and longer videos.
KJNodes
(WanVideo Tea Cache (native), TorchCompileModelWanVideo, Skip Layer Guidance Wan Video)
WanVideo Tea Cache (native)
This cuts the generation time almost in half.
With the Wan 2.1 14B model, a high threshold won't produce much noise.
TorchCompileModelWanVideo
It speeds up generation. Make sure to set the mode to max-autotune-no-cudagraphs, otherwise it will give you an error.
Skip Layer Guidance Wan Video
This improves video quality a lot. Make sure to set the block to 9, not 10.
FreeMemory
(Free Memory (Latent))
It prevents VRAM out-of-memory (OOM) errors between the Sampler and Tiled VAE Decode nodes.
VAE Decode
A 720 x 720 px, 65-frame, 25-step generation takes 11:00.
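To put that timing in perspective, here is a quick back-of-the-envelope calculation. It assumes 11:00 means minutes:seconds and that the clip plays back at 16 fps (the fps value is my assumption, not something stated above). The per-step figure is an upper bound, since the total also includes model loading and VAE decoding.

```python
def parse_mmss(text: str) -> int:
    """Convert a 'minutes:seconds' string to total seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(total: str, steps: int, frames: int, fps: int = 16) -> dict:
    """Rough per-step cost and clip length for one generation run."""
    total_s = parse_mmss(total)
    return {
        "total_seconds": total_s,
        "seconds_per_step": total_s / steps,  # upper bound, includes load and decode
        "clip_seconds": frames / fps,         # playback length at the assumed fps
    }


if __name__ == "__main__":
    print(summarize("11:00", steps=25, frames=65))
```

With these numbers that works out to roughly 26 seconds per step for about 4 seconds of video.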