I've been pushing HunyuanVideo to its limit to find where I could still get high-quality results in reasonable time, and I'd like to share my findings with everyone.
Requirements
You'll also need the GGUF version of the HunyuanVideo model.
Download the hunyuan-video-t2v-720p-Q4_1 model from here: https://huggingface.co/city96/HunyuanVideo-gguf/tree/main
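If you prefer to script the download, here's a minimal sketch using the huggingface_hub library. The exact .gguf file name and the ComfyUI/models/unet target folder are assumptions on my part; check the repo listing and your ComfyUI-GGUF setup:

    # Minimal download sketch (pip install huggingface_hub).
    # Verify the .gguf file name against the repo before running.
    from huggingface_hub import hf_hub_download

    hf_hub_download(
        repo_id="city96/HunyuanVideo-gguf",
        filename="hunyuan-video-t2v-720p-Q4_1.gguf",
        local_dir="ComfyUI/models/unet",  # where ComfyUI-GGUF looks for UNet GGUFs
    )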
Installing Sage Attention in your ComfyUI environment is crucial, as it significantly reduces processing time (cutting it nearly in half).
For installation instructions, please refer to this guide: https://brewni.com/Genai/A9-eAS7R?tag=0
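Once installed, you can sanity-check that SageAttention actually works in your environment with a quick snippet like this (it assumes the sageattention pip package and a CUDA GPU; the tensor shapes are arbitrary test values):

    # Quick sanity check: run SageAttention on dummy tensors.
    import torch
    from sageattention import sageattn

    # (batch, heads, seq_len, head_dim) in fp16 on the GPU
    q = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")
    k = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")
    v = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")

    out = sageattn(q, k, v, is_causal=False)
    print(out.shape)  # should print torch.Size([1, 8, 128, 64])

If this runs without errors, the wrapper nodes should be able to use sage attention as their attention mode.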
Workflow
Pastebin : https://pastebin.com/pwfzZFZ2
Custom Nodes
You have to install these custom nodes:
MultiGPU
(UnetLoaderGGUFDisTorchMultiGPU, DualCLIPLoaderMultiGPU, VAELoaderMultiGPU)
These enable offloading the GGUF, CLIP, and VAE models to system RAM.
Offloading models does not reduce generation time, but it frees VRAM, which allows higher resolutions and longer video lengths (see the sketch below for the idea).
Avoid offloading the VAE to the CPU, as it will drastically increase latent decoding time.
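For intuition, offloading amounts to streaming weights between system RAM and VRAM around each use. A conceptual sketch of the idea (this is not the MultiGPU nodes' actual code):

    # Conceptual CPU-offload sketch: weights live in system RAM and visit
    # VRAM only while their layer is running. The transfers are why
    # offloading frees memory but does not make generation faster.
    import torch

    def offloaded_forward(block: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
        block.to("cuda")            # stream weights into VRAM for this call
        out = block(x)
        block.to("cpu")             # evict weights back to system RAM
        torch.cuda.empty_cache()    # let other nodes use the freed VRAM
        return out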
HunyuanVideoWrapper
(HY Feta Enhance)
The HY Feta Enhance node provides a subtle but noticeable improvement in output quality.
WaveSpeed
(Apply First Block Cache)
This also cuts generation time nearly in half. A higher residual diff threshold reduces processing time further but may introduce more noise.
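Roughly, the idea behind first block cache is: if the first transformer block's residual barely changed compared to the previous step, skip the remaining blocks and reuse their cached result. A conceptual sketch, not WaveSpeed's actual implementation:

    # Conceptual first-block-cache sketch (not WaveSpeed's real code).
    import torch

    _cache = {"first_residual": None, "tail_residual": None}

    def forward_with_cache(blocks, x, threshold=0.1):
        first_out = blocks[0](x)
        residual = first_out - x
        prev = _cache["first_residual"]
        if prev is not None:
            rel_diff = (residual - prev).abs().mean() / prev.abs().mean()
            if rel_diff < threshold:
                # Barely changed since the last step: skip the rest.
                return first_out + _cache["tail_residual"]
        h = first_out
        for block in blocks[1:]:
            h = block(h)
        _cache["first_residual"] = residual
        _cache["tail_residual"] = h - first_out
        return h

This is also why a higher threshold is faster but noisier: more steps take the cached shortcut instead of running the full model.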
FreeMemory
(Free Memory (Latent))
It prevents VRAM out-of-memory (OOM) errors between the Sampler and the Tiled VAE Decode nodes. Make sure this node is connected and the aggressive setting is enabled.
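Under the hood, an aggressive free-memory step boils down to something like this (a rough sketch, not the node's exact code):

    # Roughly what an aggressive memory free does between sampler and decode.
    import gc
    import torch

    def free_memory():
        gc.collect()                # drop unreferenced Python objects
        torch.cuda.empty_cache()    # return cached CUDA blocks to the driver
        torch.cuda.ipc_collect()    # reclaim memory from dead IPC handles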
VAE Decode
Make sure to lower the tile size to 256 or below (192 also works), or it will give you a VRAM OOM.
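Tiled decoding works by decoding the latent in small spatial chunks and stitching the results, so the decoder's activations never have to fit the whole frame at once. A simplified sketch of the idea (real implementations overlap the tiles and blend the seams to hide them):

    # Simplified tiled VAE decode: chunk the latent spatially, decode each
    # tile, stitch the outputs. `tile` is in latent pixels (~1/8 image size).
    import torch

    def tiled_decode(vae_decode, latent: torch.Tensor, tile: int = 24) -> torch.Tensor:
        rows = []
        for y in range(0, latent.shape[-2], tile):
            cols = []
            for x in range(0, latent.shape[-1], tile):
                cols.append(vae_decode(latent[..., y:y + tile, x:x + tile]))
            rows.append(torch.cat(cols, dim=-1))
        return torch.cat(rows, dim=-2)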
EmptyHunyuanLatentVideo
720 x 720 px, 41 frames, 20 steps takes 3:39
720 x 720 px, 65 frames, 20 steps takes 5:34
(At HunyuanVideo's default 24 fps, 41 and 65 frames come out to roughly 1.7 and 2.7 seconds of video.)
Given my current PC specs (8 GB VRAM and 16 GB RAM), I believe this workflow is about the practical limit for balancing high resolution against reasonable generation time.