Hunyuan Video T2V Native workflow [8GB VRAM]
anon
515
2025-03-05 13:49






I've been pushing Hunyuan Video to its limits to find the best balance of output quality and generation time, and I'd like to share my results with everyone.


Requirements


  • Minimum Hardware: 8GB VRAM and 16GB RAM (Tested with an RTX 3070 8GB)


You'll also need the GGUF version of the Hunyuan Video model.

Download the hunyuan-video-t2v-720p-Q4_1 model from here: https://huggingface.co/city96/HunyuanVideo-gguf/tree/main


Installing Sage Attention in your ComfyUI environment is crucial, as it nearly halves processing time.


For installation instructions, please refer to this guide: https://brewni.com/Genai/A9-eAS7R?tag=0


Workflow



Pastebin: https://pastebin.com/pwfzZFZ2


Custom Nodes


You have to install these custom nodes:



MultiGPU

(UnetLoaderGGUFDisTorchMultiGPU, DualCLIPLoaderMultiGPU, VAELoaderMultiGPU)


These nodes enable offloading the GGUF, CLIP, and VAE models to system RAM.


Offloading models does not reduce generation time, but it frees VRAM, allowing higher resolutions and longer video lengths.


Avoid offloading the VAE to the CPU, as it will drastically increase latent decoding time.
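To see why offloading helps, here's a rough back-of-envelope VRAM budget. All component sizes below are illustrative assumptions, not measured values:

```python
# Rough VRAM budget sketch for an 8GB card (all figures are illustrative
# assumptions, not measured values).
GB = 1024**3

vram_total = 8 * GB

# Assumed approximate weight sizes: a Q4_1 GGUF of the HunyuanVideo
# transformer, the text encoders, and the VAE.
unet_q4_1 = 8.0 * GB      # assumed quantized transformer weights
text_encoders = 1.5 * GB  # assumed CLIP/text-encoder weights
vae = 0.5 * GB            # assumed VAE weights

def free_vram(offload_unet, offload_clip, offload_vae):
    """VRAM left for latents/activations after loading what stays on GPU."""
    used = 0
    used += 0 if offload_unet else unet_q4_1
    used += 0 if offload_clip else text_encoders
    used += 0 if offload_vae else vae
    return vram_total - used

# Everything resident on GPU: the transformer alone would not even fit.
print(free_vram(False, False, False) / GB)   # -2.0 -> OOM
# Offload transformer + CLIP to system RAM, keep the VAE on GPU
# (offloading the VAE to CPU makes decoding very slow):
print(free_vram(True, True, False) / GB)     # 7.5 left for latents
```

The exact numbers don't matter; the point is that with everything resident, an 8GB card is over budget before the first latent is allocated.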


HunyuanVideoWrapper

(HY Feta Enhance)


The HY Feta Enhance node provides a subtle but noticeable improvement in output quality.


WaveSpeed

(Apply First Block Cache)


This also cuts generation time nearly in half. A higher residual diff threshold reduces processing time further, but may introduce more noise.
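The idea behind the cache can be sketched like this (a toy illustration of the technique, not the actual WaveSpeed code): run the first transformer block, compare its residual to the one from the last fully-computed step, and if the relative change is below the threshold, reuse the cached result instead of running the remaining blocks.

```python
def first_block_cache_step(x, blocks, cache, threshold=0.1):
    """One denoising step with a toy first-block cache.

    x       -- list of floats standing in for the latent tensor
    blocks  -- list of callables, each mapping a list to a list
    cache   -- dict carrying state across steps
    """
    first_out = blocks[0](x)
    residual = [a - b for a, b in zip(first_out, x)]

    prev = cache.get("first_residual")
    if prev is not None:
        # Relative L1 difference between this and the cached residual.
        diff = sum(abs(a - b) for a, b in zip(residual, prev))
        norm = sum(abs(b) for b in prev) or 1.0
        if diff / norm < threshold:
            # Cheap path: skip the remaining blocks and reuse the
            # cached output residual from the last full run.
            return [a + b for a, b in zip(x, cache["final_residual"])]

    # Expensive path: run all remaining blocks and refresh the cache.
    out = first_out
    for block in blocks[1:]:
        out = block(out)
    cache["first_residual"] = residual
    cache["final_residual"] = [a - b for a, b in zip(out, x)]
    return out
```

This is why the threshold trades speed for noise: a higher threshold means more steps take the cheap path with a slightly stale result.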


FreeMemory

(Free Memory (Latent))


This prevents VRAM out-of-memory (OOM) errors between the Sampler and Tiled VAE Decode nodes. Make sure this node is connected and the aggressive setting is enabled.


VAE Decode


Make sure to lower the tile size to 256 or 192, or you will hit a VRAM OOM error.
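The reason smaller tiles help is that the decoder's peak activation memory scales roughly with the pixel area decoded at once. A quick back-of-envelope comparison (the per-pixel constant is an assumption for illustration, not a measured value):

```python
# Why smaller decode tiles avoid OOM: decoder activation memory scales
# roughly with the pixel area decoded at once. bytes_per_px below is an
# illustrative assumption, not a measured value.
MB = 1024**2

def decode_activation_mb(tile_px, bytes_per_px=1500):
    # bytes_per_px: assumed peak decoder activation footprint per
    # output pixel (channels x intermediate layers x fp16).
    return tile_px * tile_px * bytes_per_px / MB

# Decoding a full 720x720 frame in one shot vs. 256 / 192 tiles:
print(round(decode_activation_mb(720)))  # ~742 MB at once
print(round(decode_activation_mb(256)))  # ~94 MB per tile
print(round(decode_activation_mb(192)))  # ~53 MB per tile
```

Dropping from a full-frame decode to 192-pixel tiles shrinks the peak footprint by more than an order of magnitude, at the cost of decoding more (overlapping) tiles.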


EmptyHunyuanLatentVideo


720 x 720px, 41 frames, 20 steps takes 3:39



720 x 720px, 65 frames, 20 steps takes 5:34
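The frame counts above (41 and 65, and 121 below) are not arbitrary: assuming the usual 4x temporal / 8x spatial compression of the HunyuanVideo VAE, frame counts of the form 4k + 1 map cleanly onto the latent grid. A small helper to see the latent shape EmptyHunyuanLatentVideo would produce under that assumption:

```python
def latent_shape(width, height, frames):
    """Approximate latent grid for a given video size, assuming the
    HunyuanVideo VAE's 4x temporal / 8x spatial compression."""
    return ((frames - 1) // 4 + 1, height // 8, width // 8)

print(latent_shape(720, 720, 41))   # (11, 90, 90)
print(latent_shape(720, 720, 65))   # (17, 90, 90)
```

Each extra 24 frames adds 6 latent frames, which is why longer videos eat VRAM and time so quickly.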




Given my current PC specifications (8GB VRAM and 16GB RAM), I believe this workflow represents the practical limit for balancing high resolution against reasonable generation times.


  • Attempting to render videos with more than 65 frames results in an exponential increase in processing time. Adjust frame counts based on your hardware capabilities and desired output.


  • I tested 520x520px with 121 frames, and while it rendered in a reasonable time, the quality degradation from the low resolution was too severe to be worthwhile.

