I've been pushing HunyuanVideo to its limit to find where I could still get high-quality results in reasonable time, and I'd like to share my findings with everyone.
Requirements
You'll also need the GGUF version of the HunyuanVideo model.
Download the hunyuan-video-t2v-720p-Q4_1 model from here: https://huggingface.co/city96/HunyuanVideo-gguf/tree/main
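If you prefer to script the download, here's a minimal sketch using the huggingface_hub library. The exact .gguf file name and the ComfyUI/models/unet target folder are assumptions on my part; check the repo listing and your ComfyUI-GGUF setup:

    # Minimal download sketch (pip install huggingface_hub).
    # Verify the .gguf file name against the repo before running.
    from huggingface_hub import hf_hub_download

    hf_hub_download(
        repo_id="city96/HunyuanVideo-gguf",
        filename="hunyuan-video-t2v-720p-Q4_1.gguf",
        local_dir="ComfyUI/models/unet",  # where ComfyUI-GGUF looks for UNet GGUFs
    )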
Installing Sage Attention in your ComfyUI environment is crucial, as it significantly reduces processing time (cutting it nearly in half).
For installation instructions, please refer to this guide: https://brewni.com/Genai/A9-eAS7R?tag=0
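Once installed, you can sanity-check that SageAttention actually works in your environment with a quick snippet like this (it assumes the sageattention pip package and a CUDA GPU; the tensor shapes are arbitrary test values):

    # Quick sanity check: run SageAttention on dummy tensors.
    import torch
    from sageattention import sageattn

    # (batch, heads, seq_len, head_dim) in fp16 on the GPU
    q = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")
    k = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")
    v = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")

    out = sageattn(q, k, v, is_causal=False)
    print(out.shape)  # should print torch.Size([1, 8, 128, 64])

If this runs without errors, the wrapper nodes should be able to use sage attention as their attention mode.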
Workflow
Pastebin : https://pastebin.com/pwfzZFZ2
Custom Nodes
You have to install these custom nodes:
MultiGPU
(UnetLoaderGGUFDisTorchMultiGPU, DualCLIPLoaderMultiGPU, VAELoaderMultiGPU)
These enable offloading the GGUF, CLIP, and VAE models to system RAM.
Offloading models does not reduce generation time, but it frees VRAM, which allows higher resolutions and longer video lengths (see the sketch below for the idea).
Avoid offloading the VAE to the CPU, as it will drastically increase latent decoding time.
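For intuition, offloading amounts to streaming weights between system RAM and VRAM around each use. A conceptual sketch of the idea (this is not the MultiGPU nodes' actual code):

    # Conceptual CPU-offload sketch: weights live in system RAM and visit
    # VRAM only while their layer is running. The transfers are why
    # offloading frees memory but does not make generation faster.
    import torch

    def offloaded_forward(block: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
        block.to("cuda")            # stream weights into VRAM for this call
        out = block(x)
        block.to("cpu")             # evict weights back to system RAM
        torch.cuda.empty_cache()    # let other nodes use the freed VRAM
        return out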
HunyuanVideoWrapper
(HY Feta Enhance)
The HY Feta Enhance node provides a subtle but noticeable improvement in output quality.
WaveSpeed
(Apply First Block Cache)
This also cuts generation time nearly in half. A higher residual diff threshold reduces processing time further but may introduce more noise.
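Roughly, the idea behind first block cache is: if the first transformer block's residual barely changed compared to the previous step, skip the remaining blocks and reuse their cached result. A conceptual sketch, not WaveSpeed's actual implementation:

    # Conceptual first-block-cache sketch (not WaveSpeed's real code).
    import torch

    _cache = {"first_residual": None, "tail_residual": None}

    def forward_with_cache(blocks, x, threshold=0.1):
        first_out = blocks[0](x)
        residual = first_out - x
        prev = _cache["first_residual"]
        if prev is not None:
            rel_diff = (residual - prev).abs().mean() / prev.abs().mean()
            if rel_diff < threshold:
                # Barely changed since the last step: skip the rest.
                return first_out + _cache["tail_residual"]
        h = first_out
        for block in blocks[1:]:
            h = block(h)
        _cache["first_residual"] = residual
        _cache["tail_residual"] = h - first_out
        return h

This is also why a higher threshold is faster but noisier: more steps take the cached shortcut instead of running the full model.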
FreeMemory
(Free Memory (Latent))
It prevents VRAM out-of-memory (OOM) errors between the Sampler and the Tiled VAE Decode nodes. Make sure this node is connected and the aggressive setting is enabled.
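Under the hood, an aggressive free-memory step boils down to something like this (a rough sketch, not the node's exact code):

    # Roughly what an aggressive memory free does between sampler and decode.
    import gc
    import torch

    def free_memory():
        gc.collect()                # drop unreferenced Python objects
        torch.cuda.empty_cache()    # return cached CUDA blocks to the driver
        torch.cuda.ipc_collect()    # reclaim memory from dead IPC handles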
VAE Decode
Make sure to lower the tile size to 256 or below (192 also works), or it will give you a VRAM OOM.
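Tiled decoding works by decoding the latent in small spatial chunks and stitching the results, so the decoder's activations never have to fit the whole frame at once. A simplified sketch of the idea (real implementations overlap the tiles and blend the seams to hide them):

    # Simplified tiled VAE decode: chunk the latent spatially, decode each
    # tile, stitch the outputs. `tile` is in latent pixels (~1/8 image size).
    import torch

    def tiled_decode(vae_decode, latent: torch.Tensor, tile: int = 24) -> torch.Tensor:
        rows = []
        for y in range(0, latent.shape[-2], tile):
            cols = []
            for x in range(0, latent.shape[-1], tile):
                cols.append(vae_decode(latent[..., y:y + tile, x:x + tile]))
            rows.append(torch.cat(cols, dim=-1))
        return torch.cat(rows, dim=-2)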
EmptyHunyuanLatentVideo
720 x 720 px, 41 frames, 20 steps takes 3:39
720 x 720 px, 65 frames, 20 steps takes 5:34
(At HunyuanVideo's default 24 fps, 41 and 65 frames come out to roughly 1.7 and 2.7 seconds of video.)
Given my current PC specs (8 GB VRAM and 16 GB RAM), I believe this workflow is about the practical limit for balancing high resolution against reasonable generation time.