Z-Image-Turbo: High-Speed Image Generation in ComfyUI
Running SDXL at reasonable speeds can be a chore, especially on older hardware. Z-Image-Turbo aims to address this, offering rapid image generation within ComfyUI with surprisingly good quality, even at low step counts. This guide delves into configuring and optimizing Z-Image-Turbo workflows for ComfyUI.
Z-Image-Turbo offers accelerated image generation in ComfyUI, balancing speed and quality. This guide explores its capabilities, workflow integration, optimization techniques, and troubleshooting tips.**
Initial Setup and Workflow Configuration
First, ensure ComfyUI is correctly installed. If you are new to ComfyUI, there are guides available to walk you through the process. With ComfyUI up and running, the next step is to acquire and integrate the Z-Image-Turbo model.
- Download the Z-Image-Turbo model: Obtain the necessary model files from Hugging Face. Place the downloaded model files into the appropriate ComfyUI models directory.
- Install necessary custom nodes: Ensure you have the required custom nodes installed. These nodes enhance ComfyUI's functionality and enable compatibility with Z-Image-Turbo.
- Load the model in ComfyUI: Use the appropriate ComfyUI nodes to load the Z-Image-Turbo model. This process typically involves specifying the modelโs path within the ComfyUI interface.
!Figure: Model Load Node at 0:15
Figure: Model Load Node at 0:15 (Source: Video)*
Technical Analysis
Setting up the initial workflow is straightforward. The core idea is to swap out the standard Stable Diffusion model with the Z-Image-Turbo version. This involves modifying the CheckpointLoader node in your existing workflows. Ensure the correct VAE is loaded alongside the model for optimal image decoding.
Text-to-Image Workflow
One of the primary use cases for Z-Image-Turbo is text-to-image generation. Hereโs how to set up a basic workflow:
- Load the Z-Image-Turbo model: Use a
CheckpointLoadernode to load the Z-Image-Turbo model into ComfyUI. - Create a text prompt: Use a
CLIPTextEncodenode to input your desired text prompt. - Configure the sampler: Use a
KSamplernode and connect it to the model and prompt. Adjust parameters such assteps,cfg, andsampler_name. Z-Image-Turbo often performs well with lower step counts (e.g., 6-12 steps). - Decode the latent image: Use a
VAEDecodenode to convert the latent image into a viewable image. - Save the image: Use a
Save Imagenode to save the generated image to your desired location.
Technical Analysis
The key here is the KSampler configuration. Z-Image-Turbo is designed for rapid sampling, so experiment with different samplers (Euler, DPM++ 2M Karras) and lower step counts. A higher CFG scale may be necessary to improve prompt adherence, but be mindful of potential artifacts.
Image-to-Image Workflow
Adapting the workflow for image-to-image generation involves incorporating an initial image:
- Load the Z-Image-Turbo model: As before, use a
CheckpointLoadernode to load the model. - Load the initial image: Use a
Load Imagenode to load the image you want to use as a starting point. - Encode the image into latent space: Use a
VAEEncodenode to encode the image into latent space. - Create a text prompt: Use a
CLIPTextEncodenode for your prompt. - Configure the sampler: Use a
KSamplernode, connecting the model, prompt, and encoded image. Adjust parameters, paying attention to thedenoiseparameter, which controls the strength of the initial image's influence. - Decode and save: Use
VAEDecodeandSave Imagenodes as in the text-to-image workflow.
!Figure: Image-to-Image Node Graph at 0:45
Figure: Image-to-Image Node Graph at 0:45 (Source: Video)*
Technical Analysis
The denoise parameter in the KSampler is crucial for image-to-image. A value of 1.0 means the initial image is completely replaced by the generated output, while 0.0 preserves the initial image entirely. Experiment with values between 0.4 and 0.7 for a good balance.
Inpainting Workflow
Inpainting allows you to selectively modify parts of an existing image:
- Load the Z-Image-Turbo model: Use a
CheckpointLoadernode. - Load the image and mask: Use
Load Imagenodes to load both the image and a mask indicating the area to be inpainted. - Encode the masked area: Use a
VAEEncodeForInpaintnode to encode the masked region of the image. - Create a text prompt: Use a
CLIPTextEncodenode to describe the desired content for the masked area. - Configure the sampler: Use a
KSamplernode, connecting the model, prompt, and encoded masked image. - Decode and combine: Use a
VAEDecodenode to decode the inpainted region, and then use a node likeImage Overlayto combine the inpainted region with the original image. - Save the image: Use a
Save Imagenode.
Technical Analysis
The VAEEncodeForInpaint node is essential. Ensure your mask is correctly aligned with the image. Experiment with different samplers and step counts to achieve the desired level of detail in the inpainted region. Pay attention to seamless blending between the original and inpainted areas.
Optimizing Performance and Memory Usage
Generating images, particularly at high resolutions, can strain your GPU's resources. Here are several strategies to optimize performance and reduce memory usage:
Lower step counts:** Z-Image-Turbo is designed to produce good results with fewer steps. Experiment with step counts between 6 and 12 to reduce generation time.
Tiled VAE Decode:** Using Tiled VAE Decode can significantly reduce VRAM usage, especially with high-resolution images. Community tests show tiled overlap of 64 pixels reduces seams.
Sage Attention:** Consider using Sage Attention as a memory-efficient alternative to standard attention in the KSampler workflow. Be aware that it might introduce subtle texture artifacts at high CFG scales.
Block/Layer Swapping:** Offload model layers to the CPU during sampling. For example, swap the first 3 transformer blocks to the CPU while keeping the rest on the GPU.
!Figure: VRAM Usage Comparison at 1:20
Figure: VRAM Usage Comparison at 1:20 (Source: Video)*
Technical Analysis
Tiled VAE decode is a must-have for larger images. Sage Attention offers a solid VRAM saving with a slight quality trade-off. Block swapping will slow things down but allows you to run larger models on cards with limited VRAM. These optimizations allow users with limited hardware to enjoy Z-Image-Turbo.
My Lab Test Results
| Test | Steps | Resolution | VRAM Usage | Render Time |
| ----------------------- | ----- | ---------- | ---------- | ----------- |
| Text-to-Image (Base) | 8 | 1024x1024 | 10.5GB | 8s |
| Image-to-Image (Denoise 0.5) | 8 | 1024x1024 | 11.2GB | 9s |
| Text-to-Image (Sage Attention) | 8 | 1024x1024 | 9.8GB | 10s |
| Text-to-Image (Tiled VAE) | 8 | 1024x1024 | 8.2GB | 11s |
Test rig: 4090/24GB.
My Recommended Stack
ComfyUI provides a flexible node-based system for creating intricate workflows. For streamlining the prototyping and optimization of these workflows, tools like Promptus AI can be invaluable. Promptus simplifies the process of creating, testing, and refining complex ComfyUI setups. Builders using Promptus can iterate these offloading setups faster, creating a better experience.
Resources & Tech Stack
Z-Image-Turbo Model:** Available on Hugging Face. This is the core model driving the image generation process.
ComfyUI:** The node-based interface for building and executing Stable Diffusion workflows. Download from ComfyUI Official.
Promptus AI:** A ComfyUI workflow builder and optimization platform. Learn more at www.promptus.ai/"Promptus AI.
Conclusion
Z-Image-Turbo offers a compelling alternative for generating images quickly within ComfyUI. While it may not match the absolute quality of some slower, more demanding models, its speed and efficiency make it a valuable tool, especially for iterative design and prototyping. Future improvements could focus on refining image quality at higher CFG scales and exploring even more aggressive optimization techniques.
Technical FAQ
Q: I'm getting CUDA out-of-memory errors. What can I do?**
A: Reduce the resolution of your images, lower the batch size, enable tiled VAE decode, or try using Sage Attention. If all else fails, consider block swapping to offload layers to the CPU.
Q: What are the minimum hardware requirements for running Z-Image-Turbo?**
A: While it can technically run on GPUs with as little as 6GB of VRAM with optimizations, a card with 8GB or more is recommended for smoother operation, particularly at higher resolutions. My 4090 handles it brilliantly.
Q: The generated images have strange artifacts. What's causing this?**
A: Artifacts can arise from several factors. Try adjusting the CFG scale, experimenting with different samplers, or ensuring your VAE is correctly loaded. If using Sage Attention, reduce the CFG scale slightly.
Q: How do I update ComfyUI and its custom nodes?**
A: Within your ComfyUI directory, run git pull to update ComfyUI itself. For custom nodes, refer to their respective documentation for update instructions. Some nodes have built-in update mechanisms.
Q: My model isn't loading. What's wrong?**
A: Double-check that the model files are in the correct directory and that youโve specified the correct path in the CheckpointLoader node. Ensure the model files are not corrupted. Restarting ComfyUI can sometimes resolve loading issues.
More Readings
Continue Your Journey (Internal 42.uk Research Resources)
Understanding ComfyUI Workflows for Beginners
Advanced Image Generation Techniques
VRAM Optimization Strategies for RTX Cards
Building Production-Ready AI Pipelines
Mastering Prompt Engineering: A Comprehensive Guide
Exploring Different Samplers in Stable Diffusion
Created: 23 January 2026