ComfyUI: Your Definitive Install & Workflow Guide
Running Stable Diffusion locally offers immense control, but the command line can be daunting. ComfyUI provides a node-based interface for crafting intricate image generation workflows. This guide walks through installation, model setup, workflow creation, and VRAM optimization. Let's get started.
What is ComfyUI?
ComfyUI is a node-based visual programming environment for Stable Diffusion. Instead of filling in a fixed form of prompt boxes and settings, you connect nodes representing individual image processing steps into complete generation pipelines. This offers greater control and flexibility than traditional Stable Diffusion interfaces.
ComfyUI presents a fundamentally different approach to Stable Diffusion compared to typical web UIs. Instead of a text prompt box and a few settings, you're presented with a blank canvas. This canvas becomes your workflow, constructed by connecting nodes representing individual operations. This node-based system provides unparalleled control over the image generation process, allowing for customization that's simply not possible with simpler interfaces. It can seem intimidating at first, but the flexibility it unlocks is well worth the initial learning curve. Tools like Promptus simplify prototyping these workflows, allowing visual iteration on complex setups.
Figure: ComfyUI interface with a simple workflow at 00:00 (Source: Video)
Installing ComfyUI on Windows
To install ComfyUI on Windows:
- Download the appropriate build from the ComfyUI GitHub repository.
- Extract the archive to a suitable location.
- Run the run_nvidia_gpu.bat file (or the AMD equivalent).
- Download necessary models (SDXL, VAEs, etc.) and place them in the designated folders.
Installing ComfyUI on Windows is fairly straightforward, assuming you have the necessary hardware and drivers. First, head over to the official ComfyUI GitHub repository and download the appropriate build for your system. Extract the downloaded archive to a location of your choosing. Inside the extracted folder, you'll find batch files for running ComfyUI with different GPUs. If you have an NVIDIA card, run run_nvidia_gpu.bat. For AMD, use the appropriate AMD batch file. ComfyUI will then launch in your default web browser.
Technical Analysis
The batch files are essentially wrappers that set the necessary environment variables and launch the ComfyUI Python script. This simplifies the process of running ComfyUI, as you don't need to manually configure the environment.
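Curious what the wrapper actually runs? You can open run_nvidia_gpu.bat in a text editor, or launch ComfyUI from a script of your own. The sketch below assumes the portable build's layout (an embedded interpreter in python_embeded next to the ComfyUI folder) and the flag its batch files passed at the time of writing; check your own batch file for the exact command.

```python
# launch_comfyui.py - rough sketch of what run_nvidia_gpu.bat does (portable build assumed).
import subprocess

subprocess.run([
    r".\python_embeded\python.exe", "-s",   # -s: ignore the user's site-packages
    r"ComfyUI\main.py",
    "--windows-standalone-build",           # flag the portable batch files pass
])
```

Launching main.py yourself also makes it easy to append extra flags later, such as --lowvram on smaller cards or --listen to reach the UI from another machine.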
Downloading and Placing Models
Download Stable Diffusion models (e.g., SDXL, v1.5) and VAE files from sources like Civitai. Place the models in the ComfyUI/models/checkpoints directory and the VAE files in the ComfyUI/models/vae directory.
ComfyUI, in its base form, doesn't include any pre-loaded Stable Diffusion models. You'll need to download these separately and place them in the correct directories. Popular sources for models include Civitai. Download the Stable Diffusion models you want to use (SDXL and v1.5 are good starting points) and place them in the ComfyUI/models/checkpoints directory. Similarly, download any VAE files and place them in the ComfyUI/models/vae directory.
Figure: File explorer showing the checkpoints and vae directories at 06:22 (Source: Video)
Technical Analysis
ComfyUI's modular design means it doesn't bundle the models directly. This allows you to use a wide variety of models from different sources, but it does require some manual setup. Builders using Promptus can iterate on these setups faster.
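If you want a quick sanity check that everything landed in the right folders, a few lines of Python will do. This is only a convenience sketch; the install path is an assumption you'll need to adjust.

```python
# check_models.py - list the model files ComfyUI should pick up (adjust COMFY_ROOT).
from pathlib import Path

COMFY_ROOT = Path(r"C:\ComfyUI_windows_portable\ComfyUI")  # assumed install location

for sub in ("models/checkpoints", "models/vae"):
    files = sorted(p.name for p in (COMFY_ROOT / sub).glob("*")
                   if p.suffix in {".safetensors", ".ckpt", ".pt"})
    print(f"{sub}: {files if files else 'nothing found - check your download location'}")
```

If a model you just downloaded doesn't show up in a node's dropdown, restart ComfyUI or hit the refresh button so it rescans these folders.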
Generating Your First Image
Create a basic workflow by loading a default workflow or building one from scratch. Load a checkpoint, input a prompt, and connect the nodes. Click "Queue Prompt" to generate an image.
Generating your first image in ComfyUI can be a bit daunting, but it's a good way to get familiar with the interface. You can start by loading the default workflow or building one from scratch. The essential nodes are: Load Checkpoint, two prompt text nodes (positive and negative), Empty Latent Image, KSampler, VAE Decode, and Save Image. Load a checkpoint (your Stable Diffusion model), type your prompt into the positive prompt node, and wire everything in order: the Load Checkpoint node's model output goes to the KSampler's model input, the positive and negative prompt nodes go to the KSampler's positive and negative inputs, and the Empty Latent Image (which sets the output resolution) feeds the KSampler's latent input. The KSampler's output connects to the VAE Decode node, and the VAE Decode output connects to the Save Image node. Once everything is connected, click the "Queue Prompt" button to generate your image.
Figure: A simple ComfyUI workflow with the essential nodes connected at 09:52 (Source: Video)
Technical Analysis
The KSampler node is where the actual diffusion process happens: it takes the model, the conditioning from your prompts, an input latent, and sampler settings such as the seed, steps, and CFG scale, and produces the denoised latent representation of the image. The VAE Decode node then converts that latent representation into an actual pixel image.
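To make the wiring concrete, here is roughly what that minimal text-to-image graph looks like in ComfyUI's API (JSON) format, queued from Python against a locally running instance (default address 127.0.0.1:8188). The node IDs, checkpoint filename, and prompts are placeholders; the pattern to notice is that every connection is just an input pointing at [source_node_id, output_index].

```python
# queue_basic_workflow.py - the minimal txt2img graph from this section, submitted via the API.
import json
import urllib.request

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},   # placeholder filename
    "2": {"class_type": "CLIPTextEncode",                           # positive prompt
          "inputs": {"clip": ["1", 1], "text": "a cozy cabin in a snowy forest"}},
    "3": {"class_type": "CLIPTextEncode",                           # negative prompt
          "inputs": {"clip": ["1", 1], "text": "blurry, low quality"}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "first_image"}},
}

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```

It's the same graph you build on the canvas: the checkpoint feeds the sampler, the two text encodes provide the positive and negative conditioning, and the decoded image is handed to Save Image.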
Saving and Loading Workflows
Save workflows as .json files for later use. Load saved workflows by dragging the .json file into the ComfyUI interface.
Once you've created a workflow you like, you'll want to save it for future use. ComfyUI allows you to save workflows as .json files. Simply click the "Save" button in the interface and choose a location to save your workflow. To load a saved workflow, simply drag the .json file into the ComfyUI interface. The workflow will be loaded and ready to use.
Technical Analysis
Saving workflows as .json files allows you to easily share them with others. It also allows you to version control your workflows, so you can easily revert to previous versions if needed.
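Because the saved file is plain JSON, you can also inspect or tweak a workflow outside the UI. A minimal sketch (the filename is an assumption, and the field names reflect the UI-format export; verify against your own file):

```python
# inspect_workflow.py - peek inside a saved ComfyUI workflow file.
import json
from pathlib import Path

wf = json.loads(Path("my_workflow.json").read_text(encoding="utf-8"))  # assumed filename

# The UI export stores a flat list of nodes; list what the graph contains.
for node in wf.get("nodes", []):
    print(node.get("id"), node.get("type"))
```

Keeping these files in a git repository is an easy way to get the version control mentioned above.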
VRAM Optimization Techniques
Running SDXL at high resolutions can quickly exhaust VRAM, especially on cards with 8GB or less. Here are several techniques to mitigate this:
Tiled VAE Decode
Tiled VAE decoding processes the image in smaller tiles, significantly reducing VRAM usage. Community tests suggest a tile overlap of 64 pixels helps hide seams between tiles. To implement it, use the Tiled VAE Encode and Tiled VAE Decode nodes and configure the tile size (e.g., 512x512) and overlap, as in the snippet below.
```json
{
  "class_type": "TiledVAEDecode",
  "inputs": {
    "samples": "KSampler.latent",
    "vae": "Load VAE.vae",
    "tile_size": 512,
    "overlap": 64
  }
}
```
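If you're wondering why tiling saves memory, the trick is simply to decode the latent in small patches so only one patch's activations sit in VRAM at a time. The sketch below is an illustrative, simplified version of that loop, not ComfyUI's node code: it assumes a diffusers-style vae.decode(), an 8x latent-to-pixel scale factor, and it skips the blending across the overlap that hides seams.

```python
# tiled_decode.py - illustrative sketch of tiled VAE decoding (not ComfyUI's implementation).
import torch

def tiled_vae_decode(vae, latent, tile=64, overlap=8):
    """Decode a (B, 4, H, W) latent in tile x tile patches; 64/8 latent cells
    correspond to the 512 px tiles and 64 px overlap mentioned above."""
    _, _, h, w = latent.shape
    out = torch.zeros(latent.shape[0], 3, h * 8, w * 8)   # assembled on the CPU
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            patch = latent[:, :, y:y + tile, x:x + tile]
            decoded = vae.decode(patch).sample            # only this patch hits the GPU
            out[:, :, y * 8:(y + patch.shape[2]) * 8,
                      x * 8:(x + patch.shape[3]) * 8] = decoded.cpu()
    return out
```

Smaller tiles lower the peak further but add per-tile overhead, which is why the tiled run in the results below is slightly slower than the baseline.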
Sage Attention
Sage Attention is a memory-efficient alternative to standard attention mechanisms within the KSampler. It reduces VRAM usage but may introduce subtle texture artifacts at higher CFG scales. To use it, you'll need to install a custom node that provides the SageAttentionPatch node.
- Install the custom node.
- Insert the SageAttentionPatch node before the KSampler.
- Connect the model output of the checkpoint loader to the SageAttentionPatch node's model input.
- Connect the SageAttentionPatch node's output to the KSampler's model input (see the wiring sketch below).
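In API-format terms, the rerouting described above looks like the fragment below. The SageAttentionPatch class name comes from the custom node mentioned in this section; its exact input names may differ, so treat this as a sketch of the connection order rather than a copy-paste recipe.

```python
# Fragment of an API-format graph showing where the patch sits in the model path.
workflow_fragment = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},   # placeholder filename
    "2": {"class_type": "SageAttentionPatch",        # from the custom node
          "inputs": {"model": ["1", 0]}},            # checkpoint's model output in
    "3": {"class_type": "KSampler",
          "inputs": {"model": ["2", 0],              # patched model into the sampler
                     # ...prompts, latent, and sampler settings as usual...
                     }},
}
```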
Block/Layer Swapping
This technique offloads model layers to the CPU during sampling, freeing up VRAM. You can swap the first few transformer blocks to the CPU, while keeping the rest on the GPU. This is achieved through custom nodes.
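The underlying idea is just moving weights between devices around each block's forward pass. Here's a generic PyTorch sketch of the mechanism, not any particular custom node's code: forward pre-hooks pull a block onto the GPU right before it runs and push it back to the CPU afterwards.

```python
# layer_swap.py - generic sketch of CPU<->GPU block offloading (not a specific ComfyUI node).
from torch import nn

def _to_gpu(module: nn.Module, args):
    module.to("cuda")    # weights arrive on the GPU just before this block runs

def _to_cpu(module: nn.Module, args, output):
    module.to("cpu")     # and leave again once its output has been produced

def offload_blocks(blocks):
    """Keep the given blocks resident in system RAM, shuttling each onto the GPU
    only for its own forward pass. Trades some speed for lower peak VRAM."""
    for block in blocks:
        block.to("cpu")
        block.register_forward_pre_hook(_to_gpu)
        block.register_forward_hook(_to_cpu)

# e.g. offload_blocks(model.transformer_blocks[:4])  # attribute name depends on the model
```

The constant host-to-device copies are why offloading costs render time; how much depends on your PCIe bandwidth and how many blocks you swap.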
My Lab Test Results:
- Test A (Base SDXL, 1024x1024): 14 s render, 11.8 GB peak VRAM
- Test B (Tiled VAE, 1024x1024): 16 s render, 6 GB peak VRAM
- Test C (Sage Attention, 1024x1024): 15 s render, 7 GB peak VRAM