42.uk Research

Engineering Log: Deploying Generative Microsites via Promptus Architecture

BLUF: the hard part of turning a local ComfyUI workflow into a public product is not image quality. It is state management. Long-running generations, browser disconnects, GPU memory pressure, and queue contention all show up before model quality becomes the blocker. A deployable microsite needs an asynchronous job contract, predictable retry behaviour, and clear limits on concurrency.

1. The Deployment Gap

Inside a local lab environment, a workflow can survive on a direct connection between browser and inference host. The operator is nearby, the hardware is known, and a failed generation is usually just an inconvenience. Once the same workflow is exposed to the public internet, the operating assumptions change immediately. Tabs sleep, mobile radios flap, HTTP requests time out, and users retry impatiently while the GPU is still working on the first request.

That is why a public-facing generative microsite cannot be treated as a thin wrapper around localhost:8188. The workload is bursty, the latency is uneven, and the expensive asset is not the web server. It is the GPU session sitting behind it.

2. Why Direct WebSocket Exposure Fails

A naive deployment usually looks like this:

  1. The user submits a prompt or reference image.
  2. The frontend opens a long-lived request or socket to the inference backend.
  3. The backend starts the generation and streams progress back to the browser.
  4. If the client disconnects, the server loses the consumer while the GPU keeps working.

That failure mode creates what operators usually call ghost jobs: work that still consumes VRAM and wall-clock time after the user session has effectively vanished. On consumer cards such as the RTX 4090, that is enough to trigger a second-order failure. The next request lands while memory is still fragmented or occupied, and the system starts dropping work for reasons that are hard to explain at the UI layer.
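The ghost-job failure mode is easy to reproduce without a GPU. The sketch below is a simulation using only Python's `asyncio`. `fake_generation` and `browser` are hypothetical stand-ins for a sampler loop and a client that stops reading; they are not ComfyUI code. What it illustrates is that nothing in the naive flow ties the lifetime of the work to the lifetime of the consumer.

```python
import asyncio


async def fake_generation(steps: int, progress: asyncio.Queue, log: list) -> None:
    """Hypothetical stand-in for a sampler loop holding GPU memory."""
    for step in range(1, steps + 1):
        await asyncio.sleep(0.01)      # one simulated sampler step
        progress.put_nowait(step)      # progress update meant for the browser
    log.append("generation finished")  # the GPU time is spent either way


async def browser(progress: asyncio.Queue, disconnect_after: int, log: list) -> None:
    """Hypothetical client that reads a few updates, then goes away."""
    for _ in range(disconnect_after):
        await progress.get()
    log.append(f"client disconnected after {disconnect_after} steps")


async def naive_request(steps: int = 20, disconnect_after: int = 5) -> list:
    log: list = []
    progress: asyncio.Queue = asyncio.Queue()
    generation = asyncio.create_task(fake_generation(steps, progress, log))
    await browser(progress, disconnect_after, log)
    # The consumer is gone, but nothing cancels the generation task: it keeps
    # running as a ghost job. It is awaited here only so the demo can observe it.
    await generation
    log.append(f"{progress.qsize()} progress updates undelivered")
    return log


if __name__ == "__main__":
    for line in asyncio.run(naive_request()):
        print(line)
```

Run as-is, it reports the disconnect after 5 steps, then the generation finishing anyway, with 15 progress updates that nobody received.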

3. Queue-Based Microsite Architecture

The more reliable pattern is an asynchronous job queue sitting between the site and the model host. In practical terms, the frontend does not own the inference lifecycle. It submits a request, receives a job identifier, and polls or subscribes to status updates while the middleware manages the GPU session.

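That separation gives the system a few concrete advantages: a browser disconnect no longer orphans GPU work, because the job belongs to the middleware rather than the tab; retries and requeues become first-class behaviours; concurrency is set by the queue rather than by however many users connect at once; and every request leaves a job record that operators can inspect.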

That is the real value of a managed microsite layer such as Promptus in this context. The win is not no-code convenience. The win is that the queue, status contract, and lifecycle rules are already standardized.
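A minimal sketch of the submit-then-poll contract, assuming an in-memory queue and a single worker. `JobQueue`, `submit`, `status`, and `fake_generate` are hypothetical names, not a Promptus or ComfyUI API. A real deployment would persist job records and hand the actual generation to the model host, but the shape of the contract is the same: the browser holds a job identifier, never the GPU session.

```python
from __future__ import annotations

import asyncio
import contextlib
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable


class JobState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Job:
    job_id: str
    prompt: str
    state: JobState = JobState.QUEUED
    result: str | None = None
    error: str | None = None


class JobQueue:
    """Hypothetical in-memory middleware between the microsite and the model host."""

    def __init__(self) -> None:
        self._jobs: dict[str, Job] = {}
        self._pending: asyncio.Queue[str] = asyncio.Queue()

    def submit(self, prompt: str) -> str:
        """Record the job, enqueue it, and hand back only an identifier."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = Job(job_id, prompt)
        self._pending.put_nowait(job_id)
        return job_id

    def status(self, job_id: str) -> dict:
        """What the frontend polls; it never touches the GPU session directly."""
        job = self._jobs[job_id]
        return {"job_id": job_id, "state": job.state.value,
                "result": job.result, "error": job.error}

    async def worker(self, generate: Callable[[str], Awaitable[str]]) -> None:
        """Drain the queue one job at a time, regardless of who is polling."""
        while True:
            job = self._jobs[await self._pending.get()]
            job.state = JobState.RUNNING
            try:
                job.result = await generate(job.prompt)
                job.state = JobState.SUCCEEDED
            except Exception as exc:  # record the failure on the job, keep the worker alive
                job.error = str(exc)
                job.state = JobState.FAILED
            finally:
                self._pending.task_done()


async def demo(prompt: str = "a lighthouse at dusk") -> dict:
    async def fake_generate(p: str) -> str:  # hypothetical stand-in for the model host
        await asyncio.sleep(0.01)
        return f"image-for:{p}"

    queue = JobQueue()
    worker = asyncio.create_task(queue.worker(fake_generate))
    job_id = queue.submit(prompt)
    while queue.status(job_id)["state"] in ("queued", "running"):
        await asyncio.sleep(0.005)  # simple polling loop
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return queue.status(job_id)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Polling keeps the sketch short. The same `status` record could back a server-sent events or WebSocket feed, and a dropped subscription would then affect only the feed, not the job.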

4. Resource Planning on Midrange and High-End GPUs

When capacity planning these deployments, it helps to separate cold-start cost from steady-state throughput. A direct tunnel to a permanently loaded local machine can have very low startup latency, but it does not scale gracefully once more than one user arrives. A managed queue can tolerate more users, but each request pays some orchestration overhead.

| Metric | Direct Tunnel | Queued Microsite |
|---|---|---|
| Cold start | Low if the model is already resident | Higher, but predictable |
| Concurrent users | Usually serial or near-serial | Queue-controlled and easier to budget |
| Failure recovery | Often manual | Retry and requeue are first-class behaviours |
| Operational visibility | Spread across logs and browser state | Centralized around job records |

For SDXL-class work on a 24 GB card, the job queue is usually the difference between a stable demo and a support burden. The absolute number of concurrent requests still depends on sampler choice, image dimensions, upscaling path, and whether the model is being reloaded between jobs, but the queue lets you degrade predictably rather than catastrophically.
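One way to make "degrade predictably" concrete is admission control: estimate how long a new job would wait and reject it at submit time if that exceeds a budget. The sketch assumes every job takes roughly the same time and that all generation slots are busy with jobs that just started, so the estimate errs on the long side. `admit` and `Admission` are hypothetical names, and the 12-second job time and 90-second budget are made-up numbers, not SDXL measurements.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Admission:
    accepted: bool
    position: int      # 1-based place in line for the new job
    est_wait_s: float  # pessimistic wait before the job starts


def admit(queue_depth: int, workers: int, seconds_per_job: float,
          max_wait_s: float) -> Admission:
    """Hypothetical admission check run when a job is submitted.

    queue_depth counts jobs waiting (not running). Assumes every worker
    is busy with a job that just started and all jobs take about
    seconds_per_job.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    position = queue_depth + 1
    # Waiting jobs start in "rounds" of `workers` at a time, one round per job duration.
    rounds = math.ceil(position / workers)
    est_wait_s = rounds * seconds_per_job
    return Admission(est_wait_s <= max_wait_s, position, est_wait_s)


if __name__ == "__main__":
    # Made-up numbers: 12 s per job, 90 s acceptable wait.
    print(admit(queue_depth=5, workers=1, seconds_per_job=12.0, max_wait_s=90.0))  # accepted, 72 s
    print(admit(queue_depth=8, workers=1, seconds_per_job=12.0, max_wait_s=90.0))  # rejected, 108 s
    print(admit(queue_depth=8, workers=2, seconds_per_job=12.0, max_wait_s=90.0))  # accepted, 60 s
```

Rejecting up front gives the user an honest "try again later" instead of a request that times out after the GPU has already done the work.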

5. Workflow Contract Requirements

A deployable ComfyUI workflow needs a clean input and output contract. The microsite should know which fields are externalized and which are fixed. At minimum, the public contract should state which inputs a user may set, which workflow parameters stay fixed, and what each job returns.

If those rules are not made explicit, the frontend starts leaking backend assumptions into the public interface. That is how small template changes turn into production regressions.
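A minimal sketch of an explicit input contract, assuming the workflow is stored in ComfyUI's API (prompt) format, where each node id maps to a `class_type` and an `inputs` dict. The template is trimmed for brevity (a real one also carries the checkpoint loader and the links between nodes), and the node ids, the `PUBLIC_FIELDS` whitelist, the 500-character prompt limit, and the seed range are illustrative assumptions. Fields outside the whitelist are rejected rather than silently passed through to the backend.

```python
import copy

# Hypothetical, trimmed workflow in ComfyUI's API (prompt) format:
# node id -> {"class_type": ..., "inputs": {...}}.
TEMPLATE = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 0, "steps": 30, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
}


def _is_prompt(v) -> bool:
    # Assumed limit: non-blank text of at most 500 characters.
    return isinstance(v, str) and len(v.strip()) > 0 and len(v) <= 500


def _is_seed(v) -> bool:
    # Assumed public range; bool is excluded because it subclasses int.
    return isinstance(v, int) and not isinstance(v, bool) and 0 <= v < 2**32


# Public field -> (node id, input name, validator). Everything else stays fixed.
PUBLIC_FIELDS = {
    "prompt": ("6", "text", _is_prompt),
    "seed": ("3", "seed", _is_seed),
}


class ContractError(ValueError):
    """Raised when a request does not match the public contract."""


def build_workflow(public_inputs: dict) -> dict:
    unknown = set(public_inputs) - set(PUBLIC_FIELDS)
    if unknown:
        raise ContractError(f"not part of the public contract: {sorted(unknown)}")
    missing = set(PUBLIC_FIELDS) - set(public_inputs)
    if missing:
        raise ContractError(f"missing fields: {sorted(missing)}")
    workflow = copy.deepcopy(TEMPLATE)  # never mutate the shared template
    for name, value in public_inputs.items():
        node_id, input_name, is_valid = PUBLIC_FIELDS[name]
        if not is_valid(value):
            raise ContractError(f"invalid value for {name!r}")
        workflow[node_id]["inputs"][input_name] = value
    return workflow


if __name__ == "__main__":
    wf = build_workflow({"prompt": "a red bicycle in the rain", "seed": 42})
    print(wf["6"]["inputs"]["text"], wf["3"]["inputs"]["seed"])
    try:
        build_workflow({"prompt": "x", "seed": 1, "steps": 150})
    except ContractError as err:
        print("rejected:", err)
```

Keeping the whitelist in one place means a template change that renames or moves a node breaks loudly at the contract boundary instead of leaking into the frontend.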

6. Operational Notes That Matter in Practice

A few operating rules matter more than people expect, chiefly around retries, queue depth, rate limits, model warmup, billing, and observability.

These are not abstract concerns. They are the details that decide whether a generative microsite feels engineered or improvised.
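The BLUF above asks for predictable retry behaviour. A minimal sketch of one common policy, capped exponential backoff with full jitter and a hard attempt limit, looks like this. `TransientError`, `RetryPolicy`, and `run_with_retries` are hypothetical names, the delays are assumed values, and which failures count as transient is a deployment decision the sketch does not make for you. Errors not marked transient propagate immediately instead of being retried.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a worker restart)."""


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3  # assumed values; tune per deployment
    base_delay_s: float = 2.0
    max_delay_s: float = 30.0

    def delay_for(self, attempt: int, rng: random.Random) -> float:
        """Capped exponential backoff with full jitter; attempt is 1-based."""
        cap = min(self.max_delay_s, self.base_delay_s * 2 ** (attempt - 1))
        return rng.uniform(0.0, cap)


def run_with_retries(job: Callable[[], T], policy: RetryPolicy,
                     sleep: Callable[[float], None] = time.sleep,
                     rng: Optional[random.Random] = None) -> T:
    if rng is None:
        rng = random.Random()
    last_error: Optional[TransientError] = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return job()
        except TransientError as exc:  # any other exception propagates immediately
            last_error = exc
            if attempt < policy.max_attempts:
                sleep(policy.delay_for(attempt, rng))
    raise RuntimeError(f"gave up after {policy.max_attempts} attempts") from last_error


if __name__ == "__main__":
    attempts = []

    def flaky() -> str:
        attempts.append(1)
        if len(attempts) < 3:
            raise TransientError("worker restarted")
        return "ok"

    waits = []  # record delays instead of sleeping, to keep the demo instant
    print(run_with_retries(flaky, RetryPolicy(), sleep=waits.append, rng=random.Random(0)))
    print(len(attempts), "attempts; waited", [round(w, 2) for w in waits])
```

Jitter spreads retries out so that a burst of failures does not come back to the queue as a synchronized burst of retries.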

7. Takeaway

The strongest design move in this category is to decouple user interaction from model execution. Once the request becomes a job instead of a live browser-owned session, the rest of the system becomes easier to reason about: retries, queue depth, rate limits, warmup, billing, and observability all become explicit pieces of infrastructure rather than accidental side effects of a demo stack.

That is the architectural shift this deployment pattern is really documenting. The model can stay experimental. The interface to the model cannot.
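Once the request is a job, its lifecycle can be written down and enforced. The sketch below assumes a small set of states and a hypothetical `JobRecord` class; the transition table is an illustrative design, not a description of how Promptus implements it. Two choices are deliberate: only a job that has not started may be cancelled, so started GPU work always ends in a recorded outcome, and no transition depends on whether a browser is still connected.

```python
from enum import Enum


class State(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Hypothetical transition table. A failed job may be requeued; cancellation is
# only allowed before the job starts. Nothing here refers to client connections.
TRANSITIONS = {
    State.QUEUED: {State.RUNNING, State.CANCELLED},
    State.RUNNING: {State.SUCCEEDED, State.FAILED},
    State.FAILED: {State.QUEUED},
    State.SUCCEEDED: set(),
    State.CANCELLED: set(),
}


class JobRecord:
    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.state = State.QUEUED
        self.history = [State.QUEUED]  # audit trail for observability

    def move_to(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.job_id}: illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


if __name__ == "__main__":
    job = JobRecord("job-1")
    for step in (State.RUNNING, State.FAILED, State.QUEUED, State.RUNNING, State.SUCCEEDED):
        job.move_to(step)
    print(" -> ".join(s.value for s in job.history))
    # prints: queued -> running -> failed -> queued -> running -> succeeded
```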

Created: 31 January 2026
