Scaling Video Management Systems for Enterprise
How I approach VMS architecture: RTSP at scale, Go2RTC, and keeping latency low for real-time dashboards.
When you're building a Video Management System (VMS) that has to handle hundreds or thousands of RTSP streams, the usual playbook doesn't cut it. I've worked on systems where we needed sub-second latency from camera to dashboard while keeping CPU and memory in check. Here’s how I think about the architecture.
The core challenge
RTSP is stateful and relatively heavy per stream. Naively forwarding or re-encoding every stream in your backend doesn’t scale. You need a strategy that:
- Keeps ingestion and transcoding close to the edge or in a dedicated layer
- Exposes a simple, consumable format (e.g. HLS or WebRTC) to the frontend
- Lets you run AI or analytics on a subset of streams without re-ingesting everything
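As a concrete sketch of that dedicated layer, here's roughly what a Go2RTC config could look like for the "one process per host, many streams" setup. The stream names, credentials, and camera addresses are placeholders; check the Go2RTC docs for the exact options your version supports.

```yaml
# go2rtc.yaml — one process on this host, many streams pinned to it
streams:
  cam_lobby:
    - rtsp://user:pass@10.0.0.21:554/stream1   # placeholder camera URL
  cam_dock:
    - rtsp://user:pass@10.0.0.22:554/stream1

api:
  listen: ":1984"   # HTTP API that serves the HLS/WebRTC/MJPEG views
```

The app layer then only needs to know the host and the stream name; it never touches RTSP directly.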
Go2RTC in the pipeline
Go2RTC is one of those tools that fits this picture really well. It's a small server that takes RTSP (and a few other inputs) and can expose them as HLS, MJPEG, or WebRTC. We use it as a stable, low-overhead bridge between cameras and our app layer. The important part is to run it in a way that matches your scale: one process per host with many streams, or one process per stream if you need strict isolation. We leaned on one process per host and pinned streams to it, which let us tune buffer sizes and timeouts per deployment.
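On the app side, all the frontend needs is a URL per stream. A minimal sketch of how that lookup could work, assuming Go2RTC's default port (1984) and its documented `/api/stream.m3u8` and `/api/webrtc` endpoints (verify these against the version you deploy; the host and stream names here are hypothetical):

```python
from urllib.parse import urlencode


def hls_url(host: str, stream: str, port: int = 1984) -> str:
    # Go2RTC serves a per-stream HLS playlist from its HTTP API.
    return f"http://{host}:{port}/api/stream.m3u8?{urlencode({'src': stream})}"


def webrtc_url(host: str, stream: str, port: int = 1984) -> str:
    # WebRTC signaling endpoint for the same stream, for low-latency views.
    return f"http://{host}:{port}/api/webrtc?{urlencode({'src': stream})}"
```

Because streams are pinned to a host, the app layer can resolve stream → host from a small table and build these URLs on the fly, with no per-stream state in the app servers.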
Latency and buffering
For real-time dashboards, big buffers are your enemy. We keep HLS segment duration short (1–2 seconds) and tune Go2RTC’s buffer so we get low latency without stutter on shaky networks. For WebRTC we use the same Go2RTC backend so operators get the lowest latency when they need it, while most views stay on HLS for compatibility.
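The reason segment duration dominates: HLS players typically buffer around three target durations before starting playback, so the segment length gets multiplied into the latency budget. A rough back-of-the-envelope calculator (the encode and network figures are illustrative defaults, not measurements):

```python
def hls_latency_budget(segment_s: float, buffered_segments: int = 3,
                       encode_s: float = 0.3, network_s: float = 0.2) -> float:
    """Rough worst-case glass-to-glass latency for an HLS view.

    Players commonly buffer ~3 target durations before playback starts,
    so segment duration is the dominant term in the budget.
    """
    return encode_s + segment_s * buffered_segments + network_s
```

With 2-second segments that's roughly 6.5 seconds end to end; dropping to 1-second segments roughly halves it. This is why HLS covers the "compatible" views while WebRTC covers the genuinely sub-second ones.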
Metadata and AI
The VMS doesn’t just serve video; it often feeds AI or analytics. We run a separate pipeline that subscribes to a subset of streams (or snapshots), runs inference, and writes results (e.g. events, counts) to a store. The frontend then combines live video with this metadata. Keeping this pipeline async and decoupled from the main RTSP→HLS path is what lets us scale both ingestion and AI without blocking each other.
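Here's a minimal sketch of that decoupling with a bounded queue between ingestion and inference. The frame source and the detection logic are stubs (real code would pull snapshots and call a model); the point is the design choice: when the queue is full, ingestion drops the oldest frame instead of blocking, so a slow inference worker never backs up the live path.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Event:
    stream: str
    label: str


async def ingest(queue: asyncio.Queue, frames):
    # Ingestion side: never block on a slow consumer. If the queue is
    # full, drop the oldest frame so analysis stays close to live.
    for frame in frames:
        if queue.full():
            queue.get_nowait()
        queue.put_nowait(frame)
        await asyncio.sleep(0)  # yield to the inference task
    await queue.put(None)  # sentinel: no more frames


async def infer(queue: asyncio.Queue, store: list):
    # Inference side: a stand-in for a real model call; writes results
    # (events) to a store the frontend reads alongside the live video.
    while (frame := await queue.get()) is not None:
        stream, detected = frame
        if detected:  # stub for real detection logic
            store.append(Event(stream=stream, label="motion"))


async def run_pipeline(frames):
    queue = asyncio.Queue(maxsize=8)  # small bound keeps analysis fresh
    store: list[Event] = []
    await asyncio.gather(ingest(queue, frames), infer(queue, store))
    return store
```

Because the two sides only share the queue, you can scale ingestion hosts and inference workers independently, which is the whole point of keeping this path off the RTSP→HLS pipeline.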
Takeaways
- Use a dedicated RTSP→HLS/WebRTC layer (e.g. Go2RTC) instead of overloading your app servers.
- Tune for low latency (short segments, small buffers) if the product is “live” first.
- Keep AI/analytics on a separate, async path so ingestion and inference scale independently.