I. Introduction

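Voice agents are more than a cool demo: they're being used in production settings across support, sales, and automation. At a high level, a real-time voice agent consists of three stages: speech-to-text (ASR), an LLM that interprets the input and generates a response, and text-to-speech (TTS).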

But in practice, building one involves deep trade-offs across latency, orchestration, infrastructure, and tooling. This post shares a practitioner's take on how to build and optimize a voice agent, and how to choose among platforms, frameworks, and custom implementations.
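To make the ASR, LLM, and TTS stages concrete, here is a minimal sketch of one conversational turn passing through all three, with per-stage timing, since latency is one of the trade-offs this post covers. Everything in it is an assumption for illustration: `run_turn`, `TurnResult`, and the `fake_*` stage functions are hypothetical stand-ins, not any vendor's API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TurnResult:
    """Output of one user turn, plus how long each stage took."""
    transcript: str
    reply: str
    audio: bytes
    latency_ms: Dict[str, float] = field(default_factory=dict)


def run_turn(
    audio_in: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> TurnResult:
    """Run ASR -> LLM -> TTS sequentially and record per-stage latency."""
    latency: Dict[str, float] = {}

    def timed(name, fn, arg):
        start = time.perf_counter()
        out = fn(arg)
        latency[name] = (time.perf_counter() - start) * 1000.0
        return out

    transcript = timed("asr", asr, audio_in)
    reply = timed("llm", llm, transcript)
    audio_out = timed("tts", tts, reply)
    return TurnResult(transcript, reply, audio_out, latency)


# Hypothetical stand-ins for real services (assumptions, not real APIs).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


result = run_turn(b"hello", fake_asr, fake_llm, fake_tts)
print(result.reply, result.latency_ms)
```

The sketch runs the stages one after another for clarity. A real-time agent would more likely stream partial results between stages instead of waiting for each one to finish, which is where much of the orchestration complexity comes from.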


II. Core Components & Considerations

1. ASR (Speech-to-Text)


2. LLM (Understanding & Response)


3. TTS (Text-to-Speech)