Orion VC — Voice AI
A bit more about the market
Voice AI Market 2025
Voice is the most human interface. It carries emotion, intonation and context. Finally, the technology has taken a significant leap forward:
Latency has dropped below 300 ms: conversations now feel natural, with interruptions, tonality, and the rhythm of real dialogue.
Costs have collapsed: what used to cost hundreds of thousands of dollars is now available for cents per minute.
For businesses, this has created a new operational reality. SMBs miss, on average, 62% of inbound calls, and replacing a $40k/year call center rep with $4k/year software suddenly makes a lot of economic sense. Voice agents work 24/7, don’t get tired, and often outperform humans in consistency and memory capacity.
What does the market look like in 2025?
To understand Voice AI better, it helps to break it down into four distinct layers: Core Voice Models, Voice Agent Infrastructure, Horizontal Voice Solutions and Vertical SaaS.
1. Core Voice Models
These are the foundation models of voice, the TTS (text-to-speech) equivalents of LLMs. They cover voice generation, cloning (copying a specific voice), emotional synthesis (controlling tone and mood), and ultra-low latency for real-time agents. A prime example is ElevenLabs, which turned high-quality, low-latency TTS into an infrastructure API standard, raised a $180M Series C at a $3.3B valuation, and reached upward of $200M in ARR this year.
2. Voice Agent Infrastructure
Built on top of the core models, this layer provides the tooling to deploy and orchestrate agents at scale: call routing, CRM integrations, automated responses, voice workflows. These are the orchestration engines that tie together ASR (Automatic Speech Recognition) + LLM + TTS. What makes this layer attractive is that companies can become platforms that others build on top of.
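To make the ASR + LLM + TTS chain concrete, here is a minimal sketch of one conversational turn passing through the three stages, with per-stage timing checked against the sub-300 ms latency figure mentioned earlier. Everything in it is an assumption made for illustration: fake_asr, fake_llm, fake_tts, run_turn and TurnResult are hypothetical names, and the stage functions are stubs rather than calls to any real vendor's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-ins for the three stages. A real orchestration layer
# would call an ASR service, an LLM, and a TTS engine here.
def fake_asr(audio: bytes) -> str:
    """Pretend to transcribe caller audio into text."""
    return audio.decode("utf-8")

def fake_llm(transcript: str) -> str:
    """Pretend to decide what the agent should say next."""
    return f"You said: {transcript}. How can I help?"

def fake_tts(reply: str) -> bytes:
    """Pretend to synthesize the reply text into audio."""
    return reply.encode("utf-8")

@dataclass
class TurnResult:
    transcript: str
    reply_text: str
    reply_audio: bytes
    stage_ms: Dict[str, float]  # wall-clock time spent in each stage

    @property
    def total_ms(self) -> float:
        return sum(self.stage_ms.values())

def run_turn(
    audio: bytes,
    asr: Callable[[bytes], str] = fake_asr,
    llm: Callable[[str], str] = fake_llm,
    tts: Callable[[str], bytes] = fake_tts,
) -> TurnResult:
    """Run one caller turn through ASR -> LLM -> TTS and time each stage."""
    stage_ms: Dict[str, float] = {}

    def timed(name, fn, arg):
        start = time.perf_counter()
        out = fn(arg)
        stage_ms[name] = (time.perf_counter() - start) * 1000
        return out

    transcript = timed("asr", asr, audio)
    reply_text = timed("llm", llm, transcript)
    reply_audio = timed("tts", tts, reply_text)
    return TurnResult(transcript, reply_text, reply_audio, stage_ms)

LATENCY_BUDGET_MS = 300  # the latency threshold cited in this article

result = run_turn(b"I need to book a plumber")
print(result.reply_text)
print("within budget:", result.total_ms < LATENCY_BUDGET_MS)
```

Because each stage is passed in as a plain function, swapping one vendor for another only changes the argument given to run_turn. Production systems typically stream audio between stages instead of waiting for each one to finish, which this turn-by-turn sketch does not attempt to show.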
3. Horizontal Voice Solutions
These are out-of-the-box voice platforms designed to serve multiple industries. Clients can configure and deploy agents quickly, allowing these platforms to scale by serving many industries with one product. But their generalist nature means higher competitive pressure and thinner differentiation.
4. Vertical SaaS
These are voice-first applications tailored to specific industries. They tend to deliver more stable revenue and a stronger product–market fit. A good example is ServiceTitan: in 2025, the company is growing revenue by around 25%, with subscription revenue up 27%. At the product level, Contact Center Pro with AI Voice Agents gives customers like Bonney Plumbing an 11% increase in booking conversion and a 60% reduction in missed calls.
At the same time, the addressable market in each such vertical is narrower, so players must quickly dominate their niche or face consolidation.
Horizontal vs Vertical
Horizontal solutions aim to build a universal voice agent, and that’s where the problems start. On paper, one agent that can handle any business sounds great; in practice, real calls are highly specific. Generic platforms are trained on broad, mixed datasets and ship with template flows, so they have to be heavily customized and re-trained on each customer’s data: historical calls, tickets, forms, pricing and scheduling rules. That makes GTM slow and expensive. Horizontal tools work best where workflows are simple and similar across industries.
Vertical solutions start with one industry and go deep. They train on focused, industry-specific data and ship with ready-made workflows, integrations and compliance. That lets them reuse domain knowledge across many customers, which boosts reliability and reduces the need for one-off configuration. At ServiceTitan, for example, AI voice agents sit on top of more than a decade of job requests, pricing and scheduling data, so they can actually book the right job type, assign an available technician, apply memberships or discounts, and trigger follow-up workflows automatically.
A few key takeaways:
Voice AI is no longer niche. Around 22% of the companies in the latest YC batch are building voice agents in some form. Since 2020, over 90 voice-related startups have gone through YC.
The market is hyper-fragmented. In core voice models (like ElevenLabs, Cartesia, Resemble), a few players are already strong. In infrastructure/orchestration (Vapi, Retell, Synthflow, Bland.ai, etc.), there’s another dense layer. Above them, there are dozens of generic bots and horizontal platforms. Meanwhile, vertical SaaS is booming across real estate, restaurants, healthcare, and beyond.
For small generalist VC funds, a one-shot Voice AI bet is a bad strategy: they usually don’t have enough bets to spread the risk. In practice, this is a market that only really makes sense for large multi-stage funds that can place multiple bets across models, infra and vertical apps, and let a few breakout winners return the whole Voice AI exposure. For everyone else, the realistic options are tactical investments in clearly understood verticals or a conscious decision to stay out of the category.
Thanks for reading! You can follow us on LinkedIn for the latest. Comment here or DM Sergey Dean or Armen Fljyan.
See you soon,
Orion VC team


