Fullduplex.
An observatory for speech-to-speech, full-duplex, and audio foundation models. We collect the papers, benchmarks, models, and datasets in one place — and we keep the reply channel open.
Much of this observatory is researched and updated automatically by AI. If you spot an error, a misattribution, or an outdated entry, please report it in our community Discord.
- Covo-Audio-Chat / Covo-Audio-Chat-FD · Tencent
- Gemini 3.1 Flash Live Preview · Google DeepMind
- MiniCPM-o 4.5 · OpenBMB / ModelBest
- Full-Duplex-Bench v3 · NTU / UC Berkeley / UW / MIT
- Artificial Analysis Speech Arena · Artificial Analysis
- Big Bench Audio · Artificial Analysis
Four corners of the channel.
We keep four running catalogs. Think of them as newspaper sections — same paper, different desks.
Blog
Weekly essays, field notes, and dispatches. Honest summaries for people too busy to read every paper.
latest issue
Benchmarks
Full-Duplex-Bench, VoiceBench, URO-Bench, Scale Voice Showdown, τ³-Bench. How the field measures voice — grouped by tier and setting.
see all
Models
Moshi, gpt-realtime, Gemini Live, Sesame CSM, GLM-4-Voice, LiveKit Agents, Cartesia Sonic, Kyutai Unmute. Grouped by what each system is and how it closes the loop.
see all
Datasets
AMI, CANDOR, Emilia, VoxPopuli, MLS — plus a 2024–26 frontier: J-CHAT, J-Moshi, MultiDialog, InteractSpeech, MM-F2F, otoSpeech-FD. Grouped by role, interactivity, and license class.
see all
Benchmarks that keep score.
The evaluations we trust for speech-to-speech, full-duplex, and audio-foundation work — sorted by most-recent update. Each row points to the maintainer's live leaderboard or repo.
- 01 · Full-Duplex-Bench v3 · NTU / UC Berkeley / UW / MIT · native · live · #full-duplex · first-token latency · 2026-04
- 02 · Big Bench Audio · Artificial Analysis · native · lab · #speech-lm · Step-Audio R1.1 Realtime 97.0% · 2026-03
- 03 · Scale Voice Showdown · Scale AI · native · arena · #speech-lm · preference win-rate · 2026-03
- 04 · τ³-Bench (τ-Voice) · Sierra Research · native · vertical · #vertical-agent · pass^k · 2026-02
- 05 · VocalBench · SJTU (OmniAgent) · native · lab · #speech-lm · ACC · 2026-01
Corpora at the frontier.
Freshly minted full-duplex, dialog, and speech-LM corpora from 2024–26. The training set decides the ceiling — these are the ones actually pushing it.
- 01 · otoSpeech-full-duplex-280h · otoearth · permissive · role: dialog-interactive · interactivity: high · 280 h · 2026-02
- 02 · otoSpeech-full-duplex-processed-141h · otoearth · permissive · role: dialog-interactive · interactivity: high · 141 h · 2026-02
- 03 · InteractSpeech · InteractSpeech authors (EMNLP 2025) · custom · role: dialog-interactive · interactivity: high · 150 h · 2025-11
- 04 · OleSpeech-IV · Olewave / Stanford / Microsoft / PingAn / KU / CMU · non-commercial · role: dialog-interactive · interactivity: medium · 100 h (open subset) · 2025-09
- 05 · MLC-SLM (Interspeech 2025) · Nexdata / Interspeech 2025 challenge · gated · role: dialog-interactive · interactivity: high · 1,604 h · 2025-09
The labs behind the directory.
The Verticals series profiles one lab, company, or maintainer per essay. Every piece has a matching row on /models, /benchmarks, or /datasets — tap an entry to open the essay, tap the chip to jump straight to the directory.
- v01 · kyutai
Kyutai: the twelve-person Paris nonprofit turning open releases into shared vocabulary
Research velocity converted into reputational capital. A twelve-person Paris nonprofit ships weights every ten to twelve weeks, rewriting the vocabulary the open voice-AI field thinks in.
Moshi ∎ 17 min
- v03 · cartesia
Cartesia: why AWS put a non-transformer voice AI on its own shelf
The only voice-AI company commercially competing without a transformer. $191M cumulative, 62% blind-test preference, Sonic-3 on AWS SageMaker JumpStart — earned on a state-space model backbone by the people who wrote the SSM papers.
Cartesia Sonic 3 ∎ 18 min
- v04 · hume
Hume AI: the smile inside a sentence, and the nine days that clarified voice AI’s exit shape
Hume bet that emotion lives inside prosody. In January 2026, Google DeepMind brought on the founder and left the company standing. A new exit shape for voice AI — not buyout, not wind-down, but a graduation ceremony.
Hume EVI 3 ∎ 18 min
- v05 · elevenlabs
ElevenLabs: why a TTS company is priced at $11B
In February 2026, a two-person London startup closed a $500M Series D at $11B — a TTS company at the top of the voice-AI valuation stack. The structure behind that fact: founders, hypothesis, product, customers, counterargument.
ElevenLabs Agents ∎ 18 min
- v09 · meta
Meta FAIR Speech: six years, nine papers, and the field's default citations
Between June 2020 and October 2024, Meta FAIR Speech shipped nine audio foundation models, including Wav2Vec2, HuBERT, dGSLM, MMS, Seamless, and Spirit-LM. By 2026, open full-duplex research thinks in the vocabulary Meta left behind. The release cadence, the talent diaspora to Kyutai and Gradium, and why this lab's floor outlasts its own release calendar.
SeamlessM4T v2 ∎ 17 min
Fresh off the press.
Our own field notes on speech-to-speech, full-duplex, and audio-foundation work — the primers, thresholds, architecture maps, data ceilings. Every post double-checked against the benchmarks and models cataloged on the observatory.
- 05 · 26 apr · Foundation before vertical [cover] ∎ 14 min · #foundation
- v01 · 22 apr · Kyutai: the twelve-person Paris nonprofit turning open releases into shared vocabulary [dispatch] ∎ 17 min · #verticals
- v03 · 22 apr · Cartesia: why AWS put a non-transformer voice AI on its own shelf [dispatch] ∎ 18 min · #verticals
- v04 · 22 apr · Hume AI: the smile inside a sentence, and the nine days that clarified voice AI’s exit shape [dispatch] ∎ 18 min · #verticals
You know a model we're missing? Tell us.
Fullduplex is, as the name implies, a two-way channel. New models, benchmarks, datasets — and corrections to our own entries — go through the community board.
One email, every Wednesday.
New models, updated benchmarks, papers and datasets worth your time — one digest a week. No tracking pixels, no drip sequences, no dark patterns.