Fullduplex.
An observatory for speech-to-speech, full-duplex, and audio foundation models. We collect the papers, benchmarks, models, and datasets in one place — and we keep the reply channel open.
Much of this observatory is researched and updated automatically by AI. If you spot an error, a misattribution, or an outdated entry, please report it in our community Discord.
- Covo-Audio-Chat / Covo-Audio-Chat-FD · Tencent
- Gemini 3.1 Flash Live Preview · Google DeepMind
- MiniCPM-o 4.5 · OpenBMB / ModelBest
- Full-Duplex-Bench v3 · NTU / UC Berkeley / UW / MIT
- Artificial Analysis Speech Arena · Artificial Analysis
- Big Bench Audio · Artificial Analysis
Four corners of the channel.
We keep four running catalogs. Think of them as newspaper sections — same paper, different desks.
Blog
Weekly essays, field notes, and dispatches. Honest summaries for people too busy to read every paper.
latest issue
Benchmarks
Full-Duplex-Bench, VoiceBench, URO-Bench, Scale Voice Showdown, τ³-Bench. How the field measures voice — grouped by tier and setting.
see all
Models
Moshi, gpt-realtime, Gemini Live, Sesame CSM, GLM-4-Voice, LiveKit Agents, Cartesia Sonic, Kyutai Unmute. Grouped by what each system is and how it closes the loop.
see all
Datasets
AMI, CANDOR, Emilia, VoxPopuli, MLS — plus a 2024–26 frontier: J-CHAT, J-Moshi, MultiDialog, InteractSpeech, MM-F2F, otoSpeech-FD. Grouped by role, interactivity, and license class.
see all
Benchmarks that keep score.
The evaluations we trust for speech-to-speech, full-duplex, and audio-foundation work — sorted by most-recent update. Each row points to the maintainer's live leaderboard or repo.
- 01 · Full-Duplex-Bench v3 · NTU / UC Berkeley / UW / MIT · native · live · #full-duplex · first-token latency · 2026-04
- 02 · Big Bench Audio · Artificial Analysis · native · lab · #speech-lm · Step-Audio R1.1 Realtime 97.0% · 2026-03
- 03 · Scale Voice Showdown · Scale AI · native · arena · #speech-lm · preference win-rate · 2026-03
- 04 · τ³-Bench (τ-Voice) · Sierra Research · native · vertical · #vertical-agent · pass^k · 2026-02
- 05 · VocalBench · SJTU (OmniAgent) · native · lab · #speech-lm · ACC · 2026-01
Corpora at the frontier.
Freshly minted full-duplex, dialog, and speech-LM corpora from 2024–26. The training set decides the ceiling — these are the ones actually pushing it.
- 01 · otoSpeech-full-duplex-280h · otoearth · permissive · role: dialog-interactive · interactivity: high · 280 h · 2026-02
- 02 · otoSpeech-full-duplex-processed-141h · otoearth · permissive · role: dialog-interactive · interactivity: high · 141 h · 2026-02
- 03 · InteractSpeech · InteractSpeech authors (EMNLP 2025) · custom · role: dialog-interactive · interactivity: high · 150 h · 2025-11
- 04 · OleSpeech-IV · Olewave / Stanford / Microsoft / PingAn / KU / CMU · non-commercial · role: dialog-interactive · interactivity: medium · 100 h (open subset) · 2025-09
- 05 · MLC-SLM (Interspeech 2025) · Nexdata / Interspeech 2025 challenge · gated · role: dialog-interactive · interactivity: high · 1,604 h · 2025-09
The labs behind the directory.
The Verticals series profiles one lab, company, or maintainer per essay. Every piece has a matching row on /models, /benchmarks, or /datasets — tap an entry to open the essay, tap the chip to jump straight to the directory.
- v01 · kyutai
Kyutai: the twelve-person Paris nonprofit turning open releases into shared vocabulary
Research velocity converted into reputational capital. A twelve-person Paris nonprofit ships weights every ten to twelve weeks, rewriting the vocabulary the open voice-AI field thinks in.
Moshi ∎ 17 min
- v03 · cartesia
Cartesia: why AWS put a non-transformer voice AI on its own shelf
The only voice-AI company commercially competing without a transformer. $191M cumulative, 62% blind-test preference, Sonic-3 on AWS SageMaker JumpStart — earned on a state-space model backbone by the people who wrote the SSM papers.
Cartesia Sonic 3 ∎ 18 min
- v04 · hume
Hume AI: the smile inside a sentence, and the nine days that clarified voice AI’s exit shape
Hume bet that emotion lives inside prosody. In January 2026, Google DeepMind brought on the founder and left the company standing. A new exit shape for voice AI — not buyout, not wind-down, but a graduation ceremony.
Hume EVI 3 ∎ 18 min
- v05 · elevenlabs
ElevenLabs: why a TTS company is priced at $11B
In February 2026, a two-person London startup closed a $500M Series D at $11B — a TTS company at the top of the voice-AI valuation stack. The structure behind that fact: founders, hypothesis, product, customers, counterargument.
ElevenLabs Agents ∎ 18 min
- v09 · meta
Meta FAIR Speech: six years, nine papers, and the field's default citations
Between June 2020 and October 2024, Meta FAIR Speech shipped nine audio foundation models, including Wav2Vec2, HuBERT, dGSLM, MMS, Seamless, and Spirit-LM. By 2026, open full-duplex research thinks in the vocabulary Meta left behind. The release cadence, the talent diaspora to Kyutai and Gradium, and why this lab's floor outlasts its own release calendar.
SeamlessM4T v2 ∎ 17 min
Fresh off the press.
Our own field notes on speech-to-speech, full-duplex, and audio-foundation work — the primers, thresholds, architecture maps, data ceilings. Every post double-checked against the benchmarks and models cataloged on the observatory.
- 05 · 26 apr · Foundation before vertical [cover] ∎ 14 min · #foundation
- v01 · 22 apr · Kyutai: the twelve-person Paris nonprofit turning open releases into shared vocabulary [dispatch] ∎ 17 min · #verticals
- v03 · 22 apr · Cartesia: why AWS put a non-transformer voice AI on its own shelf [dispatch] ∎ 18 min · #verticals
- v04 · 22 apr · Hume AI: the smile inside a sentence, and the nine days that clarified voice AI’s exit shape [dispatch] ∎ 18 min · #verticals
You know a model we're missing? Tell us.
Fullduplex is, as the name implies, a two-way channel. New models, benchmarks, datasets — and corrections to our own entries — go through the community board.
One email, every Wednesday.
New models, updated benchmarks, papers and datasets worth your time — one digest a week. No tracking pixels, no drip sequences, no dark patterns.