Thinking Machines wants to build an AI that actually listens while it talks

Thinking Machines Lab, the AI startup launched last year by former OpenAI chief technology officer Mira Murati, has unveiled what it calls "interaction models," AI systems designed to interrupt and respond in real time rather than waiting for users to finish speaking. Traditional AI assistants operate on a strict turn-taking basis: the user speaks while the model listens, and then the model responds while the user waits. Thinking Machines is attempting something fundamentally different. Its model processes what you're saying while simultaneously generating a response, mimicking the flow of a phone conversation rather than a back-and-forth text exchange.

The underlying technique is known as "full duplex" communication, and Thinking Machines claims its model, called TML-Interaction-Small, achieves response times of 0.40 seconds. According to the company, that matches the pace of natural human conversation and is substantially faster than comparable systems from OpenAI and Google in speed tests. If those figures hold up, the technology could enable genuinely interactive AI experiences in which users interrupt, correct, or redirect the model mid-response instead of waiting for it to finish.

Still, the company is careful to temper expectations. The announcement describes TML-Interaction-Small as a research preview rather than a finished product, and Thinking Machines has no immediate plans for a public release: a limited research preview is scheduled for the coming months, with a broader rollout targeted for later this year. The technology has not yet been tested in real-world applications, so while the performance metrics look promising on paper, it will be unclear whether the actual user experience lives up to those claims until developers and researchers gain access.