Mistral AI shipped Mistral Medium 3.5 in public preview: a 128B dense model blending instruction following, reasoning, and coding. Mistral Vibe gains remote coding agents; Le Chat gains Work mode for long async jobs started from the CLI or chat. Devstral 2 steps down as the default in Vibe CLI and Le Chat; Medium 3.5's weights are open under a modified MIT license; local CLI sessions can move to the cloud with history and approvals intact. (Source: Mistral announcement)
The easy headline is “new flagship model.” The harder, more useful headline is different: coding help is less about sitting at a laptop for every tool call, and more about cloud sessions that keep running, leave traces you can inspect, and finish in outcomes such as a GitHub pull request you actually review. (Source: Mistral announcement)
Sourcing note: the model specs, benchmarks, product mechanics, plan tiers, and API pricing below track Mistral's announcement unless labeled otherwise. (Source: Mistral announcement)
What shipped
Mistral framed this as one coordinated launch across model, coding-agent runtime, and chat harness.
- Mistral Medium 3.5 is a 128B dense merged model with a 256k context window, reasoning effort you can tune per request, and a vision encoder trained from scratch for variable image sizes and aspect ratios, released as open weights under a modified MIT license. (Source: Mistral announcement)
- On Mistral's published numbers, the model posts 77.6% on SWE-Bench Verified and 91.4% on τ³-Telecom, with Devstral 2 and Qwen3.5 397B A17B named on the page as comparison points. (Source: Mistral announcement)
- Mistral Vibe adds remote coding agents that run in the cloud, can run in parallel, and notify you when they finish; you can start them from the Vibe CLI or Le Chat, and Vibe CLI now defaults to Medium 3.5 instead of Devstral 2. (Source: Mistral announcement)
- While a session runs, Mistral says you can follow file diffs, tool calls, progress states, and questions the agent surfaces. A local CLI session can be teleported to the cloud with session history, task state, and approvals preserved. (Source: Mistral announcement)
- Le Chat Work mode (Preview) uses a new harness with Medium 3.5 as the execution backend, turns connectors on by default, shows tool calls and rationale, and asks for explicit approval before sensitive steps like sending a message, writing a document, or modifying data. (Source: Mistral announcement)
- Through the API, Medium 3.5 is listed at $1.5 per million input tokens and $7.5 per million output tokens; remote coding agents and Work mode are tied to Pro, Team, and Enterprise on the product side. (Source: Mistral announcement)
Why “remote” matters as much as “bigger”
If you have only used coding agents as a live console act, the shift is simple to state and hard to operate: Mistral Vibe is arguing that many sessions can run in parallel, so one person is no longer the throttle on every tool call. (Source: Mistral announcement) That is a workflow claim, not a benchmark brag.
For most software teams, the practical question stops being “did the model save typing?” and starts being “can we review the branch, trust the diffs, and merge without fire drills?” That lines up with how Mistral describes finishing work: the agent can open a GitHub pull request and notify you, pushing judgment to review instead of play-by-play supervision. (Source: Mistral announcement)
Why this matters:
- Parallel cloud runs only help if review and test capacity keeps up. Raising concurrency without raising review throughput mostly raises merge risk. (Inference: capacity coupling from Mistral’s parallel-session positioning.)
- The product story leans on visibility: diffs, tool calls, progress states, and surfaced questions are the audit trail that makes async acceptable. (Source: Mistral announcement)
This wave is not only “a bigger model.” It is a bet that long tasks belong in sandboxes with artifacts you can read later, not only in a fast chat reply loop. (Source: Mistral announcement)
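The capacity-coupling point above can be made concrete with back-of-the-envelope queue arithmetic. This is a sketch; the session counts and rates below are made-up illustrations, not Mistral numbers:

```python
def pr_backlog(sessions: int, prs_per_session_day: float,
               review_capacity_per_day: float, days: int) -> float:
    """Estimate the PR review backlog after `days` of parallel agent runs.

    Agents produce PRs at a combined rate; reviewers clear them at a fixed
    rate. Anything above review capacity accumulates as unreviewed branches.
    """
    produced = sessions * prs_per_session_day * days
    reviewed = review_capacity_per_day * days
    return max(0.0, produced - reviewed)

# Example: 8 parallel sessions, each landing ~1 PR/day, against a team
# that can properly review 5 PRs/day. After a two-week (10 working day) run:
backlog = pr_backlog(sessions=8, prs_per_session_day=1.0,
                     review_capacity_per_day=5.0, days=10)
print(backlog)  # 30.0 unreviewed PRs: concurrency outran review throughput
```

The asymmetry is the point: doubling sessions doubles production instantly, while review capacity moves at hiring speed.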
Mistral Medium 3.5 in plain terms
Mistral Medium 3.5 is Mistral’s first merged flagship in public preview: one weight stack pitched for instruction following, reasoning, and coding together, rather than splitting those into separate “which model do I pick?” lanes at the base layer. (Source: Mistral announcement) At 128B parameters dense, with 256k context, Mistral positions it for long-horizon work where tools fire repeatedly and outputs need enough structure for automation to consume. (Source: Mistral announcement)
If you need a one-line spec: Medium 3.5 is the new default for Vibe and Le Chat, replaces Devstral 2 in Vibe CLI, ships as open weights under a modified MIT license, and advertises self-host feasibility on as few as four GPUs for teams weighing cloud concurrency against on-prem control. (Source: Mistral announcement)
Benchmarks on vendor pages are headline signals, not a substitute for runs on your repos, your tests, and your permissions model. Treat SWE-Bench Verified as a directional hint until your own harness proves the upgrade is worth migration cost. (Inference: standard bench caveat applied to vendor-reported scores.)
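For teams wiring Medium 3.5 into a harness, the per-request reasoning-effort tuning looks roughly like this. A minimal payload-building sketch, assuming a chat-completions-style API; the model id (`mistral-medium-3.5`) and the `reasoning_effort` field name are assumptions for illustration, so check Mistral's API reference for the real names:

```python
import json

def build_request(prompt: str, effort: str = "medium") -> str:
    """Build a hypothetical chat-completions request body as a JSON string."""
    payload = {
        "model": "mistral-medium-3.5",   # assumed model id, not confirmed
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,      # assumed name for the per-request knob
        "max_tokens": 2048,
    }
    return json.dumps(payload)

body = build_request("Summarize the failing test output.", effort="high")
print(json.loads(body)["reasoning_effort"])  # high
```

The practical use of a per-request knob is routing: cheap low-effort passes for triage, high-effort runs only where the task justifies the extra tokens.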
How Vibe remote sessions are supposed to work
Remote coding agents target work that survives you stepping away: the cloud session keeps running, then notifies you on completion. (Source: Mistral announcement) Each session runs in an isolated sandbox that still allows broad edits and installs, the usual trade-off between isolation and fidelity to real builds. (Inference: sandbox trade-off common to cloud agent runtimes.)
When the task completes, Mistral’s story is review-centric: you get a pull request path rather than a mandate to watch every intermediate edit. (Source: Mistral announcement) Connectors named in the announcement include GitHub for code and PRs, Linear and Jira for issues, Sentry for incidents, and Slack or Teams for reporting, which tells you where humans are still expected to stay in the loop. (Source: Mistral announcement)
Teleporting is the handoff metaphor: a local CLI session can continue in the cloud with session history, task state, and approvals carried forward. (Source: Mistral announcement) If you think in terms of separating control from execution for long jobs, that lines up with how other vendors talk about harnesses versus sandboxes; see OpenAI’s Agents SDK harness versus sandbox split for the same pattern in different packaging.
Operator takeaway: if you pilot this, start on branches and services where bad merges are recoverable, and measure review latency before you raise parallel session count. (Inference: conservative rollout for organizations new to async agent throughput.)
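Whatever Vibe's actual wire format looks like, Mistral's description implies a teleport handoff has to serialize three things: session history, task state, and approvals. A round-trip sketch of that shape; the schema and field names here are invented for illustration, not Vibe's actual format:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class SessionSnapshot:
    """Invented schema for what a local-to-cloud handoff must preserve."""
    history: list = field(default_factory=list)     # prior turns and tool calls
    task_state: dict = field(default_factory=dict)  # branch, open todos, etc.
    approvals: list = field(default_factory=list)   # grants already given

    def serialize(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def deserialize(cls, blob: str) -> "SessionSnapshot":
        return cls(**json.loads(blob))

local = SessionSnapshot(
    history=["edited src/api.py", "ran pytest"],
    task_state={"branch": "fix/timeouts", "tests_green": False},
    approvals=["install-deps"],
)
cloud = SessionSnapshot.deserialize(local.serialize())  # round-trips intact
```

The approvals list is the part worth auditing in any real implementation: a cloud session that silently inherits broader grants than the local one had is a privilege-escalation path.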
Work mode in Le Chat: connectors, approvals, long jobs
Work mode is preview-stage tooling where Le Chat picks up an execution backend instead of behaving like a single-turn assistant. Medium 3.5 powers that backend, and Mistral emphasizes multi-step runs that can call tools in parallel until the task completes. (Source: Mistral announcement)
Compared with everyday chat, Work mode is built around connectors on by default, visible tool calls with rationale, and explicit approval paths for sensitive actions tied to your permissions. (Source: Mistral announcement) Example patterns Mistral lists include combining email, messages, and calendar in one pass; research across web and internal docs; inbox triage with draft replies; and creating Jira items from discussions while pushing summaries to Slack. (Source: Mistral announcement)
Longer runs with mail and calendar access raise real blast-radius questions. Approvals before send or write are not a footnote here; they are the difference between a helpful assistant and an automation incident. (Source: Mistral announcement) For a contrasting take on “managed” async agents and sessions APIs, see Claude Managed Agents runtime and sessions API.
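The approval gate Mistral describes, i.e. sensitive actions block until a human explicitly says yes, reduces to a simple pattern. A sketch with invented action names and callback shape, not Le Chat's actual mechanism:

```python
# Actions Mistral names as approval-gated: sending a message, writing a
# document, modifying data. The string identifiers here are invented.
SENSITIVE = {"send_message", "write_document", "modify_data"}

def run_action(name: str, approve) -> str:
    """Execute an action, gating sensitive ones behind an approval callback."""
    if name in SENSITIVE and not approve(name):
        return f"blocked: {name} needs explicit approval"
    return f"ran: {name}"

# Sensible default: auto-deny everything sensitive until a human opts in.
deny_all = lambda action: False
print(run_action("search_docs", deny_all))   # ran: search_docs
print(run_action("send_message", deny_all))  # blocked: send_message needs explicit approval
```

The design question for any rollout is where `approve` gets its answer: a synchronous prompt caps throughput, while an async approval queue preserves the parallelism the product is selling.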
Where Mistral’s launch sits in the broader agent push
Mistral is hardly alone in packaging long-horizon tool runs behind a polished client. The announcement also ties public access to Studio workflows and broader enterprise history, and opens web-initiated coding tasks to a wider audience. (Source: Mistral announcement) That should read as continuity: vendors are racing to make agents dependable enough that "automation with trace" replaces "chat novelty."
For earlier AgenticWire coverage on graph-shaped workflows and MCP contracts, see Microsoft Agent Framework 1.0 ships graph workflows and MCP. The through line is the same: tools are commitments, not magic words.
What teams should do next
- Match parallelism to review bandwidth: if you turn on many cloud sessions, make sure PR review, test gates, and ownership scale with the extra branches. (Inference: execution of Mistral’s parallel-session value prop inside real engineering orgs.)
- Scope Work mode connectors deliberately: default-on access to mail and calendars is powerful; widen permissions only after you trust approvals and logging. (Source: Mistral announcement)
- Price the API for retries: at $1.5/M input and $7.5/M output, long tool loops can add up; model runs where failures retry quietly need budget guardrails. (Source: Mistral announcement)
- Validate self-host claims on your kit: “four GPUs” is a planning hint from Mistral’s page, not a guarantee for your tokenizer mix, batch sizes, or latency targets. (Source: Mistral announcement)
- Bench scores are marketing air cover until proven locally: run Medium 3.5 against your real failures before you commit roadmap to it. (Inference: migration discipline paired with vendor headline benchmarks.)
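The retry-budgeting point can be sketched with the listed prices ($1.5 per million input tokens, $7.5 per million output). The retry multiplier models tool loops that quietly re-run; the token counts below are illustrative, not measured:

```python
INPUT_PER_M, OUTPUT_PER_M = 1.5, 7.5  # dollars per million tokens, per the page

def run_cost(input_tokens: int, output_tokens: int, attempts: int = 1) -> float:
    """Dollar cost for one logical task, including retried attempts."""
    per_attempt = (input_tokens / 1e6) * INPUT_PER_M \
                + (output_tokens / 1e6) * OUTPUT_PER_M
    return attempts * per_attempt

# A long tool loop reading 400k tokens of context and emitting 60k tokens:
once = run_cost(400_000, 60_000)             # about $1.05 per attempt
with_retries = run_cost(400_000, 60_000, 4)  # about $4.20 if it quietly retries 3x
print(round(once, 2), round(with_retries, 2))
```

Input tokens dominate long agent loops because context re-enters on every attempt, which is why silent retries are the budget line to watch.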
Related coverage
- OpenAI's Agents SDK update: harness vs sandbox for long runs - Why splitting harness control from sandbox execution keeps showing up as vendors pitch asynchronous agents.
- Claude Managed Agents: managed runtime and sessions API - How Anthropic frames managed sessions, useful contrast for Mistral’s cloud session story.
- Microsoft Agent Framework 1.0 ships graph workflows and MCP - Graph workflows and MCP contracts as backdrop for tool-heavy assistants.
References
- Mistral announcement - https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5



