browser-use vs Stagehand: Which Browser Agent to Pick

browser-use is a Python library that gives an LLM full autonomy over a browser: hand it a goal, and its agent loop plans and executes every click, scroll, and form fill itself. Stagehand is a TypeScript framework built as a deterministic-first layer on top of Playwright and Chrome DevTools Protocol (CDP): you write normal Playwright code and drop in act(), extract(), or observe() only where you need an LLM to handle the unpredictable part of a page. The short version: pick browser-use for open-ended research and prototyping tasks where you would rather describe a goal than script it, and pick Stagehand when you are shipping a browser workflow into production and need most of it to run the same way every time.

Both projects tackle the same problem: LLMs are good at deciding what to click and bad at reliably finding the same element twice. browser-use solves that by asking the model at every step. Stagehand solves it by asking once, caching the result, and replaying it deterministically after that. Language, autonomy level, and cost profile all follow from that one design fork.
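To make that fork concrete, here is a minimal Python sketch that counts model calls under both strategies. Everything in it is hypothetical: `fake_llm_resolve` stands in for a model choosing an element, a page is a plain dict mapping selectors to visible labels, and neither class mirrors browser-use's or Stagehand's actual internals.

```python
from dataclasses import dataclass, field


def fake_llm_resolve(page: dict, instruction: str) -> str:
    """Hypothetical stand-in for an LLM: pick the selector whose label appears in the instruction."""
    for selector, label in page.items():
        if label.lower() in instruction.lower():
            return selector
    raise LookupError(f"no element matches {instruction!r}")


@dataclass
class AskEveryStep:
    """browser-use-style strategy: consult the model on every step of every run."""
    llm_calls: int = 0

    def act(self, page: dict, instruction: str) -> str:
        self.llm_calls += 1
        return fake_llm_resolve(page, instruction)


@dataclass
class AskOnceThenReplay:
    """Stagehand-style idea: resolve once, cache, and replay while the cached selector still exists."""
    llm_calls: int = 0
    cache: dict = field(default_factory=dict)

    def act(self, page: dict, instruction: str) -> str:
        cached = self.cache.get(instruction)
        if cached is not None and cached in page:
            return cached  # replay: no model call
        self.llm_calls += 1  # cache miss or stale mapping: ask the model again
        selector = fake_llm_resolve(page, instruction)
        self.cache[instruction] = selector
        return selector


page = {"#signup": "Sign up", "#login": "Log in"}  # selector -> visible label
steps = ["click Sign up", "click Log in"]

agent, replayer = AskEveryStep(), AskOnceThenReplay()
for _ in range(10):  # ten repeat runs of the same two-step flow
    for step in steps:
        agent.act(page, step)
        replayer.act(page, step)
print(agent.llm_calls, replayer.llm_calls)  # 20 2

# A redesign renames the signup button, so the cached mapping goes stale once
redesigned = {"#register-btn": "Sign up", "#login": "Log in"}
replayer.act(redesigned, "click Sign up")
print(replayer.llm_calls)  # 3
```

The cache is keyed only by instruction text to keep the sketch short; a real cache would also need to account for the URL and page state.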

Key takeaways

browser-use is Python, MIT licensed, and runs a continuous agent loop: the LLM re-reasons about the page on every step (Source: browser-use GitHub).
Stagehand is TypeScript, MIT licensed, and wraps Playwright/CDP with three targeted AI primitives: act(), extract(), and observe(), plus a higher-level agent() for multi-step tasks (Source: browserbase/stagehand GitHub).
browser-use self-reports an 89.1% success rate on WebVoyager (586 tasks, GPT-4o) in a December 2024 technical report; the figure is real but dated and self-measured, not an independent 2026 benchmark (Source: browser-use SOTA technical report).
Stagehand caches successful actions so repeat runs skip the LLM call entirely, which is the main reason its production cost per task is typically lower than a pure agent loop (Source: Scrapfly comparison).
Decision rule: reach for browser-use when the task genuinely varies every run; reach for Stagehand when 80% of the flow is stable and only a few steps need an LLM to adapt.

What browser-use and Stagehand actually are

browser-use is an open-source Python library, installed with pip install browser-use, that turns a plain-language goal into browser actions. Give it something like "find the cheapest flight from New York to London next Friday," and its agent loop reads the page, decides what to click or type, executes it through Playwright under the hood, and repeats until the goal is met or it gives up. It supports OpenAI, Anthropic, Gemini, and local models through Ollama, and is the most-starred project in this space at over 100,000 GitHub stars as of this year (Source: browser-use GitHub).
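The loop described above can be sketched in a few lines of Python. This is not browser-use's implementation: `decide_next_action` is a hypothetical, scripted stand-in for the per-step LLM call, the page is a toy state dict, and `max_steps` models the "gives up" exit.

```python
from typing import Callable, Optional

# One action is (verb, argument), e.g. ("type", "New York to London") or ("done", "")
Action = tuple[str, str]


def run_agent(goal: str,
              page: dict,
              decide_next_action: Callable[[str, dict], Action],
              max_steps: int = 10) -> Optional[dict]:
    """Observe -> decide -> act, repeated until the model says 'done' or the step budget runs out."""
    for step in range(max_steps):
        verb, arg = decide_next_action(goal, page)  # hypothetical LLM call, one per step
        if verb == "done":
            return page
        if verb == "type":
            page["query"] = arg
        elif verb == "click":
            page.setdefault("clicked", []).append(arg)
        page["steps"] = step + 1
    return None  # gave up: budget exhausted without reaching the goal


def scripted_llm(goal: str, page: dict) -> Action:
    """Hypothetical decider that chooses the next action from the current page state."""
    if "query" not in page:
        return ("type", "New York to London")
    if "search" not in page.get("clicked", []):
        return ("click", "search")
    return ("done", "")


result = run_agent("find the cheapest flight from New York to London", {}, scripted_llm)
print(result)  # {'query': 'New York to London', 'steps': 2, 'clicked': ['search']}

# A decider that never declares success hits the step budget instead
gave_up = run_agent("loop forever", {}, lambda goal, page: ("click", "next"), max_steps=3)
print(gave_up)  # None
```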

Stagehand, installed as @browserbasehq/stagehand from npm, takes the opposite starting point. It assumes you already know most of the workflow and will write it as ordinary Playwright code: navigate here, click that selector, wait for this element. Where the page is unpredictable (dynamic markup, a redesign, an A/B-tested layout), you drop in an AI primitive instead of a brittle selector. act() performs one instruction ("click the newsletter signup button"), extract() pulls structured data into a schema you define, and observe() inspects the page and returns candidate actions without executing them, which is useful for previewing what an act() call would do. Stagehand also exposes agent() for fuller autonomy over a bounded sub-task (Source: browserbase/stagehand GitHub).
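The extract() idea (declare a shape, let the model fill it, validate what comes back) can be sketched in Python. Stagehand's own extract() is a TypeScript API with its own schema handling; here a dataclass plays the schema, and a hard-coded JSON string stands in for the model's reply. `Flight`, `parse_into_schema`, and the sample data are all hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Flight:
    """The schema you define: the fields you want back from the page."""
    airline: str
    price_usd: float
    stops: int


def parse_into_schema(raw_json: str, schema: type) -> list:
    """Validate a model's JSON reply against a dataclass schema, rejecting missing or unparseable fields."""
    records = json.loads(raw_json)
    parsed = []
    for record in records:
        kwargs = {}
        for f in fields(schema):
            if f.name not in record:
                raise ValueError(f"missing field {f.name!r} in {record}")
            kwargs[f.name] = f.type(record[f.name])  # coerce, e.g. "412" -> 412.0
        parsed.append(schema(**kwargs))
    return parsed


# Hypothetical model output for: "extract every flight's airline, price, and number of stops"
llm_reply = ('[{"airline": "Example Air", "price_usd": "412", "stops": 0},'
             ' {"airline": "Sample Jet", "price_usd": 389.5, "stops": 1}]')
flights = parse_into_schema(llm_reply, Flight)
print(min(flights, key=lambda f: f.price_usd))
# Flight(airline='Sample Jet', price_usd=389.5, stops=1)
```

Validating against a declared schema is what turns free-form model output into something downstream code can rely on; a reply that omits a field fails loudly instead of producing a half-filled record.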

A few terms worth defining while we're here: CDP (Chrome DevTools Protocol) is the low-level interface browsers expose so automation tooling can inspect and control a page; Playwright is Microsoft's cross-browser automation library, which drives browsers over protocols like CDP; and WebVoyager is an academic benchmark of real-world tasks on live websites, used to score how well an agent completes open-ended browsing goals (browser-use's report evaluated 586 of its tasks).
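CDP is easiest to understand from its wire format: each command is a JSON object with an integer id, a method named Domain.method, and a params object, and the browser's response carries the same id. The sketch below only builds and matches those messages; actually sending them needs a WebSocket connection to a browser started with remote debugging enabled, which is out of scope here. Page.navigate and Runtime.evaluate are real CDP methods; the `CdpSession` class is a hypothetical helper.

```python
import itertools
import json


class CdpSession:
    """Hypothetical helper that frames CDP commands and matches responses by id.

    Events (messages with a method but no id) are not handled in this sketch.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.pending: dict[int, str] = {}

    def command(self, method: str, **params) -> str:
        msg_id = next(self._ids)
        self.pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params})

    def handle_response(self, raw: str) -> tuple[str, dict]:
        msg = json.loads(raw)
        method = self.pending.pop(msg["id"])  # the response echoes the command's id
        return method, msg.get("result", {})


session = CdpSession()
print(session.command("Page.navigate", url="https://example.com"))
# {"id": 1, "method": "Page.navigate", "params": {"url": "https://example.com"}}
print(session.command("Runtime.evaluate", expression="document.title"))
# {"id": 2, "method": "Runtime.evaluate", "params": {"expression": "document.title"}}

# A hand-written response shaped like the browser's reply to command 2 (not captured from a browser)
reply = '{"id": 2, "result": {"result": {"type": "string", "value": "Example Domain"}}}'
print(session.handle_response(reply))
# ('Runtime.evaluate', {'result': {'type': 'string', 'value': 'Example Domain'}})
```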

The core difference: full autonomy vs deterministic-first

browser-use's model is one continuous reasoning loop. Every step, the agent re-reads the page state and asks the LLM what to do next. Developer Steven Gonsalvez, comparing the field's major browser-agent frameworks, called it "the 800-pound gorilla of this space": general-purpose, needing almost no upfront scripting, and willing to attempt tasks nobody wrote a workflow for (Source: dev.to framework-wars roundup).

Stagehand inverts that. You write the deterministic skeleton and only invoke AI at the specific steps where a hardcoded selector would break. The same split shows up in search autocomplete: searchers type "browser use vs agent browser" almost as often as the head-to-head query, because the question people actually have is "do I want an agent that acts on its own, or a script I can drop AI into?" A team automating one well-understood login-and-download flow rarely needs an agent replanning from scratch on every run; a team building a research assistant that has to handle arbitrary sites usually does.

browser-use vs Stagehand: at a glance

	browser-use	Stagehand
Language	Python	TypeScript (Python and Go SDKs available)
Underlying engine	Playwright	Playwright/CDP (CDP-native since v3)
Core primitives	Agent.run() (goal in, agent plans and executes)	act(), extract(), observe(), agent()
Automation model	Full autonomy, continuous LLM reasoning per step	Deterministic-first, AI invoked selectively
WebVoyager score	89.1% (self-reported, Dec 2024, GPT-4o)	Not benchmarked on WebVoyager
Repeat-run cost	LLM call on every step, every run	Caches successful actions; near-zero cost on replay
Production story	Agent orchestration built in; needs separate infra for scale	Pairs with Browserbase for managed cloud browsers; needs separate CAPTCHA/2FA handling
License	MIT	MIT
Latest version (verified at time of writing)	0.13.3 (PyPI)	3.6.0 (packages/core on GitHub)

Version and dependency figures were checked directly against PyPI and the browserbase/stagehand GitHub repo at the time of writing; the WebVoyager score comes from browser-use's own December 2024 report (Source: browser-use GitHub; Source: browser-use SOTA technical report).

Stagehand v3's CDP rewrite: what actually changed

Stagehand's third major version replaced its Playwright-only execution path with a CDP-native engine, talking to the browser more directly instead of routing every interaction through Playwright's abstraction layer. Independent testing reported roughly a 44% improvement on complex DOM interactions, specifically shadow DOM and iframe cases that are notoriously fragile for selector-based automation (Source: Scrapfly comparison).

That does not mean Playwright disappeared from the stack. Stagehand's published packages/core/package.json still lists playwright-core as a peer dependency (version range ^1.55.1), alongside optional patchright-core and puppeteer-core peers, so teams can plug in whichever underlying driver fits their setup (Source: browserbase/stagehand GitHub). The honest framing is that v3 moved its own execution engine closer to the metal while keeping Playwright as a compatible, swappable layer. It did not drop Playwright entirely, a distinction most comparison posts gloss over.

Benchmarks and cost in practice

The 89.1% WebVoyager figure attached to browser-use gets repeated constantly, but it is worth reading where it actually comes from. browser-use founder Gregor Zunic published it in a technical report dated December 15, 2024 (GPT-4o, 586 tasks) and disclosed real methodology caveats in his own write-up: the team manually re-judged tasks the automated WebVoyager evaluator marked "unknown" or "failed," removed 55 tasks, and adjusted stale dates in the dataset, noting plainly that "the default WebVoyager evaluator is not good" (Source: browser-use SOTA technical report). It is a legitimate, disclosed result, but it is also a vendor self-report from over a year before this article, and browser-use's own current documentation has since moved to a different internal benchmark ("BU Bench") rather than re-citing the WebVoyager number.

Stagehand has not published a competing WebVoyager score; its performance claims are framed around latency and DOM-interaction reliability instead, which fits its role as a targeted tool rather than a fully autonomous agent (Source: Scrapfly comparison).

Operator note (first-hand): running pip install browser-use today resolves to version 0.13.3, and Stagehand's packages/core/package.json on the browserbase/stagehand GitHub repo (fetched directly, not the deprecated npm listing) shows core version 3.6.0 with playwright-core: ^1.55.1 as a required peer dependency. If your install pulls a materially older Stagehand core, check whether your lockfile is still resolving a pre-v3 release before assuming a bug in your automation code.
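That lockfile check is easy to script. The sketch below is a hypothetical helper, not part of either project: it compares resolved version strings (which you would copy from your lockfile or `npm ls` output) against the two expectations above, a Stagehand core on the v3 line and a playwright-core that satisfies the ^1.55.1 peer range. The package keys and sample versions are illustrative, only plain MAJOR.MINOR.PATCH strings are handled, and the caret rule implemented is npm's rule for majors of 1 or higher: at least the stated version, below the next major.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (pre-release suffixes are not handled)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies_caret(version: str, base: str) -> bool:
    """npm caret range for a base major >= 1: ^X.Y.Z means >= X.Y.Z and < (X+1).0.0."""
    v, b = parse(version), parse(base)
    return b <= v < (b[0] + 1, 0, 0)


def check_lockfile(resolved: dict[str, str]) -> list[str]:
    """Return warnings for a hypothetical {package: resolved_version} mapping."""
    warnings = []
    core = resolved.get("@browserbasehq/stagehand")
    if core is not None and parse(core)[0] < 3:
        warnings.append(f"Stagehand core {core} is pre-v3")
    pw = resolved.get("playwright-core")
    if pw is not None and not satisfies_caret(pw, "1.55.1"):
        warnings.append(f"playwright-core {pw} is outside ^1.55.1")
    return warnings


print(check_lockfile({"@browserbasehq/stagehand": "3.6.0", "playwright-core": "1.55.1"}))
# []
print(check_lockfile({"@browserbasehq/stagehand": "2.0.0", "playwright-core": "1.50.0"}))
# ['Stagehand core 2.0.0 is pre-v3', 'playwright-core 1.50.0 is outside ^1.55.1']
```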

Cost follows directly from the architecture. browser-use pays for an LLM call on effectively every step, commonly cited in the $0.02 to $0.30 per task range depending on model and task length (Source: dev.to framework-wars roundup). Stagehand's caching lets an already-successful workflow replay at near-zero inference cost until the page changes enough to break the cached mapping, which is why high-volume repeat workflows tend to land on Stagehand even when browser-use handled the initial exploration.
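A back-of-the-envelope model makes the difference concrete. The per-task figures below reuse the $0.02 to $0.30 range quoted above; the run count, the number of cache misses, and the simplification that each cache miss costs about one full agent run while replays cost nothing are all hypothetical inputs, not measurements.

```python
def agent_loop_cost(runs: int, cost_per_task: float) -> float:
    """Every run pays for LLM reasoning on every step."""
    return runs * cost_per_task


def cached_replay_cost(runs: int, cost_per_task: float, cache_misses: int) -> float:
    """Pay for the first (exploratory) run plus one re-resolution per cache miss.

    Replays are near-zero in practice; they are modeled as exactly zero here.
    """
    paid_runs = min(runs, 1 + cache_misses)
    return paid_runs * cost_per_task


runs = 1_000   # hypothetical: repeat runs of the same workflow
misses = 5     # hypothetical: times the site changed enough to break the cached mapping
for per_task in (0.02, 0.30):
    print(f"${per_task:.2f}/task: agent ${agent_loop_cost(runs, per_task):,.2f} "
          f"vs cached ${cached_replay_cost(runs, per_task, misses):,.2f}")
# $0.02/task: agent $20.00 vs cached $0.12
# $0.30/task: agent $300.00 vs cached $1.80
```

The gap grows linearly with run count, which is why the comparison mostly matters for high-volume repeat workflows rather than one-off tasks.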

Stagehand vs Playwright: what's the difference

Plain Playwright automates a browser with hardcoded selectors and no AI in the loop: it is fast and fully deterministic, but it breaks the moment a site's markup changes. Stagehand is not a replacement for Playwright but an extension of it: its own repository lists playwright-core as a required peer dependency rather than treating it as a competing library (Source: browserbase/stagehand GitHub). You write Playwright-style code for the stable part of a workflow, then call act(), extract(), or observe() only where an LLM reading the page in natural language is more resilient than a brittle CSS selector. That is why "stagehand vs playwright" shows up as its own search pattern: people are not choosing between the two so much as deciding how much AI to layer onto the Playwright code they already use.
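The resilience argument is easier to see side by side. In this sketch a page is a hypothetical list of elements (a selector plus visible text) before and after a redesign; `click_by_selector` stands in for a hardcoded Playwright-style lookup, and `click_by_instruction` uses naive text matching as a stand-in for an LLM reading the page. Real act() calls use a model, not string matching, so treat this as an illustration of the failure mode rather than of Stagehand's behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    selector: str  # what a hardcoded script depends on
    text: str      # what a human (or an LLM) reads


def click_by_selector(page: list[Element], selector: str) -> Optional[Element]:
    """Deterministic lookup: fast and exact, but tied to the markup."""
    return next((el for el in page if el.selector == selector), None)


def click_by_instruction(page: list[Element], instruction: str) -> Optional[Element]:
    """Stand-in for an AI step: pick the first element whose visible text appears in the instruction."""
    wanted = instruction.lower()
    return next((el for el in page if el.text.lower() in wanted), None)


before = [Element("#nl-signup", "Subscribe"), Element("#cart", "Cart")]
after = [Element("button.newsletter-cta", "Subscribe"), Element("#cart", "Cart")]  # redesign renamed the button

for page in (before, after):
    by_selector = click_by_selector(page, "#nl-signup")
    by_instruction = click_by_instruction(page, "click the Subscribe button in the newsletter box")
    print(by_selector is not None, by_instruction.selector)
# True #nl-signup
# False button.newsletter-cta
```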

Which should you pick

Pick browser-use if you are prototyping fast, automating one-off or highly varied tasks, or building something closer to a research agent than a repeatable pipeline. Its full-autonomy model means less upfront engineering per task, at the cost of paying for LLM inference on every run and accepting more run-to-run variance.

Pick Stagehand if you are shipping a browser workflow that runs the same way most of the time and needs to survive in production: scheduled scraping, repeated form submissions, monitoring flows. Its deterministic-first design, TypeScript ecosystem, and action caching are built for exactly that repeatability, though you will want Browserbase or your own infrastructure for managed cloud browsers, and separate tooling for CAPTCHA and two-factor flows that neither project handles natively (Source: Skyvern comparison).

Nothing stops a team from using both: browser-use to explore what a new site's flow looks like, then hand-coding the stable version in Stagehand once you know the steps.

FAQ

What are Stagehand and browser-use?

Stagehand and browser-use are both open-source frameworks for controlling a web browser with an LLM. Stagehand is a TypeScript, Playwright/CDP-based tool that adds targeted AI actions to otherwise deterministic automation code. browser-use is a Python library that runs a fully autonomous agent loop from a single natural-language goal.

Is browser-use open source?

Yes. browser-use is released under the MIT license and its source is on GitHub at browser-use/browser-use, with more than 100,000 stars as of this year. It supports OpenAI, Anthropic, Gemini, and local models via Ollama, and installs with pip install browser-use.

What is Browserbase Stagehand?

Stagehand is an open-source browser automation framework maintained by Browserbase, a company that also sells managed cloud browser infrastructure. Stagehand itself is free and MIT licensed; Browserbase's paid platform is one option for running it at scale in the cloud.

Stagehand vs Playwright: what's the difference?

Playwright is a general-purpose browser automation library with no AI built in; every action needs a hardcoded selector. Stagehand is built as a layer on Playwright and CDP that adds act(), extract(), and observe() so an LLM can handle the parts of a page that are too dynamic for a fixed selector to survive.

What is Stagehand computer use?

"Computer use" and "Stagehand" are sometimes conflated in search because both describe letting an AI model control a screen or browser directly. Stagehand is specifically a browser-automation framework, not a general computer-use agent; it operates through Playwright/CDP against a browser context rather than controlling an entire desktop.

References

browser-use GitHub - https://github.com/browser-use/browser-use
browser-use SOTA technical report - https://browser-use.com/posts/sota-technical-report
browserbase/stagehand GitHub - https://github.com/browserbase/stagehand
Scrapfly: Stagehand vs Browser Use - https://scrapfly.io/blog/posts/stagehand-vs-browser-use
Skyvern: Browser Use vs Stagehand - https://www.skyvern.com/blog/browser-use-vs-stagehand-which-is-better/
