Best LLM Observability Tools 2026 (GDPR & EU Hosting)
The best GDPR-compliant LLM observability tools in 2026 — compared on EU data residency, operating-entity jurisdiction, self-hosting, and free tiers.
LLM observability went from nice-to-have to table stakes fast. If you run anything with a model in the loop — a support copilot, a RAG pipeline, an agent — you need traces: which prompt went in, which completion came out, what it cost, where it failed.
Here’s the part most comparison articles skip: LLM traces are the most PII-dense telemetry your company emits. An uptime check stores a URL and a response time. An LLM trace stores the full text of what your users typed — names, medical questions, contract clauses — plus the model’s response. Your tracing backend is effectively a second copy of your users’ most sensitive data.
That makes the jurisdiction question unavoidable. After Schrems II (CJEU C-311/18) and with the US CLOUD Act giving American authorities access to data held by US companies regardless of where servers sit, “EU region available” is not the same as EU data sovereignty. And the EU AI Act layers a second obligation on top of GDPR: Article 12 requires automatic event logging for high-risk AI systems, with the Annex III obligations enforceable from August 2, 2026. Tracing is becoming mandatory — which means where those traces live is becoming a legal question.
Here are the LLM observability tools worth considering in 2026 if you take that question seriously. All pricing and jurisdiction facts are as of June 2026.
What Makes an LLM Observability Tool GDPR-Clean?
Same four checks we apply to monitoring tools, with higher stakes:
- EU data residency — traces stored on EU servers, not just “EU region on the roadmap”
- EU-incorporated company — not subject to the CLOUD Act or similar non-EU disclosure laws
- Instant DPA — a Data Processing Agreement without a sales call (GDPR Article 28 requires one; your tracing vendor is a processor of personal data)
- Transparent sub-processors — you need to know who else touches prompt data
There’s a fifth dimension unique to this category: self-hosting maturity. LLM observability is unusually open-source-friendly — several leading tools run entirely on your own infrastructure, which eliminates the vendor jurisdiction question outright. We flag that for every tool below.
1. Langfuse — Best Overall for EU Teams
Langfuse is the clear EU flagship in this category. The operating company is Langfuse GmbH, incorporated in Berlin, Germany (Charlottenburg register, HRB 248821B) — making it, as far as we can verify, the only major LLM observability platform run by an EU legal entity. It covers tracing, evals, prompt management, and datasets, with OpenTelemetry-based instrumentation and integrations for the OpenAI SDK, LangChain, LiteLLM, and more.
What stands out:
- EU-incorporated operator and a dedicated EU cloud data region — the only tool on this list with both
- MIT-licensed open-source core: self-host the full platform without licensing fees, on-prem or in a VPC
- DPA included on all plan tiers, including the free one
- SOC 2 Type II and ISO 27001 certified
Pricing: Free Hobby tier (50k units/month — notably generous), Core from $29/month, Pro $199/month
Hosting: EU region available (plus US, Japan); self-hosting anywhere
Jurisdiction: Germany (Langfuse GmbH, Berlin)
Best for: Any EU team that wants hosted LLM observability without a CLOUD Act conversation. If you only shortlist one tool from this article, it’s this one.
2. Arize Phoenix — Best Pure Self-Hosted Option
Phoenix is the source-available observability and evaluation tool from Arize AI, built on OpenTelemetry and OpenInference. It is released under the Elastic License 2.0 (ELv2), which allows free self-hosting but is not an OSI-approved open-source license. It runs anywhere (notebook, Docker, Kubernetes) and supports practically every major framework (OpenAI Agents SDK, LangGraph, LlamaIndex, CrewAI, DSPy) and provider.
What stands out:
- Self-hosting is the primary deployment model, not an afterthought — traces never leave your infrastructure
- Native OpenTelemetry, no proprietary lock-in
- Strong evaluation tooling alongside tracing
Pricing: Free (self-hosted OSS); the commercial Arize AX cloud starts free, Pro at $50/month
Hosting: Wherever you deploy it (self-hosted); AX cloud is US-operated
Jurisdiction: Arize AI is US-incorporated — irrelevant if you self-host Phoenix, decisive if you use AX cloud
Best for: Teams with ops capacity that want zero third-party exposure for prompt data. Self-hosted Phoenix on EU infrastructure is GDPR-clean by construction.
3. LangSmith — Best for LangChain-Native Stacks, with a Jurisdiction Asterisk
LangSmith is LangChain Inc.’s observability and evals platform — the path of least resistance if your stack is built on LangChain/LangGraph. To its credit, LangChain offers EU data residency on all plan tiers at no extra cost (eu.smith.langchain.com).
The asterisk: LangChain Inc. is a US company. EU-stored traces under a US operator remain reachable under the CLOUD Act — the EU region solves data-residency checkboxes, not legal jurisdiction. The DPA is available on request via support rather than instant download.
Pricing: Free Developer tier (5k base traces/month, 1 seat), Plus at $39/seat/month plus usage ($2.50 per 1,000 additional base traces)
Hosting: US or EU (your choice, all tiers)
Jurisdiction: US (LangChain Inc.)
Best for: LangChain-heavy teams whose compliance bar is “EU data residency” rather than “EU legal jurisdiction.” If the latter matters, pair LangGraph with Langfuse or self-hosted Phoenix instead — both instrument it well.
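To make the Plus pricing above concrete, here is a rough cost sketch. The seat price and overage rate come from the pricing line; the included trace allowance is not stated in this article, so `included_traces` is a parameter you must fill in from LangSmith's current pricing page, and the value in the usage note is purely hypothetical. We also assume overage is billed pro rata rather than rounded up per 1,000 traces.

```python
SEAT_PRICE_USD = 39.00             # Plus plan, per seat per month
OVERAGE_PER_1K_TRACES_USD = 2.50   # per 1,000 additional base traces


def plus_monthly_cost(seats: int, base_traces: int, included_traces: int) -> float:
    """Estimate a monthly Plus bill: seats plus pro-rata trace overage.

    included_traces is an assumption supplied by the caller; check the
    vendor's pricing page for the real allowance.
    """
    extra_traces = max(0, base_traces - included_traces)
    overage = (extra_traces / 1000) * OVERAGE_PER_1K_TRACES_USD
    return seats * SEAT_PRICE_USD + overage
```

With 3 seats, 60,000 base traces, and a hypothetical allowance of 10,000, `plus_monthly_cost(3, 60_000, 10_000)` gives 242.0: $117 for seats plus $125 for 50,000 extra traces.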
4. Lunary — EU Hosting, US Entity: The Textbook Example
Lunary is a lightweight open-source (Apache 2.0) observability platform focused on chatbots and RAG, with tracing, cost tracking, and user analytics. Its cloud data is hosted in Europe with SOC 2 Type II and ISO 27001 certifications — which is why it appears on many “GDPR-compliant” lists.
But check the legal pages: the operator is Lunary LLC, a Delaware company with a San Francisco address, and its DPA is governed by Delaware law. This is the exact hosting-vs-jurisdiction split this article keeps warning about — EU servers, US entity, CLOUD Act reach intact. The free self-hostable Community Edition sidesteps the issue.
Pricing: Free tier (10k events/month, 3 projects), Team at $20/user/month; free self-hosted Community Edition
Hosting: EU (cloud); anywhere (self-hosted)
Jurisdiction: US (Lunary LLC, Delaware)
Best for: Chatbot/RAG products that want a lean tool — self-host it for the clean version, or accept the US-entity trade-off knowingly.
5. Opik (Comet) — Best Open-Source Eval Depth
Opik is Comet’s open-source (Apache 2.0) LLM evaluation and observability platform — tracing, automated evals, and production dashboards, with the full feature set available in the self-hosted version rather than gated behind the cloud product.
What stands out:
- True open source: core observability and eval features are all in the OSS release
- Strong automated-evaluation tooling for RAG and agentic workflows
- Generous free cloud tier, no credit card
Pricing: Free (self-hosted, full features); free cloud tier, paid cloud plans above that
Hosting: Anywhere (self-hosted); cloud operated from the US
Jurisdiction: US (Comet, New York)
Best for: Teams that want serious eval infrastructure and are willing to self-host on EU machines to keep it GDPR-clean.
6. Portkey — Gateway-First, US-Operated
Portkey is an AI gateway with observability attached: one API in front of 1,600+ models, with routing, caching, governance, and request logging. The gateway is open source; the observability product is usage-priced on recorded logs. SOC 2 Type II and ISO 27001 certifications, along with GDPR and HIPAA compliance, are available, but only at the Enterprise tier.
The structural caveat for EU teams: a gateway sits in the request path, so every prompt and completion transits the vendor by design. With a San Francisco-headquartered operator, that’s the maximal version of the jurisdiction problem unless you self-host the gateway.
Pricing: Usage-based on recorded logs; free tier available, compliance certifications gated to Enterprise
Hosting: US-operated cloud; open-source gateway self-hostable
Jurisdiction: US (Portkey, San Francisco)
Best for: Teams that primarily need multi-model routing and accept (or self-host around) the US jurisdiction.
7. Helicone — Proceed with Caution (Acquired, Maintenance Mode)
Helicone was one of the most popular proxy-based LLM observability tools — one line of code, cost tracking, caching. In March 2026 it was acquired by Mintlify and moved to maintenance mode: security patches and new model support continue, active feature development does not, and Mintlify is helping customers migrate elsewhere.
It was already a tough sell for EU teams — US company, prompts and responses transiting US servers on the cloud plan — and a vendor in managed wind-down settles the question. The open-source code still exists, but we wouldn’t start a new build on it in 2026.
Pricing: Free tier (10k requests/month), Pro $79/month, Team $799/month
Hosting: US (cloud); self-hostable
Jurisdiction: US (now part of Mintlify)
Best for: Existing Helicone users planning their migration — Langfuse is the most common destination for EU teams.
8. OpenLLMetry — The Vendor-Neutral Escape Hatch
OpenLLMetry isn’t a platform — it’s an Apache 2.0 instrumentation layer built on OpenTelemetry by Traceloop (a Tel Aviv company acquired by ServiceNow in March 2026). It auto-instruments LLM providers, vector DBs, and frameworks, then exports standard OTel traces to any of 25+ backends — including ones you run yourself.
Why it matters here: instrument once with an open standard, and your jurisdiction decision becomes which backend receives the traces — reversible without re-instrumenting. Point it at self-hosted Langfuse or Phoenix on EU infrastructure and the vendor question disappears. The Traceloop cloud itself now sits under a US parent, so treat it like the other US-operated clouds.
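A minimal sketch of what "reversible" can look like in practice, assuming your exporter honours the standard OpenTelemetry environment variables (`OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_EXPORTER_OTLP_HEADERS`, `OTEL_SERVICE_NAME`). The backend names and URLs below are hypothetical placeholders, and you should check the OpenLLMetry documentation for how its SDK picks up the endpoint in your setup.

```python
from typing import Dict, Optional
from urllib.parse import quote

# Hypothetical backend registry: placeholder names and URLs, not real
# endpoints of any vendor.
BACKENDS: Dict[str, str] = {
    "self-hosted-eu": "https://otel.internal.example.eu:4318",
    "vendor-cloud": "https://otlp.vendor.example.com",
}


def otlp_env(backend: str, service_name: str,
             headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build standard OTel exporter environment variables for one backend."""
    env = {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": BACKENDS[backend],
    }
    if headers:
        # Headers are comma-separated key=value pairs; values are
        # percent-encoded so spaces and commas survive parsing.
        env["OTEL_EXPORTER_OTLP_HEADERS"] = ",".join(
            f"{key}={quote(value, safe='')}" for key, value in headers.items()
        )
    return env
```

Switching jurisdiction then means changing one argument, for example from `otlp_env("vendor-cloud", "support-copilot")` to `otlp_env("self-hosted-eu", "support-copilot")`, while the instrumentation code stays untouched.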
Pricing: Free (Apache 2.0); Traceloop cloud has a free tier (50k spans/month)
Jurisdiction: Instrumentation: none (open standard). Traceloop cloud: US parent (ServiceNow)
Best for: Teams that want to avoid lock-in and keep the backend decision reversible.
Comparison Table
| Tool | Hosting | Jurisdiction (operating entity) | CLOUD Act reach | Self-host | Free Tier | Starts At |
|---|---|---|---|---|---|---|
| Langfuse | 🇪🇺 EU region (or US/JP) | 🇩🇪 Germany (Langfuse GmbH) | None | ✅ MIT | ✅ 50k units/mo | $29/mo |
| Arize Phoenix | Self-hosted (AX cloud: 🇺🇸 US) | 🇺🇸 US (Arize AI) | None if self-hosted | ✅ ELv2 | ✅ OSS free | Free / $50/mo cloud |
| LangSmith | 🇺🇸 US or 🇪🇺 EU region | 🇺🇸 US (LangChain Inc.) | Yes, even on EU region | ❌ (Enterprise only) | ✅ 5k traces/mo | $39/seat/mo |
| Lunary | 🇪🇺 EU (cloud) | 🇺🇸 US (Lunary LLC, Delaware) | Yes, despite EU hosting | ✅ Apache 2.0 | ✅ 10k events/mo | $20/user/mo |
| Opik (Comet) | 🇺🇸 US (cloud) | 🇺🇸 US (Comet, New York) | None if self-hosted | ✅ Apache 2.0 | ✅ | Free |
| Portkey | 🇺🇸 US (cloud) | 🇺🇸 US (San Francisco) | Yes (cloud) | ✅ Gateway OSS | ✅ | Usage-based |
| Helicone | 🇺🇸 US (cloud) | 🇺🇸 US (Mintlify) — maintenance mode | Yes (cloud) | ✅ | ✅ 10k req/mo | $79/mo |
| OpenLLMetry | Your backend | Open standard (Traceloop → ServiceNow 🇺🇸) | Depends on backend | ✅ Apache 2.0 | ✅ | Free |
Reading the table: the Jurisdiction column is the one EU buyers skip and shouldn’t. LangSmith and Lunary both store data in the EU, yet both operate under US law — the CLOUD Act follows the company, not the servers. Only one row has EU in both columns. Every other GDPR-clean option is spelled the same way: self-host it.
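The rule for reading the table can be written down as a small filter. This is only a sketch of the article's reasoning, not legal advice: the rows mirror the table as of June 2026, OpenLLMetry is left out because its exposure depends on the backend, and the field names are our own.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tool:
    name: str
    operator_jurisdiction: str  # "EU" or "US", from the Jurisdiction column
    eu_cloud_region: bool       # vendor cloud offers EU hosting
    self_hostable: bool         # free self-hosting per the Self-host column


TOOLS = [
    Tool("Langfuse", "EU", True, True),
    Tool("Arize Phoenix", "US", False, True),
    Tool("LangSmith", "US", True, False),   # self-hosting is Enterprise only
    Tool("Lunary", "US", True, True),
    Tool("Opik", "US", False, True),
    Tool("Portkey", "US", False, True),     # gateway only
    Tool("Helicone", "US", False, True),
]


def hosted_without_cloud_act(tools: List[Tool]) -> List[str]:
    """Hosted options: EU operator AND EU region. Servers alone are not enough."""
    return [t.name for t in tools
            if t.operator_jurisdiction == "EU" and t.eu_cloud_region]


def clean_if_self_hosted(tools: List[Tool]) -> List[str]:
    """Tools you can run yourself, so no third-party operator holds the traces."""
    return [t.name for t in tools if t.self_hostable]
```

Note that `eu_cloud_region` never rescues a US operator in the first filter: that is the hosting-versus-jurisdiction split expressed as a condition.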
How to Choose
Want hosted observability with zero CLOUD Act exposure? → Langfuse Cloud, EU region. It’s the only option, and fortunately also one of the best products.
Have ops capacity and want maximum control? → Self-host Langfuse, Phoenix, or Opik on EU infrastructure (Hetzner, Netcup, OVH). Prompts never leave your perimeter.
Deep in the LangChain ecosystem? → LangSmith with the EU region if data residency suffices; Langfuse if legal jurisdiction is the bar.
Building toward EU AI Act Article 12 compliance? → Prioritize configurable retention (six months minimum under Articles 19/26) and exportable, automatic logs. Langfuse and self-hosted Phoenix both fit.
Currently on Helicone? → Plan the migration now, while Mintlify is still offering migration support.
Your LLM Stack Also Needs Plain Old Uptime Monitoring
One blind spot we keep seeing in AI teams: world-class tracing, zero infrastructure monitoring. LLM observability tells you what the model did — it doesn’t tell you that your inference API has been returning 502s for twenty minutes, or that last night’s embedding pipeline never ran.
AI products fail at the infrastructure layer constantly, and batch workloads fail silently: a nightly fine-tune that crashes, an embedding refresh that hangs, a RAG index rebuild that quietly stops. The fix for that class of failure is heartbeat monitoring — your job pings a unique URL when it completes, and if the ping doesn’t arrive, you’re alerted within seconds. One curl command at the end of the pipeline, no agent — a natural fit for batch inference and embedding jobs.
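For pipelines written in Python, the same idea as the curl command can be sketched with the standard library alone. The URL below is a hypothetical placeholder for the unique address your monitoring tool issues, and the function names are ours. The key property is that the ping is sent only when the job finishes without raising, so a crash or hang shows up as a missing heartbeat.

```python
import urllib.request
from typing import Callable

# Hypothetical placeholder: replace with the unique URL from your monitor.
HEARTBEAT_URL = "https://monitoring.example.com/ping/your-unique-id"


def send_ping(url: str, timeout: float = 10.0) -> bool:
    """Send one GET request to the heartbeat URL; True on a 2xx reply."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and timeouts are OSError subclasses. A failed ping should
        # not crash a job that has already finished its real work.
        return False


def run_with_heartbeat(job: Callable[[], None], url: str,
                       ping: Callable[[str], bool] = send_ping) -> bool:
    """Run the job, then ping only if it completed without raising."""
    job()  # an exception here propagates, so no ping is sent
    return ping(url)
```

A nightly embedding refresh would then call `run_with_heartbeat(rebuild_embeddings, HEARTBEAT_URL)`, where `rebuild_embeddings` stands in for your own job function. The `ping` parameter exists so the wrapper can be tested without network access.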
The same jurisdiction checklist applies, because monitor URLs and incident history map your infrastructure. FoundersDeck is our answer there: uptime monitoring, heartbeat/cron checks, and cookie-free status pages on 100% German infrastructure (Netcup, Nuremberg), operated by a German company — free tier with 5 monitors, paid from €9/month. To be clear, FoundersDeck does not do LLM tracing — pair it with Langfuse or a self-hosted stack and you cover both layers under EU jurisdiction. For the full monitoring comparison, see our guide to the best GDPR-compliant monitoring tools in 2026.
Frequently Asked Questions
Do LLM traces contain personal data under GDPR?
Almost always, yes. LLM traces capture full prompts and completions, and in production those contain whatever your users typed — names, email addresses, health details, contract text. Unlike infrastructure metrics, LLM telemetry is raw conversational content. That makes your observability vendor a processor of personal data under GDPR Article 28: you need a Data Processing Agreement, a documented sub-processor chain, and a lawful transfer mechanism if traces leave the EU. Treat your tracing backend with the same scrutiny as your production database — in practice it stores a copy of your users’ most sensitive inputs.
Is Langfuse GDPR compliant?
Langfuse holds the strongest GDPR position among hosted LLM observability tools. The operating company is Langfuse GmbH, incorporated in Berlin, Germany (HRB 248821B), so it falls under EU jurisdiction exclusively, with no CLOUD Act exposure. Langfuse Cloud offers a dedicated EU data region, a Data Processing Agreement is included on all plan tiers, and the core platform is MIT-licensed open source, so you can self-host it entirely if even a German cloud is too much third-party exposure. As of June 2026 it is the only major LLM observability platform that combines an EU-incorporated operator with an EU hosting region.
Can I self-host LLM observability?
Yes, and the options are unusually good. Langfuse (MIT), Arize Phoenix (ELv2), Lunary Community Edition (Apache 2.0), and Opik (Apache 2.0) can all be self-hosted for free via Docker or Kubernetes. OpenLLMetry, the OpenTelemetry-based instrumentation standard, ships traces to any OTel-compatible backend you control. Self-hosting removes the vendor jurisdiction question entirely — prompts and completions never leave your infrastructure. The trade-off is operational: you run the database and handle retention and deletion requests yourself.
Does the EU AI Act require LLM logging?
For high-risk AI systems, yes. Article 12 of the EU AI Act requires automatic recording of events (logs) over the lifetime of the system — manual documentation does not satisfy it. Articles 19 and 26 set a minimum log retention of six months. The Annex III high-risk obligations become enforceable on August 2, 2026, with penalties up to €15 million or 3% of worldwide annual turnover. Even teams outside the high-risk classification are adopting tracing as standard practice, because the same logs serve debugging, cost control, and incident forensics. The practical consequence: LLM tracing is shifting from optional tooling to a compliance requirement, which makes your tracing vendor’s jurisdiction a legal question, not just a procurement one.
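Many tracing backends express retention in days, so it is worth checking that a day-based setting really covers six months. The sketch below assumes "six months" is counted in calendar months from the log date, which is our simplification rather than a legal interpretation; the function names are illustrative.

```python
import calendar
from datetime import date

MIN_RETENTION_MONTHS = 6  # minimum cited above for Articles 19 and 26


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_deletion_date(logged_on: date) -> date:
    """First date on which a log from logged_on may be deleted."""
    return add_months(logged_on, MIN_RETENTION_MONTHS)


def retention_is_sufficient(retention_days: int, logged_on: date) -> bool:
    """True if a day-based retention setting keeps this log for six months."""
    required_days = (earliest_deletion_date(logged_on) - logged_on).days
    return retention_days >= required_days
```

For a log written on August 2, 2026, the earliest deletion date is February 2, 2027, which is 184 days later. A tidy-looking 180-day retention setting would therefore fall short under this counting.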
Which LLM observability tools are EU-incorporated?
As of June 2026, Langfuse (Langfuse GmbH, Berlin, Germany) is the only major LLM observability platform operated by an EU-incorporated company. Every other significant player is US-incorporated: LangChain Inc. (LangSmith), Arize AI (Phoenix), Comet (Opik), Portkey, Lunary LLC (Delaware), and Helicone (now part of Mintlify). Some offer EU hosting regions — LangSmith and Lunary both store data in the EU — but EU hosting under a US operator does not remove CLOUD Act reach. If EU legal jurisdiction is a hard requirement, the realistic shortlist is Langfuse Cloud (EU region) or self-hosting an open-source tool on EU infrastructure.
Is an EU hosting region enough for GDPR compliance?
It helps, but it does not close the jurisdiction gap. EU hosting means the servers are physically in the EU; it does not change which government can compel the operator to hand data over. Under the US CLOUD Act, a US-incorporated company must comply with US disclosure orders for data it controls anywhere in the world — including a Frankfurt datacenter. This is the post-Schrems II reality. For LLM traces, which contain raw user conversations, the cleanest positions are: an EU-incorporated vendor with EU hosting (Langfuse), or self-hosting an open-source tool so no third-party operator exists at all.
Engin Yildirim
Founder of FoundersDeck. 13+ years in software engineering. Building EU-first tools for founders.