This page contains press release content distributed by XPR Media. Members of the editorial and news staff of the USA TODAY Network were not involved in the creation of this content.

Quesma Releases OTelBench: Independent Benchmark Reveals Frontier LLMs Struggle with Real-World SRE Tasks

New benchmark shows top LLMs achieve only 29% pass rate on OpenTelemetry instrumentation, exposing the gap between coding ability and real-world SRE work.

“OTelBench shows that while LLMs are impressive at generating code snippets, they’re not yet capable of the cross-cutting reasoning required for production engineering.”

— Jacek Migdał, founder of Quesma

WARSAW, POLAND, January 20, 2026 /EINPresswire.com/ — Quesma, Inc. announced the release of OTelBench, the first comprehensive benchmark for evaluating LLMs on OpenTelemetry instrumentation tasks. The open-source dataset tests 14 state-of-the-art models across 23 real-world tasks in 11 programming languages, revealing significant gaps in AI’s ability to handle production-grade Site Reliability Engineering (SRE) work.

While frontier LLMs have demonstrated impressive coding capabilities, the benchmark reveals a stark reality: the best-performing model, Claude Opus 4.5, achieved only a 29% pass rate on OpenTelemetry instrumentation tasks, compared with an 80.9% pass rate on SWE-Bench. This gap highlights a critical distinction between writing code and performing the complex, cross-cutting engineering work required for production systems.

The $1.4 Million Per Hour Problem
Enterprise outages cost an average of $1.4 million per hour, making production visibility mission-critical. Distributed tracing, the gold standard for debugging complex microservices, allows teams to link user actions to every underlying service call. However, implementing this visibility remains difficult, with 39% of organizations citing complexity as their top observability obstacle. OpenTelemetry has emerged as the industry standard with backing from 1,100+ organizations, yet configuring it correctly remains a major source of toil for SRE teams.

Fundamental Limitations Exposed
The benchmark tested models on agentic coding tasks where they were given source code from realistic applications, an interactive Linux terminal, and clear instrumentation objectives. The results revealed several critical failure modes:

Context propagation, passing trace context between services to maintain parent-child span relationships, proved to be an insurmountable barrier for most models. This is particularly concerning because context propagation is fundamental to distributed tracing.
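To make the idea concrete, here is a minimal sketch of context propagation that uses only the Python standard library. It is not the OpenTelemetry SDK and not code from OTelBench: the names Span, start_span, inject, and extract are hypothetical, chosen for illustration. The header layout assumes the W3C Trace Context "traceparent" format (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags).

```python
import re
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

# W3C Trace Context header: version-traceid-parentid-flags (lowercase hex).
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span record (not the OpenTelemetry Span class)."""
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]


def start_span(name: str, parent: Optional[Span] = None) -> Span:
    """Start a span; reuse the parent's trace ID when a parent is known."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return Span(name, trace_id, secrets.token_hex(8), parent_id)


def inject(span: Span, headers: Dict[str, str]) -> None:
    """Caller side: write the current span's identity into outgoing headers."""
    headers["traceparent"] = f"00-{span.trace_id}-{span.span_id}-01"


def extract(headers: Dict[str, str]) -> Optional[Span]:
    """Callee side: rebuild the remote parent from incoming headers, if present."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, _flags = match.groups()
    return Span("remote-parent", trace_id, span_id, None)


# Service A makes a request and propagates its context in the headers.
client_span = start_span("GET /checkout")
outgoing_headers: Dict[str, str] = {}
inject(client_span, outgoing_headers)

# Service B extracts the context, so its span joins the same trace.
server_span = start_span("charge-card", parent=extract(outgoing_headers))

# Without propagation, service B silently starts a new, disconnected trace.
orphan_span = start_span("charge-card", parent=extract({}))
```

In this sketch, server_span shares client_span's trace ID and records client_span as its parent, while orphan_span belongs to a separate trace with no parent. That second case is the failure the paragraph above describes: each service emits spans, but the parent-child link between them is lost.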

“The backbone of the software industry consists of complex, high-scale production systems with mission-critical reliability, and seasoned engineers are architecting, evolving, and troubleshooting them,” said Jacek Migdał, founder of Quesma. “OTelBench shows that while LLMs are impressive at generating code snippets, they’re not yet capable of the cross-cutting reasoning and sustained problem-solving required for production engineering. This gap matters because many vendors are marketing AI SRE solutions with bold claims but no independent verification. We need benchmarks like this to separate reality from hype.”

Language Ecosystems Matter
Success rates varied dramatically across programming languages, revealing that AI generalization is far weaker than that of human engineers. Models had moderate success with Go and, quite surprisingly, C++. A few tasks were completed for JavaScript, PHP, .NET, and Python. Just a single model solved a single task in Rust. None of the models solved a single task in Swift, Ruby, or, to our biggest surprise, Java (due to a build issue).

Why This Matters for AI Development
OTelBench reveals several reasons why OpenTelemetry instrumentation challenges current LLMs:
– Reliability-critical applications reside in private repositories at companies like Apple, Airbnb, and Netflix, limiting training data.
– Instrumentation requires cross-cutting changes across codebases, rather than sequential additions.
– Some tasks required 50+ commands over 10+ minutes. Models consistently performed worse as tasks lengthened.

Migdał added, “AI SRE in 2026 is what DevOps Anomaly Detection was in 2016—lots of marketing, huge budgets, but lacking independent benchmarks. Just as SWE-Bench became the standard for coding evaluation, we need SRE-style benchmarks to determine what actually works. That’s why we’re releasing OTelBench as open-source: to create a North Star for navigating the AI hype and to enable the community to track real progress.”

A Path Forward
Despite the challenges, the benchmark reveals promising signals. Claude Opus 4.5, GPT-5.2, and Gemini 3 models show capability on specific tasks, with go-otel-microservices-traces reaching a 52% pass rate. With more environments for Reinforcement Learning with Verifiable Rewards, OpenTelemetry instrumentation appears to be a solvable problem for future AI systems.

Until then, organizations requiring distributed tracing across services should expect to write that code themselves—or work alongside AI assistants that understand their limitations.

OTelBench is available today as an open-source project at https://quesma.com/benchmarks/otel/, enabling researchers and practitioners to reproduce results and contribute additional test cases.

Lucie Šimečková
Quesma
press@quesma.com

Legal Disclaimer:

EIN Presswire provides this news content “as is” without warranty of any kind. We do not accept any responsibility or liability for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this article. If you have any complaints or copyright issues related to this article, kindly contact the author above.

