
The Quality Trail: March 2026 QA News


From the Desk of the Editor

Hey there, and welcome back to The Quality Trail.

Once again, everyone’s talking about AI in testing. Two major studies this quarter tried to measure how much of that talk has turned into results (spoiler alert: the answer is probably less than you’d think). This edition digs into that gap, covers Meta’s research on throwaway tests that catch real bugs, and looks at how the QA role is shifting with an increasing realization that AI outputs are exceptionally difficult to validate. We’ve also got tool updates (Playwright CLI, Selenium 4.41, Cypress v16 security changes) and a packed spring conference calendar.

As always, if you think we missed something, we want to hear about it! You can also sign up to receive these testing updates via email.

– The QualityLogic Editorial Team



Upcoming Conferences and Events

Spring conference season is in full swing. Here’s what’s coming up:

  • ParisTestConf (March 31, Paris): Single-day, single-track conference focused on practical testing talks. Good signal-to-noise ratio for a one-day commitment.
  • Swiss Testing Days (April 3, Zurich): Long-running European event covering test strategy, automation, and quality leadership. Strong attendance from enterprise QA teams. From their description: “As artificial intelligence reshapes our digital world, the role of QA is entering a critical decade. The next five years will determine whether we guide AI or are guided by it. Swiss Testing Day 2026 challenges testers, developers, and tech leaders to go beyond tools and frameworks: to become Guardians of Trust in an AI-driven future.”
  • TestGuild IRL San Francisco (April 7, San Francisco): Free, in-person, limited to 100 seats. Networking-focused evening event from 4:00 – 8:00 PM with hands-on sessions.
  • TestGuild IRL Los Angeles (April 9, Los Angeles): Same format as the SF event. Free, 100 seats, first come, first served, from 4:00 – 8:00 PM.
  • QonfX Bangalore (April 10, Bangalore): Invite-only QA leadership forum focused on executive-level quality strategy conversations.
  • ACM/IEEE AST 2026 (April 12-18, Rio de Janeiro): Academic conference on automation of software test. If you care about where testing research is headed, this is where the papers land.
  • UCAAT (April 14-16, Sophia Antipolis, France): User Conference on Advanced Automated Testing. Small, focused, and heavily technical.
  • TestingUY (April 15-16, Montevideo): Latin America’s largest dedicated testing conference featuring two days of talks and workshops.
  • QA Financial Forum Toronto (April 23, Toronto): Focused on QA in financial services. Relevant for teams dealing with testing in highly regulated environments.
  • STAREAST (April 26 – May 1, Orlando): One of the largest testing conferences in North America featuring workshops, keynotes, and an expo. Early bird pricing ends March 27.
  • SeleniumConf & AppiumConf (May 6-8, Valencia): The joint conference returns. If you work with Selenium or Appium, this is the community gathering of the year.
  • Romania QualityConnect (May 6-7, Cluj-Napoca): Regional event growing in reputation, focused on practical QA topics.
  • QA Financial Forum New York (May 12, New York): If you missed Toronto, this covers the same ground for the US East Coast audience. Speakers tend to differ between the two.

For the full year-round list, testingconferences.org remains the best single resource.

Agentic AI Meets Reality

The Trust Gap

Two independent studies released in February landed on the same conclusion. Leapwork’s survey of 300+ software engineers and QA leaders found that 88% say AI is a priority for their testing strategy, but only 12.6% actually apply it across key workflows. The top barrier is quality and reliability concerns, cited by 54% of respondents. As the report puts it: “AI dominates testing strategy conversations. Nearly every organization sees it as essential to the future of quality. Testing supports critical systems, where accuracy and reliability are non-negotiable. That standard determines how far and how fast teams apply AI.”

BrowserStack’s State of AI in Software Testing 2026 report surveyed 250+ CTOs and QA leaders and found a similar pattern. 88% of organizations are increasing budgets. 94% of teams use AI in testing, but only 12% have reached full autonomy. Integration (not budget) is the number one barrier at 37%. Organizations using AI for 4+ years are 83% more likely to achieve over 100% ROI, but higher spending alone doesn’t guarantee stronger returns.

Teams that started early are pulling ahead. Teams that bought a tool last year and expected it to pay for itself are not. As BrowserStack CTO Nakul Aggarwal put it, “Too many teams think adopting AI is the finish line, when it’s really the starting point.”

In an age where the latest-and-greatest feels like it shifts by the day, having roles dedicated to experimentation and building out expertise becomes increasingly important.

Meta’s Ephemeral Testing Experiment

On the research side, Meta published a paper on Just-in-Time Catching Tests (JiTTests). Instead of maintaining a static test suite, JiTTests uses LLMs to generate tests tailored to each code change (or diff), runs them to catch bugs, then throws them away. The tests are designed to fail, surfacing potential issues rather than confirming expected behavior. The idea is to eliminate the maintenance burden and test-code review overhead that often plague QA teams.

You are probably wondering: why wouldn’t I want to maintain a suite of permanent, proven tests? Meta’s research argues that static suites often fall into a “Regression Only Trap,” where the effort required to keep old tests alive during rapid code changes eventually outweighs their value, even as new bugs slip past them. By shifting to just-in-time ephemeral tests, Meta reports identifying four times more defects in new code while eliminating the permanent technical debt and CI bottlenecks caused by a bloated test repository.

Unlike traditional “hardening tests” that pass upon creation to protect against future regressions, these tests are specifically designed to fail on proposed changes to surface bugs before they are merged.

The numbers from 22,126 generated tests were rather promising, demonstrating a 4x improvement in catch rate over hardening tests, a 70% reduction in human review load through automated assessors, and 8 confirmed true positives out of 41 candidates reported to engineers. Four of those eight would have caused serious production failures.

This is early research at a scale most organizations can’t replicate (yet). But the core concept (tests that exist only to catch bugs in a specific change and then disappear) eliminates the maintenance problem that eats most automation budgets. We suspect this idea will keep surfacing and may be something worth looking out for.
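To make the mechanics concrete, here is a heavily simplified sketch of the ephemeral loop: generate candidate tests for a change, keep only those that pass on the old code but fail on the proposed change, report them, then throw everything away. The discount functions and hand-written candidates below are illustrative stand-ins for the LLM-generated tests and automated assessors in Meta's pipeline, not its actual implementation.

```python
def discount_old(price, pct):
    # Behavior before the proposed change: pct is a percentage.
    return price * (1 - pct / 100)

def discount_new(price, pct):
    # Proposed change with an injected bug: pct is no longer scaled.
    return price * (1 - pct)

# Stand-ins for LLM-generated candidate tests, one per scenario.
candidates = [
    ("zero discount", lambda f: f(100, 0) == 100),
    ("half off", lambda f: f(100, 50) == 50),
]

def passes(test, fn):
    # A candidate passes if it returns True without raising.
    try:
        return bool(test(fn))
    except Exception:
        return False

# A candidate is a true "catching test" if it passes on the old code
# but fails on the proposed change; everything else is discarded.
catching = [
    name for name, test in candidates
    if passes(test, discount_old) and not passes(test, discount_new)
]
print(catching)  # prints ['half off']
```

The surviving test flags the regression in the diff and is then deleted; nothing is added to a permanent suite, which is the whole point.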

The Evolving QA Role

The trust gap and JiTTests both surface the same underlying question: what does QA even look like when the software writes itself at an outrageously rapid pace? This wonderful piece, “The new role of QA: From bug hunter to AI behavior validator,” puts it clearly: “For years, QA has operated on a simple principle: define the expected behavior, run the test, compare actual results to expected results. Pass or fail. Green or red. Binary outcomes for a binary world.” And later: “The core question has shifted from ‘Does this work?’ to ‘Does this work well enough, safely enough, and fairly enough?’ That’s simultaneously more important and harder to answer.

“We’re no longer validating specific outputs. We’re validating behavior boundaries. Does the AI stay within acceptable parameters? We’re testing for bias and fairness in ways that never appeared in traditional test plans.”
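In practice, validating behavior boundaries often means swapping exact-match assertions for property checks. A minimal sketch, assuming a stubbed summarize() in place of a real model call (the function names, checks, and thresholds are illustrative, not from the article):

```python
def summarize(text: str) -> str:
    # Stub: a real system would call a model and return variable output.
    return text[:80].rstrip()

def check_boundaries(output: str, source: str) -> dict:
    """Property checks: no exact expected output, only acceptable behavior."""
    return {
        "non_empty": len(output.strip()) > 0,
        "no_longer_than_source": len(output) <= len(source),
        "no_markup_leak": "<script" not in output.lower(),
    }

source = "The QA role is shifting from verifying exact outputs to validating behavior."
report = check_boundaries(summarize(source), source)
assert all(report.values()), f"boundary violations: {report}"
```

Each check passes for a whole family of acceptable outputs rather than one expected string, which is what makes the approach workable for non-deterministic systems.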

Regulation is accelerating this. Many provisions of the EU Artificial Intelligence Act are already in effect, with requirements for high-risk AI systems landing in August 2026 and for high-risk systems embedded in regulated products in August 2027. For QA teams building or testing AI-powered products with EU exposure, compliance testing processes need to exist now: conformity assessments for high-risk systems are not optional, and “we’ll figure it out later” is not a strategy regulators accept.

Separately, Anthropic launched Code Review in Claude Code on March 9: a multi-agent system that automatically analyzes code and flags logic errors. AI reviewing AI-generated code is now a shipping product. How well it actually works across real codebases is a separate question.

Tool Updates

Microsoft released @playwright/cli, a standalone command-line interface built specifically for AI coding agents. It replaces the MCP server approach with a more token-efficient interface, which is helpful for any team that uses AI agents to write and run Playwright tests.

The latest release of Selenium (version 4.41.0) continues the BiDi protocol work, and the project published a Grid deep dive covering the Kubernetes Ingress NGINX migration ahead of its March 2026 EOL. Despite the growing prevalence of Playwright and other test automation frameworks, Selenium shipped 12 releases in 2025, so no, it’s not dead.

Speaking of testing frameworks, Cypress is getting a security overhaul. Cypress.env() currently exposes all environment values (including secrets) to the browser context. The upcoming v16 introduces a three-tier security model splitting values into expose() (public), env() (browser-secret), and backend-only values read via process.env. Deprecation warnings are live in v15.10. If your test suite passes secrets through Cypress.env(), start migrating now.

What We’ve Been Reading

  • AI Won’t Fix Your Testing Strategy – Neil Duggan: Duggan writes from 15 years of QA leadership, including the Olympics and FIFA World Cup. His core argument in this piece is that AI is an accelerant that makes good strategies better and bad strategies worse. His example of self-healing tests silently suppressing real regressions deserves some attention.
  • Development Got 10x Faster. Testing Didn’t. – Hürkan Tuna: AI-assisted development now accounts for 41% of code written, but teams on 15-day sprints still spend 3-5 days on QA. Tuna identifies four systemic failures in test automation that explain the gap. Worth reading alongside the Duggan piece above.
  • The Death of Determinism: How AI Forces Us to Rethink Testing – Padget Avery/Capgemini: Uses Azure Document Intelligence as a concrete example of why binary pass/fail breaks for probabilistic systems. Where most pieces on this topic stay abstract, we found this one quite practical and specific.
  • Everyone is NOT Responsible for Quality – James Bach: Bach dismantles the “quality is everyone’s responsibility” mantra through four interpretations, all of which collapse under scrutiny. Likely useful ammunition for anyone defending dedicated testing roles.
  • Quality at Speed – Maaike Brinkhof: A sharp critique of how “quality at speed” has been hollowed out to just mean “fast.” Brinkhof traces the phrase to Atlassian circa 2014 and argues the LLM era has made the problem worse. Anyone who’s been told to “just move faster” will likely appreciate this one. 
  • Building an Agentic Engineering Org – Angie Jones: How Block (formerly Square) scaled AI adoption to 95% of engineers, with many running parallel agents. Real organizational data from a VP of Engineering. Her strategy of buying access to many tools rather than standardizing prematurely goes against the usual enterprise playbook, and she explains why. 
  • Why I Am Not Worried About AI Replacing Me – Matthew Sullivan: Sullivan argues that the “AI will replace testers” narrative is driven more by CEOs and VCs than by the reality of non-trivial software work. His primary point is that prediction is not discernment, and the roles most at risk on paper are exactly where risk tolerance is lowest.
  • US Job Market Visualizer – Andrej Karpathy: Karpathy’s interactive tool visualizes 342 occupations from the Bureau of Labor Statistics, colored by AI exposure. QA and software testing roles score high. Sullivan’s piece above argues the opposite: that digital work requiring judgment is precisely what AI handles worst. Read them back to back and decide where you land.
  • QASkills.sh – Pramod Dutta: A curated directory of testing-specific skills for AI coding agents (Claude Code, Cursor, Copilot, and others). One command installs structured QA knowledge into your agent’s context. 280+ skills and counting (some better than others). Worth exploring if you’re integrating AI agents into your test workflows. 
  • AI Did Not Break Testing – Katja Obring: Obring’s argument here isn’t that AI doesn’t matter; quite the opposite. It’s that the testing profession already has the tools for dealing with systems whose internals you can’t inspect: observability, monitoring, and guardrails. The real problem is a lack of shared educational baseline that forces each generation to relearn the same lessons.
  • When Building Is Cheap, Quality Becomes a Bigger Differentiator – Esben Bager: As AI compresses the cost of writing code, competitive advantage shifts from feature count to whether your software actually works. Bager cites Sonar data showing 42% of committed code is now AI-generated, but 96% of developers don’t fully trust its correctness. This is a short read with a clear thesis.
  • From Fragile to Agile Part II: The Sequence-Based Dynamic Test Quarantine System – Abinodh Thomas/Reddit Engineering: How Reddit handles flaky tests at scale using a sequence-based quarantine system that automatically detects and isolates unreliable tests. If you’re sick of the AI talk, you’ll probably appreciate this solid infrastructure engineering solving a problem every team has.
  • 17 Playwright Testing Mistakes You Should Avoid – Yevhen Laichenkov: Covers the common mistakes that cause flaky, slow, and hard-to-maintain Playwright tests. How many of these have you made without realizing it?
  • Deep Dive into Playwright CLI: Token Efficient Browser Automation – Pratik Patel/TestDino: A detailed walkthrough of the new Playwright CLI mentioned earlier in this edition. If you’re thinking about integrating Playwright with AI coding agents, this covers what the CLI does, how it differs from the MCP server, and where the token savings come from.
  • How We Release the Spotify App: A Look Under the Hood (Part 2) – Spotify Engineering: The internal tooling behind shipping Spotify to hundreds of millions of users covering their Release Manager Dashboard built on Backstage, and how they reduced friction across a release process that touches hundreds of millions of devices.
  • Advanced Playwright Authentication: A Multi-Role Fixture for Scalable E2E Testing – Faizan Ahmad: If your Playwright suite handles multiple user roles and you’re still logging in during beforeEach, this is for you. Ahmad walks through a custom fixture that handles locking, session expiration, and role-aware state management across parallel workers. Includes full code examples.
  • AAAAA Testing: How to Make Tests AI-Friendly – Fedor Novikov/Bolt: Extends the familiar Arrange/Act/Assert pattern with two more A’s: Anticipate and AI. The goal is structuring tests so AI agents can read, diagnose, and refactor them without losing context. Includes concrete examples of what each “A” looks like in real test code.
  • Tech Layoffs 2026 Tracker – MedhaCloud: 166 layoff events and 55,775 jobs cut so far in 2026. QA and manual testing roles are explicitly called out as a targeted category alongside middle management. This is worth tracking in the context of every other story in this newsletter.

That’s All for Now!

That wraps up this edition. If you found something useful here, share the link with a colleague who might benefit. And if we missed a story that deserves attention, we always want to hear about it.

Until next time, keep testing, keep learning, and keep pushing for quality!


Interested in More Information About QualityLogic?

Let us know how we can help out – we love to share ideas! (Or click here to subscribe to our monthly newsletter email, free from spam.)