The Quality Trail: September 2025 QA News
From the Desk of the Editor
Welcome to a mid-September edition of the Quality Trail, your one-stop shop for the latest news and information in the QA space. It’s been a super busy month for us, so we’ve opted to cover August and the first half of September together this time around.
– The QualityLogic Editorial Team
What’s Inside
As always, let us know if you think we’ve missed something, or share the link with your colleagues or partners who may benefit from some or all of this information. You can also sign up to receive these testing updates via email.
Conferences & Events
- STARWEST 2025 (Anaheim, CA): Sunday September 21 – Friday September 26
- KWSQA Targeting Quality 2025 (Cambridge, Ontario, Canada): Monday September 22 – Tuesday September 23
- TestBash Brighton is the UK’s largest software testing conference, put on by the Ministry of Testing (so you know it’ll be good). This year it takes place October 1 – 2 at the Brighton Dome.
Becoming a Better QA Leader
This newsletter often touches on the technical or trendy aspects of software quality. While we think that’s largely a good thing (you can’t overstate the importance of staying ahead these days), we’d be remiss if we never talked about the people element too.
Gary Hawkes, a seasoned QA leader, recently wrote an article titled 6 leadership mistakes that made me a better leader. It got us reflecting on the attributes that make or break management in our industry. After all, you can have the best testers and tools out there, but without effective leadership, it won’t matter. If you have the time, the article is worth reading in full. Either way, here are the lessons with some added commentary:
- Proactively support your team’s growth as professionals and as people, so that you won’t have to defend them reactively. One thing we’ve found works wonders is giving reports protected time for learning blocks tied to our current risks and/or objectives.
- Be clear on the principles underlying the kind of leader you want to be and play that role. Make sure to regularly ask yourself: “Did failure lead to blame, or did it lead to inquiry and a lesson learned?”
- Don’t impose metrics on teams. Instead, understand what success and failure look like to them. When you impose the metrics you want to focus on without asking for input, it feels forced and passes up the chance for a good conversation. When everyone mutually agrees on performance indicators, they will be more likely to want to meet them. If we can’t discuss metrics without weaponizing them, we shouldn’t be using them.
- Protect your culture. When one of your reports is damaging it and mentoring fails, you may need to make tough decisions to protect it. This is obviously much easier said than done. However, it’s worth remembering the stats: one that sticks with us is MIT’s finding that toxic culture is 10 times more predictive of industry-adjusted attrition than compensation. Put another way, avoiding one toxic worker can yield more value than hiring a “superstar.”
- Give recruitment candidates an enjoyable interview experience. Recruit for your culture and diverse team dynamics first, skills second. Skills are significantly easier to train than culture anyway.
- Be a leader when it comes to decisions about your career. Never be led by your emotions or your current situation.
Of course, none of this matters without a good set of values to anchor it. As a bonus, check out the article Want Strategic QA? Start with What Every Great Company Has: Core Values – MarinaJordao (Medium).
AI in QA
The Rise of AI Specific QA Roles
A few months back we predicted that, as LLMs become more widespread in the workplace, new roles and methodologies would begin to crop up. That seems to be well under way now.
A simple LinkedIn search will surface a bunch of people with the title of “Vibe code cleanup specialist.”
To level set, the term “vibe coding” was coined by Andrej Karpathy (a co-founder of OpenAI) on X. In his own words: “There’s a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like “decrease the padding on the sidebar by half” because I’m too lazy to find it. I “Accept All” always, I don’t read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I’d have to really read through it for a while. Sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away. It’s not too bad for throwaway weekend projects, but still quite amusing. I’m building a project or webapp, but it’s not really coding – I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.”
If you’re reading this as a quality specialist, that paragraph has probably got you seeing stars. Imagine knowing that there exists a subset of software engineers who do this on the job every day. A colleague recently told us about walking into a coffee shop in San Francisco and watching a developer play chess. Every few minutes, he’d switch over to Claude Code, almost absent-mindedly give it an instruction, and then go right back to his game.
Companies love that AI lets them spin up demos and prototypes in a matter of days where it used to take weeks… and herein lies the problem. We as QA practitioners know that getting code to run is actually the easy part. The edge cases that users are likely to encounter in the wild only start to present themselves with rigorous automated and manual testing. When developers let LLMs write the majority of their code, the importance of the tester is magnified tenfold. Enter the “Vibe Code Cleanup Specialist,” who goes in and refines the existing code to make it more maintainable prior to production. We originally learned about this from a post on LinkedIn. Since then, others have been posting about it as well.
Testing LLM-based Applications
It seems like everyone has a chatbot or LLM integration now. Yet the nondeterministic nature of GenAI presents a million-dollar question: how are we supposed to test these things effectively? In a post titled Stop “vibe testing” your LLMs. It’s time for real evals, developers at Google captured the sentiment brilliantly: “If you’re building with LLMs, you know the drill. You tweak a prompt, run it a few times, and… the output feels better. But is it actually better? You’re not sure. So you keep tweaking, caught in a loop of “vibe testing” that feels more like art than engineering.” Here are a few more resources on the topic, followed (after the list) by a rough sketch of what a basic eval loop can look like:
- Testing AI: lessons from wearing three hats – Avetik Babayan (Medium): From the vantage point of someone who has been a developer, project manager, and QA tester throughout their career.
- How to implement self-healing tests with AI – Shyamal Raju (Medium)
- Top LLM Chatbot Evaluation Metrics: Conversation Testing Techniques – Confident AI
- How to Test LLMs, AI Assistants & Agents – The Future of QA – Youtube: A video conversation with Igor Dorovskikh, CEO & Co-Founder of Engenious and MarathonLabs.
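To make the idea of “real evals” a bit more concrete, here’s a minimal, vendor-agnostic sketch in Python. The call_model stub, the example prompts, and the pass-rate threshold are our own illustrative assumptions, not taken from any of the articles above; the point is simply that each case gets a deterministic check and gets run several times to absorb nondeterminism, so “it feels better” becomes “the pass rate went up.”

```python
"""A rough sketch of an LLM eval loop (hypothetical names, no specific vendor SDK)."""

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM call (OpenAI, Anthropic, a local model, etc.).
    # The canned reply only keeps this sketch runnable end to end;
    # swap in your provider's SDK here.
    return "Yes. Refunds are processed within 5 business days."

# Each eval case pairs an input with a deterministic check on the output.
EVAL_CASES = [
    {
        "prompt": "Summarize: 'Refunds are processed within 5 business days.'",
        "check": lambda out: "5" in out and "refund" in out.lower(),
    },
    {
        "prompt": "Answer only yes or no: Is 17 a prime number?",
        "check": lambda out: out.strip().lower().startswith("yes"),
    },
]

def run_evals(runs_per_case: int = 5, pass_threshold: float = 0.8) -> bool:
    """Run each case several times to account for nondeterminism, then score it."""
    all_passed = True
    for case in EVAL_CASES:
        passes = sum(
            bool(case["check"](call_model(case["prompt"])))
            for _ in range(runs_per_case)
        )
        rate = passes / runs_per_case
        print(f"{case['prompt'][:45]!r}: pass rate {rate:.0%}")
        all_passed = all_passed and rate >= pass_threshold
    return all_passed

if __name__ == "__main__":
    raise SystemExit(0 if run_evals() else 1)
```

Dedicated eval frameworks (some are linked above) add scoring models, conversation-level metrics, and dashboards on top of this, but the underlying loop is the same: defined cases, repeatable checks, and a threshold you can track over time instead of a vibe.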
What We’ve Been Reading
- Cypress (the widely used, JavaScript-based testing framework) was updated to version 15, which lays the groundwork for AI-integrated features, with the ambitious end goal of learning from your tests and making relevant suggestions. Read more in the Cypress blog post, The foundation for what’s next – Cypress Studio.
- When hiring software testers doesn’t work (and what to do BEFORE you hire them) – Ministry of Testing: Hint: it’s rarely because the tester couldn’t handle the job. More often, you’re trying to fix the symptom rather than the underlying problem, or you went in with the wrong expectations. This is a good read for anyone considering bringing on testing talent, whether internal or external.
- One team, one goal: The reality of introducing a unified testing strategy – Ministry of Testing: Does your organization have an overarching software testing strategy? For most, the answer is a resigned “sort of,” if not an outright “no.” This post outlines the journey one large organization (over 1,000 employees) took to get there, bumps in the road included.
- The Complete Playwright End-to-End Story, Tools, AI, and Real-World Workflows – Microsoft for Developers
- The Three Pillars of QA: Why Testing Alone is Never Enough – Maksim Laptev (Medium): Makes the case that QA testing is a complex but achievable goal built on well-thought-out test documentation, stable test environments, and streamlined processes and communication.
- Jason Huggins, creator of the widely used Selenium browser automation framework, announced Vibium, the AI-native successor to Selenium that lets you write your tests in plain English. Learn more in his Founder video for Y Combinator’s F25 batch.
- AI Agents and Test Suites: Lessons from the Trenches – James Kip (Medium)
- “Works on My Machine!”: How to Deal With Dismissive Devs – Dev Tester
Interested in More Information About QualityLogic?
Let us know how we can help out – we love to share ideas! (Or click here to subscribe to our monthly newsletter email, free from spam.)