The Evolution of Software & Validation Testing: An Introduction

Introduction 

With over 30 years of regulatory history, computer system validation has a relatively established set of best practices and processes for testing regulated software. Even with periodic guidance and clarification updates, traditional software system validation is fairly well understood. 

However, the advent of Artificial Intelligence systems casts doubt on the applicability of traditional validation practices. Given the structured process of validation and the speed at which AI changes, how should testing these systems be approached?

Brief Validation & Computing History 

The core regulatory framework for computer system validation was established through GMP requirements such as 21 CFR Parts 210/211, 21 CFR Part 820, and EU Annex 11. 21 CFR Part 11 – introduced in 1997 – formalized FDA expectations for electronic records and electronic signatures, and it reinforced the need to validate the systems that support them.

At the time 21 CFR Part 11 was introduced, software was largely distributed on floppy disks or CDs. Software was installed in the company’s local data centers, and updates were released on average every 12-24 months. The software was also largely considered a “black box”, meaning customers had little to no visibility into the vendor’s testing practices or the testing documentation behind each release.

Regulatory validation largely involved testing every button, field, and link in the software. Because vendor documentation wasn’t available, customers typically retested every feature to make sure all functional requirements worked as expected. 

Over time, software delivery cadence and methods changed, as did vendor documentation. Software moved from slow physical delivery and updates to frequent electronic delivery – initially every 1-3 months, and now sometimes daily. Hosting of the software code base also shifted from on-premises data centers to the cloud, accessed over the internet.

Many vendors – especially those in the regulated space – started sharing their testing documentation to help customers meet regulatory requirements more easily. This allowed customers to focus on usage testing and ensuring the software would function for their specific needs and workflows.

Validation testing guidance has also evolved, moving from highly detailed functional testing (every button and every field), to a more right-sized risk-based approach, testing those areas that have regulatory implications and that the customer will actually use. 

A key characteristic of traditional software is that its outputs are predictable and reproducible (deterministic) given the same configuration, state, and input. For example, clicking the Save button will save the file. Entering data in a field will retain only that information. In a workflow, setting up specific conditions will determine exactly which route the system takes. This predictability makes testing easier and more reliable.
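The determinism described above can be sketched in a few lines of illustrative Python. The function and field names here are hypothetical, invented purely to show the point: with deterministic software, the same input always produces the same output, so a single check verifies the behavior.

```python
# Hypothetical sketch of deterministic software behavior:
# the same input always produces the same output.

def save_record(record: dict) -> dict:
    """Store exactly what was entered -- no more, no less."""
    return dict(record)  # the stored copy matches the input verbatim

entered = {"batch_id": "B-001", "qty": 42}
stored = save_record(entered)

# Deterministic: repeating the call with the same input
# gives an identical result every time.
assert stored == save_record(entered) == entered
```

Because repeated calls cannot diverge, one well-chosen test case per input condition is enough to demonstrate the behavior.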

What is Artificial Intelligence (AI)? 

To understand why this matters, it’s important to clarify what’s meant by AI in regulated environments. AI is an umbrella term under which many models, such as machine learning (ML) and large language model (LLM) architectures, fall. A highly visible slice of today’s AI—especially generative AI—uses LLMs. LLMs are systems trained on large data sets (books, encyclopedias, websites, and so on).

Users enter natural language prompts, and the software provides responses accordingly. An LLM is largely a probability and prediction engine, predicting the next word in a sequence given the preceding context. While AI is not truly intelligent yet, it is incredibly powerful, able to process options, predict outcomes, and so on at astounding speed.

A challenge with AI is that while LLMs can be trained for reproducible attributes, they are often more variable than traditional software. Because the output is a prediction calculated from the inputs, and because each LLM has a different training set, the output from these engines can vary – even with the exact same input! While the general idea of those outputs is usually similar, the specific content frequently differs (and even the main idea may change).
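This variability can be illustrated with a toy next-word sampler. Everything here is invented for illustration – the tiny vocabulary, the probabilities, and the prompt – and a real LLM operates over enormous vocabularies and contexts, but the mechanism is the same: candidate words are weighted by probability and one is sampled, so identical inputs need not produce identical outputs.

```python
import random

# Toy next-word predictor (illustration only, not a real LLM).
# Given a context, the "model" assigns probabilities to candidate
# next words and samples one.
NEXT_WORD_PROBS = {
    "the batch record is": [("complete", 0.6), ("pending", 0.3), ("missing", 0.1)],
}

def predict_next_word(context: str, rng: random.Random) -> str:
    words, weights = zip(*NEXT_WORD_PROBS[context])
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random()  # unseeded: separate runs may differ
outputs = {predict_next_word("the batch record is", rng) for _ in range(20)}
# Across 20 samples, more than one distinct continuation is likely --
# the same prompt does not guarantee the same response.
```

A test that expects one exact output string would fail intermittently here, which is precisely why traditional expected-result test scripts struggle with LLM-based features.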

Additionally, LLMs can be subject to performance drift over time or produce fabricated responses depending on input conditions. In production, output variability can also come from system factors like retrieval content changing, model/version updates by vendors, configuration changes, or external tool integrations. 

Testing Traditional Software vs AI 

The advent of Artificial Intelligence (AI) is changing the software world faster than ever before, and as described above, testing can be very difficult to complete reliably. In traditional software testing, inputting A would reliably return B. And inputting C would always return D. However, with AI, inputting A could result in B.1, B.2, B.3, or even possibly D being returned depending on the conditions. 

In this world of powerful but sometimes unpredictable performance, how can you adequately test to ensure that the software is fit for your needs and will give you what you need? AI systems fundamentally challenge the assumption of predictability on which decades of validation frameworks were built. This means validation strategies must evolve from static functional verification to ongoing, risk-based, statistically informed governance.
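One way to picture “statistically informed” testing is as a repeated-trial acceptance check rather than a single pass/fail script. The sketch below is purely illustrative: the 95% simulated pass probability and the 90% threshold are made-up numbers, and any real acceptance criterion would come from a documented, risk-based assessment.

```python
import random

# Illustrative sketch (not a regulatory procedure): instead of one
# pass/fail check, run many trials and compare the observed pass
# rate against a predefined acceptance threshold.

def run_trial(rng: random.Random) -> bool:
    """Stand-in for one evaluation of an AI output against its
    acceptance criteria; simulated here as passing ~95% of the time."""
    return rng.random() < 0.95

def acceptance_rate(n_trials: int, seed: int = 0) -> float:
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    passes = sum(run_trial(rng) for _ in range(n_trials))
    return passes / n_trials

THRESHOLD = 0.90  # hypothetical, risk-based acceptance criterion
rate = acceptance_rate(1000)
fit_for_use = rate >= THRESHOLD
```

The shift is from “input A must return B” to “across many representative trials, the system must meet its acceptance criteria at or above a justified rate,” with that rate monitored over time to catch drift.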

What’s Next

The evolution of software and validation testing isn’t slowing down — and neither are we. At 11 Compliance, we’re actively working with the latest tools and approaches so we can give you real, practical guidance, not just textbook answers. Whether you’re mid-project or just starting to plan, we’d love to be part of your process. Reach out and let’s connect.