The Challenges of Testing Artificial Intelligence (AI)


Our day-to-day lives are becoming increasingly reliant on the direction, decision-making, and support of AI systems.

Never in the history of technology has the threat or the need to protect the integrity of such decision-making been more urgent or real.

I recently served as an official reviewer for a new BCS pre-publication book titled “Artificial Intelligence and Software Testing – Building systems you can trust”.

It is a brilliant reference covering the challenges of both Artificial Intelligence (AI) and Machine Learning (ML), and how they have disrupted and challenged traditional approaches to software testing.

Societal trust in AI decision-making is at the very heart of what AI is all about. The greater the level of societal trust in AI, the greater the level of societal adoption of AI.

AI is making our lives easier in ways never imagined, from driverless cars to helping law courts reach smarter sentencing decisions.

We need, however, to be confident that AI decision-making arrives at answers that are correct, ethical, and unbiased.

Trust in AI responses

The testing of AI systems is at the very heart of building confidence and trust in AI answers and responses.

Not only are the problems harder; AI systems are fundamentally different. Unlike conventional software, which changes only when it is deliberately updated, AI systems change in response to stimuli. Because testing itself provides stimuli, it can influence how the system behaves next, making traditionally expected results far less predictable.

One of the challenges around the testing of AI is the ability to reproduce and explain a set of results. Ultimately, the real challenge is convincing everyone that AI systems can be trusted with important decisions.
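One concrete precondition for reproducing results is controlling the randomness inside a stochastic training or inference step. The sketch below is a hypothetical, minimal illustration (the function and data are invented for this example, not from any real framework): without a fixed seed, two runs rarely agree; with one, the run is repeatable and therefore explainable.

```python
import random

def train_toy_model(data, seed=None):
    """Toy stand-in for a stochastic training step: shuffles the
    data and returns the resulting order as the 'model'.
    Hypothetical example for illustration only."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    return shuffled

data = list(range(10))

# Without a fixed seed, two "training runs" will almost never agree,
# so a tester cannot reproduce or explain a given result.
run_a = train_toy_model(data)
run_b = train_toy_model(data)

# Pinning the seed makes the stochastic step repeatable, which is a
# precondition for reproducible AI test results.
run_c = train_toy_model(data, seed=42)
run_d = train_toy_model(data, seed=42)
assert run_c == run_d
```

Real ML frameworks expose the same idea through their own seeding mechanisms; the testing point is that every source of randomness in the pipeline must be pinned before results can be reproduced.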


Life as we encounter it is full of biases, some explicit and some implicit, some human-constructed and some not.

People and data have biases, and these can become embedded in AI systems. For example, women are under-represented in technology relative to men, which can lead an AI system trained on historical hiring data to favour men when predicting who is likely to succeed in a technology role.
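One simple check testers can apply to the hiring scenario above is to compare selection rates across groups. The sketch below uses invented, illustrative outcomes (the data and function names are hypothetical, not from the article): a selection-rate ratio well below 1.0 flags a possible bias that warrants investigation.

```python
# Hypothetical screening decisions from a toy hiring model:
# (group, selected) pairs. Illustrative data only.
outcomes = [
    ("men", True), ("men", True), ("men", False), ("men", True),
    ("women", True), ("women", False), ("women", False), ("women", False),
]

def selection_rate(outcomes, group):
    """Fraction of candidates in `group` that the model selected."""
    decisions = [selected for g, selected in outcomes if g == group]
    return sum(decisions) / len(decisions)

rate_men = selection_rate(outcomes, "men")      # 3 of 4 -> 0.75
rate_women = selection_rate(outcomes, "women")  # 1 of 4 -> 0.25

# Disparate-impact ratio: values far below 1.0 suggest the model
# treats the groups differently and should be investigated.
ratio = rate_women / rate_men
print(f"selection-rate ratio: {ratio:.2f}")  # prints 0.33
```

This ratio is only a first-pass signal, not proof of bias, but it turns a vague ethical concern into a measurable quantity a test team can track.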

Broad-based use of AI systems resulting in the calcification and reinforcement of such biases is a significant societal risk that must be addressed.

This bias risk is further compounded by the fact that the public at large trusts computers too much.

Many people assume that “if the computer says so, it must be right.”

We are now at a juncture where an AI system's processing may be technically correct, yet its ultimate answer, and the societal impact that follows, may be deeply negative.

Should such a system be allowed into production, and whose role is it to stop the system going live?

Ethical Behaviour

How do we test for ethical and unethical behaviour? Where grey areas or known ethical dilemmas exist, how should a system handle them?

Consider a driverless car that must either swerve left, away from an oncoming vehicle, and collide with six adults waiting at a bus stop, or swerve right and collide with a mother pushing her baby in a stroller.

What if the oncoming vehicle was recklessly travelling on the wrong side of the road? What constitutes ethical behaviour in this case and what is the correct system response?

How do we program such behaviour? How do we test for it?


All these challenges and differences have implications for skills: the probabilistic nature of AI systems creates a need for specialist knowledge of data science and mathematics.

Conventional systems are more understandable because their logic is expressed in code that technical people can read. That code is structured into classes and methods, designed by a human, and usually relates in some traceable way to the requirements, functionality, or input and output data.

AI systems can be very complex, and designing test data for them is demanding: while test professionals should understand the process, the work itself calls for a professional data scientist.


Some things do not change. It has long been known that the longer a defect survives, the more it costs, both in its impact on the project, the system, and its users, and in the effort required to remove it.

Even so, while testing principles such as shifting left (and right) still apply, it is hard to imagine AI not proving highly disruptive to every aspect of software engineering, including software testing.


Just because the testing of AI systems brings new challenges, new risks, and new skill requirements does not mean the old risks and skills are obsolete. There is no questioning, however, that AI has introduced an era in which a system's expected output and behaviour are harder to predict, and this is not how the testing of non-AI systems has traditionally been planned and executed.

The software testing community has become accustomed to validating a system against a specified, predetermined set of expected results. We now need to reset our thinking about system testing and develop new test design techniques to augment the traditional ones when testing AI.
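One such shift is moving from exact-match oracles to statistical ones. The sketch below is hypothetical (the model function is a toy invented for this example): instead of asserting a single predetermined output, the test asserts a property of the output distribution within a tolerance.

```python
import random

def noisy_predict(x, seed=None):
    """Toy stand-in for a probabilistic model: returns roughly 2*x,
    with Gaussian noise. Hypothetical, for illustration only."""
    rng = random.Random(seed)
    return 2 * x + rng.gauss(0, 0.1)

# A traditional exact-match oracle is brittle for probabilistic output:
#   assert noisy_predict(3) == 6   # would fail on almost every run
#
# A statistical oracle instead checks a property of many runs.
predictions = [noisy_predict(3, seed=s) for s in range(100)]
mean = sum(predictions) / len(predictions)

# Expect the average prediction to land near 6, within a tolerance.
assert abs(mean - 6.0) < 0.1
```

Property- and tolerance-based assertions like this are one way traditional test design can be augmented when a single "correct" expected result no longer exists.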

When leveraging AI to test applications for quality, enterprises may face multiple challenges: identifying the right use cases, a lack of awareness of what actually needs to be done, verifying application behaviour against the input data, and testing applications for functionality, performance, scalability, security, and more.

Cigniti’s extensive experience in the use of AI, ML, and analytics helps enterprises improve their automation frameworks and QA practices. Cigniti provides AI/ML-led testing and performance engineering services for your QA framework through the implementation of its next-generation IP, BlueSwan™.

Need help? Talk to our AI experts to learn more about overcoming the challenges of testing AI, and how Cigniti can help in the AI/ML digital transformation journey.


  • Jack Mortassagne

    Jack has more than 30 years of experience in quality assurance and software testing. He worked for many years as an independent Test Management consultant alongside and within some of the world’s leading Software Testing and Quality Engineering consulting firms before going on to cofound a successful independent Quality Assurance consulting firm with a specific focus on the Banking and Financial sector. Jack was recently accredited as a TMMi Assessor by the TMMi Foundation.

