Physical and software-based artificial intelligence (AI) assistants have found their way into homes and organizations worldwide. But the widespread presence of these services doesn’t mean they can be treated with absolute trust.

We’ve all heard the old joke:

Hotel guest to concierge: “Hi, will you please call me a taxi?”

Concierge: “Certainly. You’re a taxi”.

This isn’t a particularly funny joke, but it is perhaps a little more amusing than your AI assistant acting similarly when you ask it to call you an ambulance. There are dozens of documented examples of AI assistants being wildly wrong and misleading: an AI tool that screened resumes but was biased against women; an experimental robot that brought the movie Short Circuit to reality by escaping from its development lab; a chatbot that made profane and inflammatory tweets; a calendar assistant that got confused about how time zones work.

To be fair, these are all past examples, and AI assistant technology has continued to advance since many of these incidents occurred. A favourite example of mine dates from two years ago, when I was setting a charity quiz and asked ChatGPT to name five songs in which the name of the song does not appear in the lyrics. The response? Four instrumentals (which admittedly satisfied the brief, albeit rather literally) and “Green Door” by Shakin’ Stevens (in which the title is in fact sung 17 times). The same request in May 2025 gave a much more believable answer*, though it is rather tenuous to argue that the full title of The Beatles’ 1965 hit is actually “Norwegian Wood (This Bird Has Flown)”, and that the entire title is not sung.

Despite Advances, Errors Still Occur

In more recent years, AI assistants and related technologies have continued to slip up: an AI robot that attacked a seven-year-old in a 2022 chess tournament because it misinterpreted the child’s movements; a mental health chatbot that provided very questionable advice; Meta’s Galactica AI-based scientific knowledge base, which existed briefly in 2022 and invented scientific papers; robo-taxis that had a hard time in 2023 co-existing on roads with human-driven fire trucks; a lawyer who was fired in 2023 because he used ChatGPT to file a motion and failed to check it (which would have revealed that it was riddled with inaccurate and even fictitious content).

That last case is the perfect example of the problem with AI assistants – a problem that these systems share with human beings: they are fallible and sometimes get things wrong. They are, after all, only as good as the data sources they learn from or draw on. If the data is flawed, or the interpretation of it is, then the output will be too. The greatest surprise about this fallibility is that so many people are still surprised that it exists.

The Decision-Making Process

Humans make decisions based on experience and information; of course, the two overlap considerably since experience is just a form of information. Similarly, AI engines work by ingesting vast amounts of data (570GB in the case of ChatGPT v3, according to ChatGPT v4 – it won’t disclose how much the newer version has ingested). Common sense tells us that some of that data will be incorrect, which of course means that some of the answers it provides us will be wrong too. Some of the data will contain bias, which means that outputs risk being biased as well.

It is important to remember that what we are calling “AI” in this instance is still just machine learning (ML): very advanced, but still reliant on existing data being fed in (and on that data being correct), with its outcomes and outputs determined by algorithms interpreting that data according to parameters, rather than by truly cognitive, independent AI decision-making.

Nonetheless, most of us have been taking advice from technology for years. The internet is riddled with stories of people who have unquestioningly followed the directions of their satnav devices and ended up down blind alleys or in lakes. Years ago, I watched a semi-truck in front of me turn down a country road that I knew he would find impassable, so I took a different route – and that evening watched the news story about the truck driver who had blindly and unquestioningly taken his satnav’s advice. The editor of ISC2 Insights has his own satnav story, from when an early portable satnav tried to tell him to take an immediate right turn – at roughly the halfway point across the Bay Bridge in San Francisco. Suffice it to say he trusted the view out of the car’s windscreen more than the satnav’s directions on that occasion.

Human Error

It must also be borne in mind that the way humans use AI assistants actually contributes to the likelihood of them doing or saying incorrect things. That is, the more complex the question, the more variables and data are involved – and hence the greater the number of opportunities to introduce inaccuracy. If we are reading a recipe and we ask Siri to tell us what 390° Fahrenheit is in Celsius, we can be pretty sure that the answer (198.89°) is right. These days we are reasonably confident about what our satnavs tell us, too (and, incidentally, satnav software has existed in the mainstream for a great deal longer than AI assistants – shortest-path algorithms are deterministic, codable without AI, and have been with us since the middle of the 20th century).
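By way of contrast with the statistical behaviour of AI assistants, here is a minimal Python sketch – the function names and the toy road network are illustrative assumptions of mine, not taken from any real satnav or voice assistant – showing that both the temperature conversion and a classic shortest-path calculation are deterministic: the same inputs always produce the same, checkable outputs.

```python
import heapq


def f_to_c(fahrenheit: float) -> float:
    """Convert Fahrenheit to Celsius with the standard formula: same input, same output, every time."""
    return (fahrenheit - 32) * 5 / 9


def dijkstra(graph: dict, start: str) -> dict:
    """Classic shortest-path search (Dijkstra, 1959): no model, no training data, just the graph and a priority queue."""
    dist = {node: float("inf") for node in graph}
    dist[start] = 0.0
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry; a shorter route was already found
        for neighbour, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist[neighbour]:
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


if __name__ == "__main__":
    celsius = f_to_c(390)
    print(f"390°F = {celsius:.2f}°C")      # 198.89 – the recipe example above

    # The quick mental sanity check: Fahrenheit is roughly double Celsius.
    print(f"Ratio: {390 / celsius:.2f}")   # ~1.96

    # Hypothetical toy road network; edge weights are travel times in minutes.
    roads = {
        "A": {"B": 5, "C": 12},
        "B": {"A": 5, "C": 4},
        "C": {"A": 12, "B": 4},
    }
    print(dijkstra(roads, "A"))            # {'A': 0.0, 'B': 5.0, 'C': 9.0}
```

Because both routines are fully determined by their inputs, any correct implementation will give identical answers – which is precisely why we can afford to trust them in a way we cannot yet trust a statistically trained assistant.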

The fact is, however, that when anyone – or anything – tells us something, we invariably do a mental sanity check on the answer (sometimes consciously, though sometimes we don’t even realise we’re doing it). In our temperature example, our brain does a quick computation and feels comfortable that the Fahrenheit number is roughly double the Celsius number. If we’re in Los Angeles and the satnav points us roughly north on a route to San Francisco, that feels kind of right. And if we are British and were around in the 1980s, or Americans who remember the 1950s, we know perfectly well that the title of “Green Door” appears in the lyrics a non-zero number of times.

For more complex questions, though, we increasingly rely on the technology to get the answer right, as we don’t necessarily have the capacity or the knowledge to do a reliable sanity check. We can detect anomalies in some cases – particularly when the response is so obviously nonsense that we can’t fail to spot the error – but the more nuanced the problem, the harder the errors are to spot.

Benefit vs Risk

On balance, the value that AI assistants bring can outweigh the risk of them getting it wrong – depending, of course, on the environment and application.

We now have an inescapable need for AI: we have to use it for good, because so many people are using it for bad. In the cybersecurity world in which we work, if we don’t use AI we simply won’t be able to keep up with the bad actors who are using it to attack us. Sometimes it will generate false positives and block traffic that should not be blocked, but so long as it detects real anomalies most of the time and acts as we would like it to, it will be much better at the job than much slower (and probably equally fallible) human brains.

On the plus side, our AI cybersecurity tools are unlikely to drive us into the path of a fire truck.

* If you’re wondering:

  1. Queen: Bohemian Rhapsody
  2. Nirvana: Smells Like Teen Spirit
  3. Red Hot Chili Peppers: Under the Bridge
  4. The Beatles: Norwegian Wood (This Bird Has Flown)
  5. Led Zeppelin: Black Dog
