Where do today’s privacy lawyers see problems ahead in a world full of generative AI chatbots? At ISC2 Security Congress in Nashville, a panel of legal experts shared their views: Scott Giordano (Attorney in AI, privacy, and cybersecurity, Giordano AI Law), David Patariu (privacy attorney, Venable LLP), John Bates (senior manager in the Cybersecurity Program Transformation capability, EY), and John Bandler (founder & principal, Bandler Law PLLC).

Today’s cybersecurity world faces a wide range of security challenges caused by the sudden adoption of generative artificial intelligence (AI) chatbots such as ChatGPT and Bard. One view is that these tools are still immature, are not being widely used, and perhaps shouldn’t be widely used until they have been thoroughly proven. In the view of a distinguished panel of legal experts at the ISC2 Security Congress session What ChatGPT Means for Your Infosec Program, however, these tools are not only already in widespread use, but that use will quickly overwhelm anyone who tries to hold it back with old-world policies.

The threats generative AI tools pose to information security are as numerous as they are unfamiliar. Nonetheless, they can be broken down into three broad areas – the training data fed to the machine learning (ML) system (including how it was collected and whether it is reliable), the ML model itself and how it was trained, and finally the chatbot prompt and how it might be manipulated to reveal something undesirable or simply be used naively.

Today, chatbot risk is often understood in terms of adversarial attacks on the ML systems, their data, or their software. However, it also extends to generating false content (e.g., deepfakes), misusing the systems themselves (e.g., to create polymorphic malware), conducting social engineering attacks based on various forms of impersonation, finding and exploiting novel vulnerabilities, and scraping sensitive content at scale.

These possibilities are not entirely hypothetical, with surveys of cybersecurity professionals recently recording a rise in detected incidents that were aided in some way by chatbots. Indeed, the growing prevalence of deepfakes and social engineering points to a coming crisis in authentication and identity – how we know someone is who they say they are – for which the world seems under-prepared.

Walking into trouble

“You can’t tell people not to do this,” said EY’s John Bates. “That is not the answer, emphatically. That equals shadow IT.” In fact, people are already using these tools – particularly in marketing and sales – which means the cybersecurity team is often the last to find out. According to Bates, the best model for approaching these risks is to reuse the template cloud security established a decade ago.

An important issue is that the licensing terms governing how these tools are used vary; ChatGPT’s, for example, are not the same as Google’s Bard’s. This requires organizations to do more thorough due diligence on their technical as well as legal risks within a governance framework. Bates suggested that organizations might need to consider forming a special generative AI steering committee, drawn from legal as well as engineering and cybersecurity teams, to deal with these issues. “How do we find an efficient, easy way to use this [generative AI] that doesn’t blow us up?” Organizations already have the tools and processes to cope with this, such as business continuity, third-party risk management, data governance, and data protection, he said. What is required is vigilance, not ignoring or downplaying the issue.

Privacy and hallucination

All four speakers drew attention to the fact that case law affecting the use of generative AI is expanding rapidly, with a slew of cases relating to privacy in particular. The challenge for ML systems is their need for data, which must be reconciled with the rights of end users, including copyright. At the extreme end is the tendency of generative AI systems to hallucinate “facts” (including about real people) that are inaccurate and legally risky. Much of this exposure is difficult to see, leaving organizations that use generative AI open to unknown risks across a swathe of legal situations.

The first privacy risk concerns the corporate and personal data used to train the models.

Scott Giordano of Giordano AI Law described this as “open season.” For now, AI companies are hiding behind the technical defense that what they are doing does not violate copyright because generative AI is based on a mathematical representation rather than the copyrighted data itself.

“To build an LLM, you have to gather lots of data. The problem is that lots of that data is our data. Right now, we are in a place where unauthorized web scraping is being fought in court. At the moment, it’s a losing battle and it may require Congress to step in and say you can’t do this,” said Giordano.

Venable LLP attorney David Patariu agreed that the legal situation around company privacy is still unclear. Compounding this are more subtle second-order effects, for example AI companies using customer data to improve their models and services, something typically written into platform agreements. Equally, those improvements can end up aiding other customers, including a company’s rivals.

“I get that my data is my data. But what about the learning from my data and putting it into your model?”

John Bandler of Bandler Law PLLC drew the audience’s attention to the wild inaccuracy of chatbots, something which has already caught out lawyers. As a tool, ChatGPT saves time, but at the expense of inventing facts and events.

“A lawyer went to ChatGPT looking for case law and it spat out a fictitious case that never happened. When they were called on it by the judge who thought it wasn’t right, they doubled down.”

Ultimately, cybersecurity practitioners should not assume that the risk of generative AI lies purely in the systems themselves (their software, ML models, or data) being compromised. Just as important are the legal risks connected to privacy, copyright, IP, and the abuse of hidden flaws in systems that are otherwise being legitimately accessed (e.g., APIs, data scraping). The same principles apply to internal ML projects – the system is not a black box that can be taken as read. Someone has to check the data and models for inadvertent bias that could create additional risk.