The growth in use and demand for Large Language Models (LLMs) is at the heart of current AI development. As John Cvetko, CISSP, explains, developing and training an LLM is a delicate process that can be undermined from the outset by data poisoning and other cybersecurity threats.

The training approach for Large Language Models (LLMs) is predominantly shaped by the model’s intended purpose and user requirements. A general-use public system, for example, differs significantly from a model collaboratively developed by multiple companies for industry-specific tasks. To address these varying needs, designers may deploy different training configurations: non-federated, federated, or decentralized. However, each configuration introduces its own unique vulnerabilities, which require careful consideration to maintain model integrity and security. It’s crucial to align the chosen configuration with the intended purpose while keeping in mind the potential security implications.

Training 101

The training process of LLMs typically involves four stages, each focusing on refining the model's understanding and behavior.

  • Data Preparation – this is the initial phase, where large datasets are collected, cleaned and formatted for training. Ensuring data quality is crucial here, as any overlooked biases or erroneous information could be embedded in the model.
  • Pre-Training Stage – this follows data preparation; the model undergoes unsupervised training on vast amounts of raw text data to learn patterns, grammar and context. This general understanding lays the groundwork for fine-tuning the model.
  • Fine-Tuning – the pre-trained model is then trained on a smaller, curated dataset with predefined relationships provided through labeled data, allowing it to specialize in specific tasks or domains and align its responses with desired outcomes.
  • Reinforcement Learning from Human Feedback (RLHF) – this final stage refines the model’s responses further based on human evaluators’ feedback.

Accommodating Business Needs

Business needs can dictate how LLMs handle and secure data during training. The decision to train an LLM with a single organization or among multiple business partners significantly influences its attack surface. When a single entity manages training, there’s centralized control but also a concentrated risk. However, when multiple business partners collaborate, distributing control can introduce additional security challenges, as the shared model’s updates and communications expand the network’s vulnerabilities. The choice of configuration, therefore, becomes a balance between business goals and security considerations.

In non-federated (local) training, the approach is centralized, focusing on a single organization managing both the data and models within their infrastructure. This setup allows for strict control over data quality, security and compliance, which is beneficial for organizations handling sensitive or proprietary information. However, this centralized control means that multiple business partners wouldn't directly share in the training process, as all data and training operations are confined within a single entity's domain. The lack of shared management also creates a single point of failure, increasing vulnerability to targeted attacks or breaches.

Federated learning enhances privacy by keeping data localized on multiple servers (nodes) in a network. Instead of sharing raw data, nodes only send model updates – such as gradients – to a central node for aggregation and redistribution. While this reduces data exposure for each business partner, it introduces risks like model update poisoning, where compromised nodes can corrupt the shared model.
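As a rough illustration of this flow, the sketch below implements one federated averaging round in plain NumPy, in the spirit of FedAvg: each node computes an update on its private data and shares only that update, which the central node aggregates by dataset size and redistributes. The linear model and function names here are toy assumptions, not any particular framework's API.

```python
import numpy as np

def local_update(global_weights, local_data, lr=0.1):
    """Each node trains locally and returns only a weight delta (no raw data).
    The 'training' here is a single toy gradient step on a linear model."""
    X, y = local_data
    pred = X @ global_weights
    grad = X.T @ (pred - y) / len(y)        # mean-squared-error gradient
    return -lr * grad                       # the update shared with the central node

def federated_round(global_weights, nodes, sizes):
    """Central node aggregates node updates, weighted by local dataset size."""
    updates = [local_update(global_weights, data) for data in nodes]
    weights = np.array(sizes) / sum(sizes)
    aggregated = sum(w * u for w, u in zip(weights, updates))
    return global_weights + aggregated      # redistributed to all nodes

# Toy usage: three nodes, each holding private (X, y) data for the same model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
nodes = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    nodes.append((X, X @ true_w + rng.normal(scale=0.1, size=n)))

w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, nodes, sizes=[len(d[1]) for d in nodes])
print(w)   # converges toward true_w without any node sharing its raw data
```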

Decentralized learning employs a peer-to-peer network without a central node. Nodes communicate directly, sharing model updates and coordinating to reach a consensus. This setup offers enhanced privacy and resilience but relies heavily on consensus algorithms to maintain model integrity. In these environments, attacks like Byzantine attacks – where compromised nodes send malicious updates – can undermine the entire network, making robust fault-tolerant mechanisms essential.
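A minimal sketch of the peer-to-peer exchange, under the simplifying assumption of a synchronous gossip protocol: each node repeatedly averages its parameters with those of its direct neighbors, and the network drifts toward a consensus without any central aggregator. Real decentralized systems layer consensus and fault-tolerance mechanisms on top of this basic pattern.

```python
import numpy as np

def gossip_round(params, neighbors):
    """One synchronous gossip step: each node averages its parameter vector
    with those of its direct neighbors (no central aggregator involved)."""
    new_params = []
    for i, p in enumerate(params):
        group = [p] + [params[j] for j in neighbors[i]]
        new_params.append(np.mean(group, axis=0))
    return new_params

# Toy usage: four nodes on a ring topology, each starting from different
# locally trained parameters.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
params = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
          np.array([2.0, 2.0]), np.array([-1.0, 3.0])]

for _ in range(30):
    params = gossip_round(params, neighbors)
print(params[0])   # all nodes converge toward the network-wide average [0.5, 1.5]
```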

The individual security hygiene of each business partner directly impacts the attack surface in both federated and decentralized learning environments. Poor security practices by one partner can lead to compromised nodes, allowing attackers to inject faulty updates that affect the entire shared model. Therefore, maintaining strong security hygiene across all partners is not just recommended but a critical requirement for collaborative LLMs to avoid expanding the attack surface.

Targets

Data poisoning typically occurs during the Data Preparation phase of training, where attackers can introduce malicious data. Labels in a dataset are key pieces of information that define the correct categorization or identification of data points, guiding the model’s learning process. Label flipping is a type of data poisoning that involves deliberately altering these labels to mislead the model. Because these attacks happen before the model starts learning, they undermine data integrity at its foundation, compromising the accuracy required in subsequent stages such as pre-training and fine-tuning and embedding errors that persist throughout the model’s lifecycle.
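To make label flipping concrete, the toy sketch below inverts a fraction of binary labels during data preparation; any model trained downstream inherits the corrupted mapping. The dataset and flip rate are illustrative assumptions.

```python
import numpy as np

def flip_labels(labels, flip_fraction, rng):
    """Simulate a label-flipping attack: invert a chosen fraction of binary
    labels (0 <-> 1) before the data ever reaches training."""
    poisoned = labels.copy()
    n_flip = int(flip_fraction * len(labels))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    poisoned[idx] = 1 - poisoned[idx]
    return poisoned, idx

rng = np.random.default_rng(42)
clean_labels = rng.integers(0, 2, size=1000)

poisoned_labels, flipped_idx = flip_labels(clean_labels, flip_fraction=0.15, rng=rng)
print((clean_labels != poisoned_labels).mean())   # 0.15 — the chosen fraction now misleads training
```

Because the corruption happens before training begins, nothing in the later stages flags it unless the labels are independently validated against a trusted source.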

Model-targeted attacks primarily occur during the Fine-Tuning Stage in federated and decentralized learning environments. In federated learning, attackers can introduce malicious updates into the aggregated model, corrupting the global model's learning process. One such attack is a Byzantine attack, where compromised clients send intentionally faulty updates that mislead or contaminate the global model. This happens during the model update aggregation phase, when the central server (or, in decentralized cases, the peer-to-peer nodes) integrates updates from multiple nodes. Fine-tuning is particularly vulnerable because it often relies on smaller datasets, making it easier for attackers to introduce biases or harmful behaviors without immediate detection.
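The toy sketch below (an illustrative assumption, not a reproduction of any documented attack) shows why naive averaging is fragile: a single Byzantine client that scales and inverts its update drags the aggregated result away from the honest consensus.

```python
import numpy as np

rng = np.random.default_rng(7)

# Nine honest clients produce similar fine-tuning updates (small noisy vectors).
honest_updates = [np.array([0.5, -0.2]) + rng.normal(scale=0.05, size=2)
                  for _ in range(9)]

# One compromised client sends an intentionally faulty, heavily scaled update.
byzantine_update = -20.0 * np.mean(honest_updates, axis=0)

all_updates = honest_updates + [byzantine_update]

naive_mean = np.mean(all_updates, axis=0)
honest_mean = np.mean(honest_updates, axis=0)

print("honest consensus:", honest_mean)   # roughly [ 0.5, -0.2]
print("poisoned average:", naive_mean)    # pulled in the opposite direction
```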

Identify and Protect

Anomaly detection is a complex task that requires sophisticated statistical models or machine learning techniques to identify unexpected patterns or irregularities in training data – something too challenging for humans to consistently test and validate manually. During the Pre-Training and Fine-Tuning Stages, regular validation and verification of model performance on separate datasets will help detect discrepancies caused by data or model-targeted attacks. In federated learning configurations, secure aggregation methods and node reputation systems build layers of trust during model updates, while adversarial training simulates attacks to increase model resilience. In decentralized setups, Byzantine fault-tolerant algorithms ensure model integrity even when some nodes are compromised, effectively mitigating threats during the aggregation phase.
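As a hedged sketch of these defenses, the code below pairs a simple norm-based anomaly filter with coordinate-wise median aggregation, one common family of Byzantine fault-tolerant rules; the threshold and toy updates are assumptions, and production systems typically rely on more sophisticated variants such as trimmed means, Krum or cryptographic secure aggregation.

```python
import numpy as np

def filter_anomalous(updates, z_threshold=2.5):
    """Flag updates whose norm deviates sharply from the group: a crude
    statistical anomaly detector for model-update poisoning.
    The threshold is illustrative; real deployments tune it."""
    norms = np.array([np.linalg.norm(u) for u in updates])
    z = (norms - norms.mean()) / (norms.std() + 1e-12)
    return [u for u, score in zip(updates, z) if abs(score) < z_threshold]

def robust_aggregate(updates):
    """Coordinate-wise median: tolerant to a minority of Byzantine updates."""
    return np.median(np.stack(updates), axis=0)

rng = np.random.default_rng(3)
honest = [np.array([0.5, -0.2]) + rng.normal(scale=0.05, size=2) for _ in range(9)]
byzantine = [np.array([-10.0, 4.0])]              # malicious outlier update

updates = honest + byzantine
survivors = filter_anomalous(updates)
print(robust_aggregate(survivors))                # stays close to [0.5, -0.2]
```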

A Constantly Evolving Situation

As LLMs continue to evolve, each training configuration presents unique vulnerabilities that require proactive vigilance from security practitioners. A commitment to continuous learning and improvement, combined with advanced detection and secure practices, is essential to ensuring that LLMs remain robust against evolving risks. By staying aware of emerging technologies and adapting defenses accordingly, security professionals can better safeguard these systems against new threats and maintain the integrity and resilience of LLMs.

John Cvetko, CISSP has over 20 years of experience developing and implementing enterprise network and application solutions with a strong focus on security, leveraging industry best practices. His recent work includes researching and deploying innovative technologies, such as generative AI, for enterprise clients, prioritizing secure design and risk mitigation.
