Episode 2 — The AI Security Landscape

Artificial intelligence security can be understood as a distinct discipline within the wider field of information security, one that responds to assets and risks that did not exist in earlier eras of computing. At its core, AI security concerns itself with protecting the data, models, and interactions that define how intelligent systems function. Traditional systems primarily focus on securing applications, networks, and user accounts; AI introduces new elements such as training datasets, model weights, prompts, and tool connectors. These assets create novel vulnerabilities that demand equally novel protections. The scope of AI security stretches across the lifecycle: from the moment data is collected for training, through the stages of model development and deployment, and into ongoing inference and governance. To appreciate the breadth of this domain, it is useful to begin by surveying the core assets that must be defended and the threats that seek to exploit them.

The first category of assets to consider is data. Training datasets and corpora are the raw material from which models derive their capabilities. If these are poisoned or corrupted, the effects can ripple throughout the entire lifecycle, producing models that behave incorrectly or unfairly. Next are the model weights and parameters themselves—vast arrays of numbers that encode the learned patterns of the system. These weights are intellectual property, often representing immense investment in computation and expertise. Prompts and instructions form another asset class, less obvious but equally sensitive, because they guide the model’s outputs and often contain proprietary business context. Finally, external tools and connectors expand what models can do but also expand the attack surface. By understanding these as distinct assets, security professionals can begin to map the terrain of AI security, recognizing both what needs protection and where attackers might strike.

AI security differs from traditional application security in several fundamental ways. Models are unpredictable: their outputs cannot be guaranteed to follow a strict, predefined path. Whereas an application can be tested exhaustively against fixed inputs and outputs, models generate language or decisions that can vary dramatically, even when given similar prompts. Moreover, many models adapt continuously, learning from new data or user interactions, which creates moving targets for defenders. Data quality plays an outsized role; a model trained on compromised information will carry those compromises invisibly into production. Verifying correctness is also uniquely challenging, since there may not be a single “right” answer to compare against. This makes AI security as much about resilience and governance as it is about technical hardening. Recognizing these differences helps us see why simply applying old security techniques to new AI systems will not suffice.

Threats to training data exemplify this challenge. Attackers can engage in poisoning, inserting carefully crafted examples into public or internal datasets so that the model learns biased or incorrect behaviors. Sensitive records can be exposed if data collection and storage are not carefully controlled, raising privacy and compliance concerns. Even the labeling process, often outsourced or automated, can be manipulated to skew the outcomes. Unlike a software bug that can be patched after discovery, data-related compromises often persist silently. Once a poisoned dataset has shaped the weights of a model, the harmful influence is baked in, sometimes in ways that are difficult or impossible to remove without retraining. This long-term effect makes training data protection one of the most urgent priorities in AI security.

At the inference stage, new categories of threat emerge. Prompt injection attacks deliberately craft inputs to override or manipulate the model’s intended behavior, often bypassing guardrails designed to keep outputs safe or compliant. Jailbreak attempts, now common across many public systems, represent users actively trying to trick a model into producing forbidden responses. Inference can also expose sensitive information, such as leaking fragments of training data if the model was not properly sanitized. Adversarial inputs—carefully perturbed examples that exploit weaknesses in the model—can cause misclassification or unexpected behavior, undermining reliability. These threats illustrate that securing an AI system is not only about building it correctly at training time but also about protecting its live interactions once it is deployed.

Model-centric threats target the learned artifacts themselves. Theft of weights, whether through insider compromise or external breach, undermines both intellectual property and trust. Attackers can also extract models indirectly, issuing repeated queries and analyzing the responses until they reconstruct the underlying patterns—a process known as model extraction. Reverse engineering efforts may reveal sensitive architectural details, enabling further exploitation. Even without outright theft, tampering with the stored model can subtly degrade its integrity, inserting biases or vulnerabilities. The stakes here are high, as models often represent millions of dollars of training cost and are central to an organization’s strategic advantage. Protecting them requires both technical safeguards and careful governance of who has access, under what circumstances, and with what monitoring in place.

Prompts and other forms of user input might not seem like obvious security assets, yet they are increasingly recognized as critical to defend. In many organizations, prompts contain sensitive operational knowledge: the way a business frames customer interactions, or the instructions used to drive complex workflows. If exposed, these prompts can reveal trade secrets or give competitors an advantage. Risks also arise through indirect injection, where malicious content embedded in documents, emails, or websites can be fed into a model unintentionally, leading it to act on harmful instructions. Attackers can even chain multiple inputs together, manipulating one step of a process to influence the next. Because AI systems are designed to take in large volumes of text or structured data, their input surfaces are broad and varied, making it difficult to anticipate every potential manipulation. The result is a category of risk unique to the AI domain: the compromise of trust through language itself.

External tools and connectors further complicate the landscape. As models are linked with plugins, application programming interfaces, or other software components, their functionality grows—but so too does their vulnerability. A compromised external application can feed false data into a model, or misuse the privileges granted to it. Privilege escalation becomes a real concern if connectors are not tightly scoped, allowing one tool to gain unintended access to another. These are not hypothetical problems; they mirror supply chain risks seen in other parts of information security, where trust in one component leads to vulnerability across the system. By expanding the AI system’s reach, organizations expand its attack surface, creating dependencies that must be carefully managed and monitored over time.

It is useful to distinguish between safety and security when discussing AI, because the two terms are often confused. Safety refers to measures that control the content of outputs—ensuring, for example, that a chatbot does not produce offensive or harmful language. Security, by contrast, focuses on defending the system against adversarial manipulation, theft, or misuse. The two are related, but they are not the same. A system might produce safe content while still being vulnerable to model theft, or it might resist external attacks while producing unsafe content. Both dimensions matter, and both must be considered separately. In practice, they sometimes overlap, such as when a malicious prompt injection leads to unsafe output. But clarity about the distinction helps organizations assign responsibilities: content moderation belongs to one team, while system defense belongs to another.

Sector-specific examples illustrate why these risks matter so deeply. In healthcare, training data might contain sensitive patient information, creating the danger of exposure through model leakage. In finance, attackers may try to circumvent fraud detection models by crafting inputs that evade detection. Legal systems that rely on AI for document review could be misled by manipulated text, altering the interpretation of key evidence. In education, test-taking systems that use AI could be gamed by students who discover ways to prompt the model into giving away answers. These scenarios highlight that AI risks are not abstract—they affect critical services across society. The same threats play out differently depending on the sector, but the underlying vulnerabilities remain consistent, reinforcing the need for systematic, lifecycle-oriented defenses.

Defenses must therefore operate on multiple fronts. Training pipelines can be hardened through data validation, provenance tracking, and careful vetting of sources. Inference environments can be isolated, preventing attackers from using one compromised model to affect another. Output validation, such as filtering or secondary review, ensures that model responses are checked before they reach end users. Continuous monitoring provides visibility into how models behave over time, flagging anomalies that may indicate attack. No single measure is sufficient on its own, but together they form a layered defense. This mirrors the defense-in-depth principle long valued in cybersecurity, adapted here to the specific realities of AI. The key is recognizing that every stage of the lifecycle, from data collection to live deployment, presents both opportunities for attack and levers for defense.

Finally, standards are beginning to emerge that give structure to these efforts. While the field is still young, organizations and governments have issued early guidelines for AI security, many of them inspired by established cybersecurity frameworks. These emphasize lifecycle coverage, ensuring that attention is paid not only to deployment but also to the earlier and later phases of development. Industry baselines help organizations avoid reinventing the wheel, while also promoting interoperability and trust. As with earlier eras of cybersecurity, the presence of common frameworks allows different teams to coordinate around shared practices. Over time, these standards will likely evolve into more formalized requirements, but even now they serve as a vital foundation for anyone seeking to secure AI systems responsibly.

For more cyber related content and books, please check out cyber author dot me. Also, there are other prepcasts on Cybersecurity and more at Bare Metal Cyber dot com.

Red teaming has emerged as a cornerstone of AI security practice, providing a structured way to test the resilience of models before adversaries do. In this context, red teaming involves crafting adversarial prompts or scenarios specifically designed to reveal weaknesses. By intentionally trying to bypass safeguards or provoke unsafe outputs, security teams can better understand how their systems might be misused. The process is iterative: findings from red teaming exercises feed back into model adjustments, defense improvements, and governance updates. Much as penetration testing has long been used to probe traditional applications, red teaming provides a disciplined approach to uncovering blind spots in AI. The goal is not only to expose vulnerabilities but to learn from them, using them as stepping stones toward stronger, more resilient systems.

Alongside testing, monitoring and telemetry play critical roles in sustaining defenses over time. Every interaction with a model generates signals that can be logged and analyzed. By tracking usage patterns, organizations can detect anomalies, such as sudden spikes in unusual prompts or attempts to access sensitive information. Outputs themselves can be monitored for unexpected behaviors, and access logs can reveal suspicious patterns of activity. These signals can be integrated into broader security information and event management systems, allowing AI-specific risks to be seen in the context of enterprise-wide threats. Telemetry thus acts as a nervous system for AI security: without it, defenders are effectively blind to how their models are being used or abused in practice.

Access control is another foundational measure. Models should not be available to anyone without oversight; instead, user authentication ensures that only authorized individuals can interact with them. Service-to-service authorization further limits how different components connect, reducing the chance that a compromised element can be exploited to reach the model. Application programming interface keys must be handled carefully, as their exposure can provide attackers with direct access. The principle of least privilege, long a staple of security, applies here as well: give each user, tool, or process only the access required, and no more. By limiting the pathways into the system, organizations shrink the potential space an attacker can exploit.

Encryption reinforces these boundaries by protecting the confidentiality of data and models. Training data stored at rest should be encrypted to prevent exposure in case of breach. Model checkpoints, which store the trained weights, are themselves valuable intellectual property and must be protected with the same rigor as source code or sensitive databases. Traffic during inference, where prompts and outputs flow across networks, should also be encrypted to prevent eavesdropping or tampering. Advanced environments may even use trusted execution technologies, ensuring that computations occur in secure enclaves where data cannot be extracted by external processes. Together, these measures help build a trustworthy environment, reassuring both developers and users that sensitive AI assets are shielded from casual and determined attacks alike.

Another emerging best practice is model provenance, which refers to the careful tracking of how a model came to be. This includes documenting the origin of training data, the specific processes used in fine-tuning, and the version history of the model itself. Provenance is crucial not only for reproducibility but also for accountability. If something goes wrong, organizations must be able to trace back through the lifecycle to determine whether the issue arose from poisoned data, a faulty fine-tuning process, or a deployment misstep. Without provenance, such investigations are guesswork. With it, they become structured and actionable. In regulated industries, provenance may even become a compliance requirement, ensuring that AI systems meet standards of transparency and control.

Incident examples help illustrate how these threats manifest in reality. Chatbots have been manipulated through prompt injection, tricked into revealing sensitive information or bypassing their safety filters. Datasets have been deliberately poisoned, embedding bias into models that then propagate unfair outcomes. Stolen weights have appeared for sale online, undercutting the competitive advantage of their creators. Adversarial images, subtly modified in ways imperceptible to humans, have fooled classifiers into making dangerous misjudgments. These cases demonstrate that AI security is not theoretical—it is already being tested in practice, with real consequences. Learning from such incidents strengthens the argument for proactive defense, reminding us that the landscape is not static but actively contested.

Evaluation pipelines are one way organizations can defend against these evolving threats. Before deployment, models should undergo rigorous testing, not only for accuracy but also for robustness. Benchmarks can include known adversarial attack sets, probing how the model responds under hostile conditions. Validation continues after deployment, with regression checks ensuring that changes do not reintroduce past vulnerabilities. Continuous evaluation is necessary because threats evolve: a model safe today may be vulnerable tomorrow if attackers discover new techniques. By institutionalizing testing and evaluation as part of the lifecycle, organizations create a feedback loop where security is constantly measured and improved rather than treated as a one-time hurdle.

Integration with cloud security principles provides another layer of resilience. Many models are hosted on cloud platforms, which offer both opportunities and risks. On the one hand, providers supply built-in guardrails, such as access controls and monitoring tools. On the other, organizations remain responsible for how they configure and use these environments. Shared responsibility is a key concept: the cloud provider secures the infrastructure, but the customer must secure their data, their models, and their access pathways. Monitoring consumption patterns is also important, since unusual usage may indicate attempted extraction or abuse. By aligning AI security practices with established cloud frameworks, teams can leverage existing strengths while adapting them to new challenges.

Underlying all these practices is the foundational concept of trust boundaries. Every AI system is composed of parts—data sources, training pipelines, models, inference interfaces, and external tools. At each handoff between these parts, control shifts. Some boundaries are internal, such as between a development team and a deployment team; others are external, such as between a vendor and a customer. These points of transition are natural weak spots, where assumptions may break down and attackers may insert themselves. Mapping trust boundaries clarifies where defenses must be strongest and where oversight must be explicit. Understanding these boundaries is not just an academic exercise but a practical one: it directs limited security resources to the places of highest leverage.

Trust boundaries also prepare us for the next stage of this journey. In the following episode, we will explore system architecture in greater detail, examining how AI systems are constructed and where their internal and external boundaries lie. This progression is intentional: having first mapped the broad landscape, we now move inward to understand the structures that support it. By recognizing how architecture shapes security, you will be better equipped to analyze where vulnerabilities cluster and how they can be mitigated through thoughtful design. The linkage between today’s broad survey and tomorrow’s deep dive illustrates the layering approach of this PrepCast—each step building on the last, each concept preparing the ground for what follows.

The summary of this episode returns us to the central definition of AI security: the discipline focused on protecting the unique assets of intelligent systems across their entire lifecycle. We explored how those assets—data, models, prompts, and tools—introduce novel risks, and how those risks differ from traditional application security. We examined threats at both training and inference, as well as model-centric and prompt-related attacks. Defensive measures, from provenance to encryption to monitoring, were highlighted as part of a layered approach. By situating AI security within this landscape, you now have a working map of the field. That map will serve as the reference point for all subsequent episodes, ensuring that no matter how detailed we become, the larger picture remains visible.

As you step forward, remember that AI security is not simply a checklist of controls but a mindset of vigilance and adaptation. The threats are dynamic, the systems are complex, and the stakes are high. Yet with structured approaches, layered defenses, and an understanding of where the boundaries lie, these challenges can be managed effectively. This PrepCast is designed to equip you with both the vocabulary and the confidence to engage, whether in exams, in professional discussions, or in leadership contexts. With the landscape now surveyed, we are ready to delve deeper into the architecture of AI systems and see how trust, control, and vulnerability are built into their very design.

Episode 2 — The AI Security Landscape
Broadcast by