Ed. Note: This is the second article in our series, Smarter AI Vendor Contracts: Legal Strategies for Protecting Your Business. Read Part 1.
Contracting with AI vendors requires a clear-eyed understanding of the risks at stake.
This article catalogs the principal categories of risk that corporate counsel should evaluate, with concrete examples and an eye toward the contract levers that can mitigate each one.
Data Governance and Confidentiality Risks
Data is the fuel for AI systems, and the ways in which vendors ingest, retain, and use data create risks that go well beyond traditional data processing concerns.
The ingestion of proprietary or personal data into an AI system is the threshold issue. Due to the autonomous nature of AI systems, it may not be clear how data is combined with other data or where it is stored. Trying to identify and remove a specific piece of data becomes a hunt for a needle in a stack of needles. Once data enters a model’s training pipeline, it may be impossible to extract, and the customer may lose practical control over it.
Unintended retention, where a vendor’s system stores data beyond the contractually permitted period, or where data persists in model weights, is a risk that conventional deletion provisions or certification requirements in agreements may not adequately address.
Model training on client content is perhaps the most commercially significant data governance issue. If a vendor uses Customer A’s data to improve a model that serves Customer B, the value of Customer A’s data has been extracted without compensation or consent. If Customer A’s data is confidential, proprietary, or sensitive, the risks are compounded: disclosure to another user may result in a breach of high-value, confidential, or proprietary information and cause operational or contractual harm. Similarly, if one customer’s personal information, protected health information, or financial information appears in output for another customer, the result can be a data breach under state and federal data privacy laws, which may trigger regulatory obligations, investigations, and fines, as well as reputational harm.
Cross-border transfers add regulatory complexity, particularly where data moves through model infrastructure located in jurisdictions with different privacy regimes. Additionally, based on the data protection standards in certain jurisdictions, data stored or traveling through such jurisdictions may be at risk of unauthorized access or interception.
Intellectual Property Risks
AI creates intellectual property (“IP”) risks on both the input and output sides of the equation.
On the input side, training data provenance is a foundational concern. If a vendor’s model was trained on copyrighted works, trade secrets, or other protected content without authorization, the customer may face downstream claims for infringement of third-party intellectual property rights. Potential infringement or misappropriation claims can arise not because the customer did anything wrong, but because the tool it purchased was built on a flawed foundation.
On the output side, ownership and originality are contested questions. Under current U.S. copyright law, purely AI-generated works may not be eligible for copyright protection, which affects the customer’s ability to enforce rights in outputs. There may also be a contested claim of ownership of outputs between the customer and the vendor. The vendor may want ownership rights to the output to be able to use it to train the AI model or for any other purpose it desires.
Open-source components embedded in models can introduce risks in two ways. First, open-source components may be subject to licensing obligations and restrictions on usage, modification, and distribution that must be attached to the outputs created by the model. Second, because open-source components are readily available to the public, threat actors may be able to study the components and exploit vulnerabilities.
Additionally, prompts that request an AI tool to generate outputs that replicate or resemble third-party works (e.g., “prepare a lullaby in the voice of Paul McCartney”), whether through memorization or stylistic mimicry, create infringement exposure that is difficult to detect and expensive to litigate. This risk is compounded when AI tools are trained on content from a specific and small subset of sources. With such limited training data, it is likely that any output will replicate or resemble elements of one of those few sources.
The financial consequences of IP claims are significant: where AI outputs incorporate or replicate third-party content, defense costs, settlements, and injunction risk can disrupt business operations and create substantial unbudgeted liabilities.
Privacy and Security Risks
Privacy and security risks in AI engagements extend beyond the familiar terrain of a data breach (which we covered above in discussing confidentiality risks).
Processing personal data without a proper legal basis is a threshold compliance issue, particularly under regimes like the European Union’s General Data Protection Regulation (GDPR), state comprehensive privacy laws, and sector-specific regulations. Due to autonomous processing of data and novel combinations of training data by AI models, it is often not clear exactly how an AI tool is processing data. If a company discloses and receives consent to use information of its users under certain legal bases, but the AI tool it employs, unbeknownst to the company, uses data in a materially different manner, such processing of data may be conducted without a proper legal basis, resulting in a violation of one or more data privacy laws.
Multimodal AI tools (i.e., AI tools that can simultaneously process and generate multiple types of content, such as text, images, or videos) that process biometric or voice data introduce additional regulatory requirements and heightened consumer sensitivity. It is easy to imagine how such a tool could be exploited to repurpose biometric data collected for one purpose (such as processing voice data to create a summary of a meeting) for a different, potentially harmful purpose (such as speaking a prompt in the voice of that individual).
The security posture of the vendor itself warrants scrutiny. Model inversion and data extraction attacks, where adversaries reverse-engineer a model to recover training data, represent AI-specific threats that conventional security assessments may not evaluate. Incident response plans must account for scenarios that are unique to AI, such as prompt injection attacks or data poisoning.
When privacy or security events occur, the downstream consequences include notification and remediation costs, regulatory investigations, penalties and fines, and class action exposure tied to data leakage or improper processing. These costs can dwarf the value of the underlying vendor contract.
Accuracy, Reliability, and Bias
The outputs of AI systems are probabilistic, not deterministic, and the consequences of inaccuracy can be severe.
“Hallucinations,” confident but fabricated outputs, are a well-documented limitation of large language models. AI is often compared to a toddler; it has a limited set of information. When it is asked a question that it cannot answer with the information it has, it still wants to please you, so it very confidently makes up an answer that it thinks you want. This problem has come to prominence because of citations and references to source material that appear to support the user’s position perfectly but turn out to be false upon scrutiny.
Discriminatory outputs and fairness concerns arise when models reflect or amplify biases present in their training data. Due to the training data, an AI model may unintentionally learn that certain qualities or characteristics are preferred (e.g., male candidates are preferred for a position because, historically, the position has been a male-dominated position and the resumes it learned from belonged to men). As in our example, if the preference manifests itself in a manner that results in discriminatory actions against a protected class, the company may find itself the target of regulatory actions. Even if the output of the AI tool does not rise to the level of regulatory action, it may operationally harm the company by closing options that the company did not know the tool was closing. Reputational harm may also follow if it appears the company is biased against certain groups or individuals.
Additionally, explainability limits make it difficult to understand why a model has reached a particular conclusion, which is especially problematic in regulated sectors. Sector-specific compliance pitfalls are acute in healthcare, financial services, and employment, where AI-informed decisions can trigger regulatory obligations and liability exposure. Explainability limits may also impair a company in its daily operations or in connection with a legal or regulatory matter if users cannot explain why certain actions were taken.
Direct damages from defective outputs include reliance damages, rework costs, project delays, and business interruption from model outages or degraded performance.
In high-stakes use cases, such as employment, credit, and healthcare decisions, professional and product liability analogs come into play.
Negligence or defective service theories may apply if AI outputs cause harm, and the allocation of liability between the vendor, the customer, and the end user is a question the contract must address head-on.
Regulatory and Compliance Risks
The regulatory landscape for AI is evolving rapidly and unevenly.
At the federal level, agencies including the Federal Trade Commission (FTC), Equal Employment Opportunity Commission (EEOC), and sector-specific regulators are issuing guidance and taking enforcement actions related to AI use. State legislatures are enacting AI-specific legislation at an accelerating pace. Consumer protection and unfair or deceptive practices frameworks are being applied to AI outputs and marketing claims. Recordkeeping and audit expectations are increasing as regulators demand transparency into how AI systems are used and governed.
Internationally, regimes affecting data and AI use in global deployments add layers of compliance complexity. The EU AI Act, in particular, imposes obligations that cascade through the vendor-customer relationship.
Regulatory and contractual penalties can arise from breach of customer contracts due to AI misuse, downstream indemnity claims from the customer’s own clients, and audit findings that reveal gaps in compliance. These risks underscore the importance of contractual provisions that require the vendor to cooperate with regulatory inquiries and maintain documentation sufficient to demonstrate compliance.
Operational Risks
Operational risks in AI engagements are often underappreciated until they materialize.
Service instability in AI platforms can result from infrastructure constraints, model updates, or capacity limitations that differ from traditional SaaS availability issues. Model drift, the gradual degradation of model performance as the data landscape changes, can erode the value of the engagement without triggering a clear breach. Given the constant ingestion of data by AI tools, model drift can result in an AI tool behaving very differently a few months after engagement than when it was first evaluated.
Over-reliance on AI outputs without human review is an organizational risk that contracts alone cannot solve, but can help manage through requirements for human-in-the-loop controls. Dependency on third-party sub-processors, including upstream model providers, hosting providers, annotators, and content filters, creates a chain of risk that the contract must address through flow-down obligations and transparency requirements. Business continuity concerns are amplified when the customer has integrated an AI tool deeply into its operations and the vendor experiences a service failure, a model recall, or a going-concern event.
Ethical and Reputational Risks
Ethical and reputational risks round out the risk landscape. Misuse of AI tools, harmful content generation, infringement of third-party intellectual property rights, content safety failures, leaks of confidential or proprietary information, personal data breaches, and misalignment with corporate AI principles can all produce reputational harm that exceeds the direct financial impact of such incidents.
Reputational harm and consequential losses, including loss of business expectancy, media fallout, and customer churn following publicized AI failures, are among the most difficult damages to quantify and among the most important to prevent.
A Risk-Based Approach to Contract Protections
When evaluating or negotiating AI vendor agreements, a risk-based approach is the most effective framework. Contractual protections should be calibrated to data sensitivity, use criticality, and deployment context. Additionally, non-negotiable terms should be identified and established early, such as no training on customer data by default, meaningful IP and data breach indemnities, transparency on sub-processors, and deletion and portability rights at exit.
At the same time, an organization should engage in a process of pilot testing and staged acceptance to validate the performance of the AI tool before enterprise rollout. A phased approach allows the organization to test the vendor’s claims, evaluate the model’s behavior with real-world data, and negotiate from a position of informed confidence.
Every risk identified in this article can be tied to a specific contract lever. The articles that follow will show how.