โ† Back to blog
AI

AI risks: copyright and intellectual property of generated code

When an LLM generates code, who owns it? Can it reproduce code under a protected license? These questions are not theoretical: they bear directly on your teams' legal liability.

Code generation tools like GitHub Copilot, Claude or GPT-4 were trained on massive quantities of public code, including code under GPL, MIT, Apache and proprietary licenses. The legal question has become urgent: if an LLM reproduces a GPL-licensed snippet in your commercial codebase, you may be in violation of the license terms. And you may not even know it.

Studies have shown that LLMs can reproduce memorized code snippets from their training data, particularly widely duplicated code (sorting algorithms, popular snippets). GitHub Copilot includes a duplicate code detection filter that blocks suggestions too similar to public code; enable it. But this filter is not infallible and does not exist for all models.
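Copilot's actual filter is proprietary, but the underlying idea can be illustrated. A minimal sketch, assuming a crude whitespace tokenization: compare token n-grams of a generated snippet against an indexed snippet and report the overlap ratio.

```python
# Illustrative sketch only: Copilot's real duplicate filter is proprietary.
# Compares token 5-grams of a generated snippet against an indexed snippet
# and reports what fraction of the generated n-grams also appear there.

def ngrams(code: str, n: int = 5) -> set[tuple[str, ...]]:
    tokens = code.split()  # crude tokenization, for illustration only
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap(generated: str, indexed: str, n: int = 5) -> float:
    a, b = ngrams(generated, n), ngrams(indexed, n)
    if not a:
        return 0.0
    return len(a & b) / len(a)

indexed_snippet = (
    "def quicksort(xs): return xs if len(xs) < 2 else "
    "quicksort([x for x in xs[1:] if x < xs[0]]) + [xs[0]] + "
    "quicksort([x for x in xs[1:] if x >= xs[0]])"
)
generated = indexed_snippet  # a verbatim reproduction scores 1.0
print(overlap(generated, indexed_snippet))
```

A real filter would tokenize properly, normalize identifiers, and search an index of millions of repositories, but the score above is enough to see why verbatim reproductions are detectable while lightly rewritten ones slip through.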

Reasonable due diligence today: run license detection tools on your codebase (FOSSA, Black Duck, Licensee), audit AI-generated code in critical components specifically, and establish a clear company policy on AI tool usage. Some organizations ban Copilot for proprietary code; others accept it with enhanced review processes. The absence of a policy is itself a legal risk.

  • Enable the duplicate code detection filter in Copilot
  • Scan your codebase with FOSSA or equivalent
  • Establish a company policy on AI tool usage
  • Be especially vigilant on complex algorithmic code
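As a lightweight complement to commercial scanners like FOSSA, the scan step can be sketched with the `SPDX-License-Identifier` header convention, which many projects use to declare a file's license. The list of flagged licenses below is an illustrative assumption; adapt it to your own policy.

```python
import os
import re

# Licenses to flag in a proprietary codebase (illustrative choice).
COPYLEFT = {"GPL-2.0", "GPL-3.0", "AGPL-3.0", "LGPL-3.0"}
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([\w.\-+]+)")

def scan(root: str) -> list[tuple[str, str]]:
    """Return (path, license) pairs for files declaring a flagged license."""
    hits = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith((".py", ".c", ".js", ".go")):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    head = f.read(2048)  # SPDX headers sit near the top
            except OSError:
                continue
            m = SPDX_RE.search(head)
            if m and any(m.group(1).startswith(lic) for lic in COPYLEFT):
                hits.append((path, m.group(1)))
    return hits
```

This only catches files that declare their license; it says nothing about an unattributed snippet pasted from a GPL project, which is exactly the case AI-generated code can introduce. That gap is why snippet-level scanners and human review remain necessary.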
