Published on January 16, 2024, 7:27 am
Generative AI training methods have sparked controversy and legal concerns surrounding data privacy. The Information Commissioner’s Office (ICO) is set to examine the legality of these training methods amid concerns over the use of personal data.
In recent months, AI training methods, particularly related to large language models like ChatGPT, have drawn attention. These models are typically developed using vast amounts of data collected through web scraping. While this approach has its benefits, it has raised concerns regarding data privacy and potential copyright violations.
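To make the scraping step concrete, the sketch below shows one small piece of such a pipeline: stripping an HTML page down to its visible text before it is added to a training corpus. This is a minimal illustration using only Python's standard library; the `TextExtractor` and `page_to_text` names are hypothetical, and real corpus pipelines add crawling, deduplication, and content filtering on top of this.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping script/style blocks."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside a script/style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_to_text(html: str) -> str:
    """Return the visible text of an HTML document as one whitespace-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)


sample = ("<html><head><script>var x = 1;</script></head>"
          "<body><h1>Hello</h1><p>World</p></body></html>")
print(page_to_text(sample))  # Hello World
```

It is exactly this kind of indiscriminate extraction, applied at web scale, that raises the personal-data questions the ICO is examining: the parser has no notion of whether the text it keeps contains personal information.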
The ICO acknowledges that there is a need for greater clarity on how data protection laws apply to the development and use of generative AI. This includes determining the appropriate lawful basis for training these models and understanding how the purpose limitation principle applies in this context.
Questions also linger about complying with the accuracy principle and meeting expectations regarding data subject rights. To address these concerns, the ICO plans to release guidance on its position regarding generative AI training methods and their compatibility with the UK GDPR and the Data Protection Act 2018.
Stephen Almond, the executive director for regulatory risk at the ICO, highlights that generative AI has transformative potential if it is developed and deployed responsibly. The ICO aims to provide certainty to industry stakeholders about their obligations while safeguarding individuals’ information rights and freedoms.
Under the UK GDPR, relying on legitimate interests requires a three-part test: the purpose of the processing must be legitimate, the processing must be necessary for that purpose, and the individual’s interests must not override it. The ICO believes that legitimate interests can serve as a valid lawful basis for training generative AI models on web-scraped data, provided developers pass this test for their specific purpose and use of the model.
Regarding necessity, most generative AI training currently relies on large-scale web scraping. Complexity arises, however, when considering how a model will be used: whether it is deployed by the original developer, offered to third parties through an API, or provided to third parties outright.
As part of its investigation, the ICO plans to engage with various stakeholders in the technology industry, including generative AI developers and users, legal advisors, consultants, civil society groups, and public bodies interested in generative AI.
The ICO’s first consultation on this matter is open until March 1st, with additional consultations scheduled throughout the first half of the year to address issues such as the accuracy of generative AI outputs.
While generative AI training has become a legal minefield due to copyright disputes, the ICO’s examination aims to provide much-needed clarity and guidelines for responsible development and deployment of these powerful technologies.