New EDPB opinion (Opinion 28/2024) on the processing of personal data in AI models
Introduction
The rapid development and use of artificial intelligence (AI) raises fundamental data protection issues. Whether AI models process personal data has not yet been uniformly clarified. The Hamburg Commissioner for Data Protection and Freedom of Information, for example, took the view in its discussion paper "Large Language Models und personenbezogene Daten" that no personal data is stored in Large Language Models (LLMs), because text fragments are tokenized and the models lack identifiers that would allow a targeted, individual assignment of information. The European Data Protection Board (EDPB) does not assume such automatic anonymity and has now published guidance for data protection authorities and companies in its Opinion 28/2024 to address the data protection challenges in the development and use of AI models.
Background
The European Data Protection Board (EDPB) is an independent EU body made up of representatives of the national data protection authorities and the European Data Protection Supervisor. Its goal is to ensure the uniform application of the General Data Protection Regulation (GDPR) in the EU and to promote cooperation between the national data protection authorities.
Opinion 28/2024 on the processing of personal data in the development and deployment of AI models was preceded by a request from the Irish data protection supervisory authority under Art. 64(2) GDPR. Under this provision, supervisory authorities may request an opinion from the EDPB on matters of general importance, such as the processing of personal data in AI models, in order to ensure a uniform interpretation of the GDPR across all Member States.
Key statements of the Opinion 28/2024
Anonymization: AI models trained with personal data cannot automatically be considered anonymous. Anonymity must be rigorously verified by ruling out that information can be traced back to individuals, for example through membership inference attacks. Companies must provide comprehensive technical evidence of anonymization.
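The membership inference attacks mentioned above can be illustrated with a minimal sketch: if a model is systematically more confident on records it was trained on than on unseen records, an attacker can infer who was in the training data. All names, confidence values, and the threshold below are hypothetical; real attacks use shadow models and calibrated statistics rather than a fixed cutoff.

```python
def membership_inference(confidences: dict[str, float], threshold: float) -> dict[str, bool]:
    """Flag records whose model confidence exceeds the threshold as likely training members."""
    return {record: conf > threshold for record, conf in confidences.items()}

# Hypothetical confidence scores a model assigns to known records:
confidences = {
    "alice": 0.97,  # was in the training set -> unusually high confidence
    "bob": 0.95,    # was in the training set -> unusually high confidence
    "carol": 0.55,  # unseen record -> lower confidence
}

flags = membership_inference(confidences, threshold=0.9)
print(flags)  # {'alice': True, 'bob': True, 'carol': False}
```

If such a test succeeds against a model, the EDPB's premise applies: the model cannot simply be treated as anonymous.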
Legitimate interest when processing training data as part of development: In principle, the processing of training data as part of the development of the AI model may be in the legitimate interest of the developing company. If this legal basis is relied upon, a three-stage check must be carried out and documented:
- The legitimate interest must be identified (e.g. improvement of AI functionality)
- The necessity of processing personal data must be demonstrated; in particular, no less intrusive means may be available.
- The rights and interests of the controller and the data subject must be carefully balanced. The data subject's interest in not having their data processed must not outweigh the controller's legitimate interest.
Unlawful data processing: Processing data without a legal basis can have far-reaching consequences, up to and including the unlawfulness of the entire model. Data protection authorities can order measures such as deletion or rectification of the model.
Recommendations
The EDPB opinion provides guidance for companies that develop or use AI technologies. Companies should take it into account in order to minimize legal risks.
- Companies and developers of AI models should provide comprehensive evidence of anonymization procedures and data protection-friendly design (privacy by design).
- A data protection impact assessment (DPIA) should be carried out before developing and using an AI model.
- Data subjects should be informed clearly and comprehensibly about the processing of their data.
- Measures such as the pseudonymization of data, the filtering of sensitive data and opt-out options for data subjects should be prioritized.
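The pseudonymization mentioned in the last recommendation can be sketched as follows: direct identifiers are replaced with stable, keyed pseudonyms before data enters a training pipeline. This is an illustrative example, not an implementation prescribed by the EDPB; the key name and record fields are hypothetical, and in practice the key would be managed and stored separately from the data.

```python
import hashlib
import hmac

SECRET_KEY = b"store-this-key-separately"  # hypothetical; real keys belong in a key management system

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable, keyed pseudonym (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"name": "Max Mustermann", "email": "max@example.com", "age": 42}
pseudonymized = {
    "name": pseudonymize(record["name"]),
    "email": pseudonymize(record["email"]),
    "age": record["age"],  # non-identifying attribute kept as-is
}
print(pseudonymized)
```

Because the same input always yields the same pseudonym, records can still be linked for training purposes, while re-identification requires access to the separately stored key. Note that under the GDPR pseudonymized data remains personal data; pseudonymization is a safeguard, not anonymization.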
Photo by Claudio Schwarz on Unsplash