With any new technology, expect backlash. One form of it arrived yesterday, when Italy issued a temporary ban on ChatGPT and announced an investigation into its maker, OpenAI, "with immediate effect." As of this writing, the regulator is focusing on ChatGPT and only ChatGPT, not Google's Bard or any of the other large language models (LLMs) out there. So, for those wondering if ChatGPT really had a first-mover advantage: this is kind of the opposite of that.
The concerns focus on the potential for General Data Protection Regulation (GDPR) privacy violations related to "the mass collection and storage of personal data for the purpose of 'training' the algorithms underlying the operation of the platform," as well as the absence of age verification. The breach at OpenAI a few weeks ago, which exposed some subscribers' payment details, surely didn't help either.
The internet, of course, had jokes, including asking ChatGPT whether it has a "personal preference" about pineapple on pizza.
The LLM's pizza-topping feelings aside (personally, I hate it, if you want to add that data point, but you do you), OpenAI has stated that it believes its training was compliant and not focused on personal data.
For me, this one hits close to home: I've been working with a company that has a large Italian customer base to implement AI-based tools that help its customers deliver personalized recommendations and service, both to their own customers and to their customers' customers.
Here are four major compliance hurdles we’d already identified:
- Data collection and minimization: AI systems should collect and process only the minimum personal data necessary for the specified purpose. Clearly, not all of a model's parameters encode personal data, but this may drive more interest in building effective models from the minimum necessary data, rather than piling as much as possible into general-purpose systems. For a sense of how much model sizes can differ: OpenAI has stated that GPT-3 has 175 billion parameters, while GPT-4's parameter count is undisclosed (the multi-trillion figures circulating online have never been confirmed by OpenAI). If you've used both, you know there are many things for which GPT-4 is clearly better, and others where, honestly, they both do just fine. Meanwhile, competitors like Databricks have built competent models with only 6 billion parameters. So ask yourself: is bigger really better for any given use case? (For the data side of minimization, see the sketch after this list.)
- Consent and transparency: Many of us are working on solutions that will use customer data. Companies must provide clear information about how AI systems process personal data and obtain explicit consent from users. That includes explaining the purposes of data processing and the rights users have in relation to their data.
- Data storage and security: Especially with a third-party API, do you know where your data is going? Whether, and where, it's being stored? Companies must ensure they have appropriate security measures in place to protect personal data from unauthorized access, loss, or destruction.
- Automated decision-making: AI systems may make decisions that significantly affect individuals, such as personalized recommendations or targeted marketing. These decisions must comply with privacy regulations and give individuals the right to contest the decision or seek human intervention.
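To make the minimization point from the first item concrete, here's a minimal sketch in Python. The field names and the allow-list are illustrative assumptions, not anything prescribed by GDPR; the idea is simply to decide up front which fields your use case actually needs, and strip everything else before a record ever leaves your systems.

```python
# A minimal sketch of allow-list minimization; field names are illustrative.
from typing import Any

# Only the fields the recommendation use case actually needs.
ALLOWED_FIELDS = {"locale", "product_views", "purchase_categories"}

def minimize(record: dict[str, Any]) -> dict[str, Any]:
    """Drop everything not on the allow-list before the record leaves us."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

customer = {
    "name": "Maria Rossi",         # personal data: never sent downstream
    "email": "maria@example.com",  # personal data: never sent downstream
    "locale": "it-IT",
    "product_views": ["espresso machines", "grinders"],
    "purchase_categories": ["kitchen"],
}

payload = minimize(customer)
print(payload)  # only locale, product_views, purchase_categories survive
```

One reason to prefer the allow-list over a deny-list of known-sensitive fields: a deny-list fails open (a new field added upstream leaks by default), while an allow-list fails closed.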
Companies can do a lot to ensure their AI-powered tech is legal and ethical. A good start is a thorough Data Protection Impact Assessment (DPIA) to identify risks and mitigation strategies. When building any AI-driven functionality, apply Privacy by Design and Privacy by Default principles, so that privacy considerations are integrated into development and implementation from the outset. Lean on anonymization or pseudonymization wherever possible to reduce the risks of processing personal data (a small sketch of the latter follows below), and establish strict data access controls and security measures, including encryption and regular audits, to protect personal data from unauthorized access or breaches.

You may already have been asking yourself the build-vs-buy question, and this may make some at your org lean further toward "build" despite the resources required. If you do choose a third-party tool, look for a strong track record of compliance (yes, easier said than done given how young this industry is), robust security measures, a commitment to data minimization as discussed above, and a strong data processing agreement.
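On the pseudonymization point, here's a minimal sketch, again with the assumptions flagged: the key handling is deliberately naive (a real deployment would pull it from a secrets manager), and `pseudonymize` is a name I made up. A keyed hash lets you keep joining and deduplicating records without ever shipping the raw identifier.

```python
import hashlib
import hmac
import os

# Assumption: in production this key lives in a secrets manager, not an
# environment variable; rotating it unlinks all previously issued pseudonyms.
PSEUDONYM_KEY = os.environ.get("PSEUDONYM_KEY", "dev-only-key").encode()

def pseudonymize(identifier: str) -> str:
    """Deterministic keyed hash: the same input always yields the same
    pseudonym, so records can still be joined, but the mapping can't be
    reversed without the key."""
    return hmac.new(PSEUDONYM_KEY, identifier.encode(), hashlib.sha256).hexdigest()

print(pseudonymize("maria@example.com"))  # stable hex string per key
```

One caveat worth knowing: under GDPR, pseudonymized data is still personal data, because whoever holds the key can re-link it. It reduces risk; it doesn't take the data out of scope.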
For customers, provide clear and transparent privacy notices informing users how their data will be processed, stored, and used, and obtain explicit consent when laws like GDPR require it and when you feel it's ethical to do so. Then implement clear processes for users to exercise their rights to understand or opt out of how their data is used, including the rights to access, rectify, erase, and object to the processing of their personal data, as well as the right to data portability. The plumbing for those requests can start small, as sketched below.
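Here's a toy sketch of that plumbing, covering the two most common operations: access (export) and erasure. `UserStore` is a stand-in I invented for whatever actually holds your data; in real systems the hard part isn't this code, it's making sure every downstream copy (caches, logs, training sets, third-party processors) honors the same request.

```python
from dataclasses import dataclass, field

@dataclass
class UserStore:
    """Toy stand-in for whatever actually holds customer data."""
    records: dict[str, dict] = field(default_factory=dict)

    def export(self, user_id: str) -> dict:
        """Right of access / portability: return everything held on the user."""
        return self.records.get(user_id, {})

    def erase(self, user_id: str) -> bool:
        """Right to erasure: delete the record and confirm it's gone."""
        return self.records.pop(user_id, None) is not None

store = UserStore({"u123": {"email": "maria@example.com", "prefs": ["kitchen"]}})
print(store.export("u123"))  # the user sees exactly what is held about them
print(store.erase("u123"))   # True; downstream copies must follow suit
```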
The above are just a few things we’re thinking about, and there will assuredly be more to come in this area as we go. Hope y’all are enjoying these uncharted waters as much as I am.