LLMs are trained on troves of personal data crawled from the internet, including addresses, phone numbers, and email addresses. The MIT Technology Review warns of a “ticking time bomb” for privacy online, one that opens up a plethora of security and legal risks.

Gary Marcus also warns of a “darker consequence” of AI. He surmises that once investors realize the limitations of generative AI, companies like OpenAI will inevitably turn to surveillance business models to monetize their most valuable asset: user data. This underlying market incentive is troubling, particularly given recent trends toward collecting even more sensitive user data.

  • Companies are encouraging people to divulge sensitive personal information to AI platforms like Meta AI, ChatGPT, or xAI’s Grok. The conversational design of chatbots works to draw out private information from users. AI developers are also making the case that increasing personalization, which requires access to sensitive user data, will make AI tools more appealing by providing individualized information and support. 

  • OpenAI is expanding its product offerings to reach more user data. Subscribers to ChatGPT Team, Enterprise, or Edu can now connect ChatGPT to their Google Drive, Dropbox, Box, SharePoint, and OneDrive accounts. Once connected, users can query ChatGPT for answers about their stored spreadsheets and documents. OpenAI also announced that it will be launching a device, which has been described as an “AI-powered surveillance companion.”

  • Meta is declining to let users opt out of having their data included in training data sets; the company retains the right to use any data shared publicly on Facebook and Instagram to train its AI systems. Facebook is also now asking for access to users’ camera rolls so Meta AI can offer suggested edits to photos that have not yet been uploaded to any Meta platform. If a user approves the request, they give Facebook permission to upload media from their camera roll to Meta’s servers on an “ongoing basis.” Meta AI has also come under fire for a design flaw that has reportedly left many users unaware that they are publicly sharing their search queries. Home addresses, sensitive court details, and other private information have been visible in publicly available searches.

  • Outside of the tech sector, the banking industry has been developing a range of AI-enabled tools to collect customer information. Large American banks like JPMorgan Chase and European ones like BBVA have built in-house AI research centers led by leading computer scientists. AI tools that allow banks to personalize their services on the basis of customer data raise privacy and discrimination concerns.

  • Amazon is considering delivering ads to users during their conversations with its AI-powered digital assistant, Alexa+. The AI-generated ads will be delivered within a multistep conversation, which raises privacy concerns because generative AI chatbots tend to collect more information about users than deterministic assistants like traditional Alexa and Siri.

All of this is concerning from a legal and security perspective, even more so given the proximity of big tech leaders to a US government that is currently “obsessed with obtaining a nearly unprecedented level of surveillance and control over residents’ minds: gender identities, possible neurodivergence, opinions on racism, genocide, etc.” And while there are steps companies can take to let users control how AI uses and stores their data, CDT cautions that placing the burden on users to manage this risk wrongly assumes they have sufficient capacity to make informed choices across hundreds of digital services.

Questions to consider

  • What steps are companies taking to allow users to understand and control how AI products store and use their data? What steps are they taking to minimize the collection of sensitive data like health information, sexual orientation, etc.?

  • For companies designing and deploying chatbots that people use to discuss sensitive topics, what steps are they taking to protect user privacy? Where sensitive health information might be disclosed, how are they maintaining compliance with relevant health privacy regulations?
