Co-written by Side School and Mathilde Brousse
Introduction
The use of generative AI (chatbots, virtual assistants, text or image creation tools) is booming in the professional world. Creating marketing content, programming assistance, meeting summaries, customer support – these tools offer significant time and productivity gains.
But what happens to your data when you ask a question to ChatGPT or another model? How can you ensure that you are not inadvertently disclosing sensitive information?

This guide, co-written by Side School and Mathilde Brousse, aims to explain in an accessible and concrete way how to use generative AI safely. You will find:
Clear answers to questions every professional asks (“What happens when I send a request? Who can see my data? Where is it stored?”).
An overview of the main AI tools (ChatGPT, Claude, Gemini, Mistral...) and their differences in terms of confidentiality.
An update on the laws and regulations to be aware of (GDPR, Cloud Act, AI Act) and their impact on your practices.
Above all, practical best practices: compliance checklists, examples of anonymized prompts, tool comparisons, risk lists to avoid, etc., to protect your data daily.
The guide is intended to be serious yet accessible – no need to be a computer scientist or lawyer to understand it. The goal is for any professional (SMEs, freelancers, support functions, HR, marketing, etc.) to capitalize on AI innovation without compromising data security.
Disclaimer: This report is published in April 2025. Artificial intelligence technology, business practices, and legal regulations evolve rapidly. The information presented here may therefore be subject to change. While this guide has been written with rigor and accuracy, it does not constitute official legal advice. Readers are advised to act with critical judgment.
1. What happens when I send a request to an AI?
Summary: Your data (prompts, files provided, etc.) are sent to the AI provider's servers, processed to generate a response, and often retained for some time. They can be reviewed by automated systems or even humans for improvement or moderation purposes. It’s crucial to understand the pathway of this information.
From the workstation to AI servers: the journey of your data
When you use an online AI chatbot, your request is transmitted via the Internet to the service's server:
The connection is generally encrypted (HTTPS) to protect it during transit from your computer to a server.
Your text is received by the AI model, which generates a response. This process occurs on the provider's servers (e.g., OpenAI servers for ChatGPT).
A copy of your prompt and the response may be stored on these servers. Indeed, most services retain conversations for a certain period, primarily to improve the system and prevent abuse.
In practice, asking a question to ChatGPT is akin to sending your data to the United States (where OpenAI is based) or to other provider cloud locations. The same goes for Gemini or Claude: data centers are almost always in the USA. Unless using a local open-source solution, your prompts leave your computer to be processed remotely.

Who can access my data?
Once stored at the provider, several actors can theoretically access your data:
The provider itself: Technical teams may access the conversations, especially for debugging or human intervention (e.g., checking usage policy compliance or training the model on examples). For instance, OpenAI indicates that “reviewers” may read excerpts of conversations reported for abuse. Anthropic (Claude) also mentions that data can be reviewed by authorized staff for safety reasons. (Sources: Anthropic, OpenAI).
Automated systems: Even without direct human intervention, your data can be used to refine the AI. This means your conversations may be fed back into the model's training (unless you explicitly opt out, see below). Put simply, your prompt could serve as a training example to improve future responses. (Sources: Anthropic, OpenAI).
Possible subcontractors: Major providers use cloud services (hosting, storage) and sometimes moderation performed by third-party companies. Content can be shared with these service providers as necessary under strict contractual obligations. (Source: help.openai.com)
Legal authorities: Under a court order or legal request, the provider might be compelled to disclose stored data. This is particularly true in the United States with the Cloud Act, which allows authorities to access data stored by American companies even if that data resides outside the USA. We will delve into this point and its implications for European businesses (spoiler: the Cloud Act conflicts with the GDPR).
Fortunately, your data is not accessible to just anyone. A third-party user cannot read your requests simply by logging into the service. However, there is an indirect risk: if the model has been trained on insufficiently cleaned data, it could regurgitate sensitive information provided by others. This is how proprietary code snippets or confidential conversations submitted by some users might later surface in AI responses.
In summary, when you use online AI, your data leaves your organization. It is stored on the provider's servers, often for extended and loosely defined periods (typically a minimum of 30 days, and until account deletion), and may be reviewed by authorized personnel or integrated (possibly in anonymized form) into the model's training dataset. Consider that information shared with a chatbot is no longer fully private – hence the importance of controlling what you send it.

Where is my data stored and for how long?
The exact location depends on the provider. By default, it's prudent to assume that data is stored outside of Europe, unless using a service that guarantees local hosting.
Identifiable conversations are typically retained for around 30 days. In reality, once this period has passed, providers often retain anonymized data for longer. For example, OpenAI anonymizes chat data after a certain time so it can be reused while "forgetting" the user. Practically, your phrases could fuel the AI for months, even years, but in a manner disconnected from your identity. That is better for individual privacy, but for the company that provided the information, the result is the same: confidential content might persist within the model.
Every request sent to an external AI must be considered potentially stored and reusable. Even though these services take precautions (anonymization, encryption in transit, etc.), the best way to keep a secret is not to entrust it to a public AI. Before adopting ChatGPT & the like for professional tasks, it is crucial to understand these mechanisms... which we will explore further with the differences between tools.
2. Overview of major AI tools and their data policies
Not all AI models are equal in terms of confidentiality. Some providers use your data by default to train their models, others offer opt-outs or make stricter commitments, and open-source solutions take a different approach altogether.
Let's look at where four key players stand: OpenAI (ChatGPT), Anthropic (Claude), Google (Gemini) and Mistral – with DeepSeek covered in the table.
(Note: We cover here generalist text AIs. Other tools like Microsoft 365 Copilot or Gamma might be mentioned, but they often rely on the same models or have similar policies. The principles discussed here remain valid.)
Comparative table of confidentiality policies
For clarity, here is a comparative table summarizing the practices of our four emblematic tools in terms of data storage, retention, and reuse (default version, i.e., consumer version unless otherwise mentioned):

ChatGPT (OpenAI)
Quick Overview: ChatGPT is arguably the most well-known generative AI. Developed by the American company OpenAI in partnership with Microsoft, it is offered in a free and Plus version. OpenAI also offers Business plans (ChatGPT Team, ChatGPT Enterprise) and an API for developers to integrate GPT into their own applications.
Data Usage: By default, OpenAI uses ChatGPT conversation content to improve its models. This means any prompt or response you exchange via the web interface or app can be analyzed later to refine the AI.
For Enterprise and API versions, the policy is stricter: your data is not used to train models by default. In short:
free/Plus = training by default
Enterprise/API = confidentiality by default
Storage and Access: ChatGPT data is stored on OpenAI's cloud servers. Employees or contractors may access them if necessary (for example, OpenAI has acknowledged employing people to review certain conversations and refine the model's responses).
Free vs Paid Differences: Surprisingly, on data issues, ChatGPT Free and Plus apply the same treatment – the paid Plus subscription mainly offers a better model and more features, but your data is handled identically (used for training, etc., unless you disable the "Improve the model for everyone" setting). Only by moving to the Enterprise plan (for businesses, with a dedicated contract) will your conversations no longer be used to train the AI, and you will benefit from contractual guarantees (enhanced encryption, SOC 2-type compliance certifications, ability to choose the storage region, etc.). OpenAI is aware that professional clients demand this – hence the launch of ChatGPT Enterprise in 2023 (Source: OpenAI) to provide reassurance on these points. However, if you simply use the consumer web version for work, you accept that your data becomes OpenAI's raw material.
(Concrete example: an employee who used ChatGPT to help draft an internal report saw, months or years later, almost identical excerpts of her text appear in ChatGPT's responses to someone else. This risk of indirect "leakage" led companies like Samsung, Amazon, and several banks to restrict or outright ban the internal use of ChatGPT.)

On the public version, OpenAI provides "Temporary Chats" as a good option for data protection: these chats are not kept in your history, are not used to train models, and are automatically deleted from OpenAI's servers after 30 days.
Mistral Le Chat (Mistral AI)
Quick Overview: Mistral AI is a French startup offering a conversational AI assistant called Le Chat, with a free version, a Pro offer (paid subscription), and Enterprise/API solutions for business clients.
Data Usage:
User data (prompts, generated content, feedback) is primarily used to provide the service (response generation) and ensure moderation (abuse detection). However, the use of this data for model training varies by offer: by default, interactions from Le Chat Free and Pro users can be retained and analyzed to improve Mistral's general models. An opt-out option is available in the account settings for users who refuse this processing. Conversely, Enterprise offers (e.g., the Team plan via API) guarantee increased confidentiality: no client data is used for model training by default; the client company must instead explicitly opt in to share it.
Storage and Access: Mistral AI, as a European company, is subject to the General Data Protection Regulation (GDPR), ensuring strict protection of user data. The company emphasizes hosting data in Europe (servers located in Sweden), a guarantee of compliance with GDPR standards.
Additionally, Mistral AI enforces limited retention: API call technical logs are retained for only 30 days for audit and security purposes before automatic deletion. Similarly, a conversation initiated without an account is stored only during the active session, while dialogues associated with a user account are retained as long as the account is active (until account deletion by the user).
Free vs Paid Differences: The Pro subscription, in addition to allowing users to control the use of their data, offers unlimited access to Mistral AI's most powerful models, unlimited daily requests, and advanced features such as image generation and document analysis.
Claude (Anthropic)
Quick Overview: Claude is a conversational assistant developed by the startup Anthropic, founded by former OpenAI members in the United States. Claude is available via Claude.ai (web interface, with a limited free version and a paid Pro version) and via a commercial API.
Data Usage: Claude’s policy differs by usage: in the public version (Claude.ai), Anthropic indicates that user data may be reused to improve the service and develop new products, even for the paid Pro version. Clearly, if you chat with Claude via the website, your prompts/responses will fuel future training (just like with ChatGPT). Anthropic even advises against using Claude.ai (free or Pro) for sensitive professional data, as there is no absolute confidentiality guarantee. (Source: Anthropic).
However, for enterprise customers via the API or Claude for Work, the data is not used to train the model. In other words, Anthropic follows the same pattern as OpenAI – a distinction between consumer usage (data utilized) and business usage (data isolated).
Anthropic emphasizes that only authorized employees can access user data; for example, if a prompt is reported for security reasons, it may be manually reviewed. On the infrastructure side, Anthropic, as a US company without its own data centers, likely processes Claude data on AWS servers in North America. There is no indication of default European storage.
Free vs Paid Differences: As mentioned, Claude.ai free = Claude Pro paid in terms of confidentiality: paying provides more capacity (longer prompts, priority access) but no different data treatment. For truly private use of Claude, you need to go through enterprise/API solutions. In sum, for Claude and ChatGPT: "free or even Pro, your data does not really belong to you anymore; in the enterprise offers, it remains yours."

Number 1, the most secure: ChatGPT's Temporary Chat. Number 2: Mistral AI's paid version with the opt-out on model improvement enabled (data stays in the EU and the company is French). Number 3: the OpenAI/Claude paid versions. For more security, move to the enterprise plans of these solutions.
Google Gemini
Quick Overview: Google was somewhat shaken by the rise of ChatGPT but now offers its own chatbot, Gemini, available for free. Google also integrates these generative AIs into its products (e.g., Google Workspace for assistance on Gmail, Docs, etc.).
Data Usage: Google has a data policy quite similar to previous ones for Gemini: user interactions are by default recorded and used to improve the service. By using Gemini with your Google account, your questions/responses may be exploited to train Google's models (and potentially personalize your experience).
Human reviewers may also read some Gemini conversations: Google admits it and even offers an option to refuse “human review” in activity settings. Practically, this means, without specific settings, your Gemini prompts are handled a bit like your Google searches: stored and analyzed.

This is equivalent to the "Improve the model for everyone" setting at OpenAI. However, few consumer users know about or change this setting.
Google Workspace Case (Duet AI): For its professional clients (Google Workspace, GCP), Google has taken care to announce that enterprise user data is not used to train general models. For example, if your company uses the writing assistant in Google Docs or Gmail via Duet AI, the texts you generate won't go into the general public training datasets. They remain confined to your organization (Source: Google Workspace). This is crucial: Google clearly segments its consumer offerings (where data fuels Google) and enterprise (where client data remains theirs). This aims to comply with GDPR and attract professionals who would otherwise lack trust.
Storage and Access: Google relies on its vast cloud infrastructure. If you use Gemini, your data is linked to your Google account and stored in Google data centers. Google does not publicly specify retention duration. Moreover, as a US company, Google is also subject to the Cloud Act – meaning Gemini data, wherever stored, could be accessible to US authorities upon legal request.
Free vs Paid Differences: Currently, Gemini is offered free to individuals; the relevant distinction is less free vs paid than consumer vs enterprise. On Gemini (the consumer service), confidentiality is limited (data used for improvement, without strong contractual guarantees for the user). On paid enterprise offers, Google commits contractually not to use client data for purposes beyond the provided service and to comply with protection standards (encryption, limited retention, etc.). Therefore, "classic" Gemini is a tool for exploration but should not be used with truly sensitive data, while Google's enterprise solutions offer confidentiality comparable to OpenAI's or Anthropic's Enterprise offers.
In summary for Google: "what you say to Gemini can serve Google", but "what you say to your Google Workspace AI stays yours". It's always recommended to check your Google account settings (Data & Privacy page) to adjust how your Gemini activity is used.
3. Free vs paid versions: what differences for my data?
We've already seen a common pattern: AI providers often adopt a dual approach, offering:
On one hand, consumer versions (often free or freemium) where user data serves as an exchange currency. In exchange for free or cheap service, you generally permit the company to exploit your requests to train its models and improve its product. This is the case for ChatGPT Free/Plus, Claude.ai, Google Gemini, etc. These versions don't allow for custom contracts or strong confidentiality guarantees. From a GDPR viewpoint, the publisher is often considered a data controller using the data for its own interest (to improve its AI).
On the other hand, professional/enterprise (paid) versions where the relationship changes: the client pays for the service, and in return, the provider commits not to exploit their data beyond the service. Here, the user company remains the data owner, and the publisher acts as a processor, handling data only on the client's instructions (allowing GDPR compliance through a contract). Examples include the OpenAI API/Enterprise, Anthropic Claude API/Business, Microsoft Azure OpenAI, or Google Cloud Vertex AI.
In short, free comes at a hidden cost: your data. With free ChatGPT, you have no way to prevent a good prompt you devised from being incorporated (anonymously) and eventually improving the model your competitors will also have access to. With the enterprise paid version, you can contractually demand data isolation. Similarly, the legal responsibility in case of data issues differs: on a free version, you use the tool "as is" and the publisher generally declines liability for the data you provide. On an enterprise version, the provider usually signs a data processing agreement (DPA), making it responsible for protecting your data, notifying you in case of a breach, etc.
In conclusion, if you use a tool in its standard version (free or not) rather than within a specific B2B contract, assume your data feeds the publisher. Conversely, if you subscribe to an enterprise offer, the publisher commits not to exploit your data and may allow you to define policy (retention durations, hosting location, etc.).
From the viewpoint of an SME or a freelancer, does this mean you must absolutely pay to be compliant? Not necessarily, but it requires additional precautions (see the best practices below) and accepting a degree of risk. Many AI tools have yet to offer dedicated SME plans; paying for an enterprise contract may be disproportionate for some users. In such cases, you can continue using the free versions but limit what you share. For example, a marketing team may freely use ChatGPT to brainstorm slogan ideas, as no sensitive data is at stake and the creative gain outweighs the risk. However, an HR department should avoid pasting named employee evaluations into free ChatGPT – here, the legal and ethical risk is too significant. The golden rule: what is free should be considered public (in one way or another); what is contractual may be considered private.
When providing feedback (e.g., clicking thumbs up or thumbs down, or choosing between response versions proposed by ChatGPT), all information related to the prompt is shared to improve the models. This feedback can also cover your initial settings, including with an Enterprise account. Furthermore, Google specifies that, for Gemini, the messages preceding the prompt concerned by the feedback are also collected to better understand the request's context.
4. Which AIs are the most privacy-respectful? (Ranking)
Summary: Local or professional solutions are the safest; free or default tools pose the greatest privacy risks for users.

Let's now take a look at an indicative ranking of tools and approaches by level of privacy respect, from the most prudent to the riskiest. This is not an absolute merit ranking but seeks to categorize which solutions offer the most user data control.
⭐ Open-source models run locally (e.g., Mistral 7B, LLaMA 2, GPT4All) – Maximum respect. Your data never leaves your environment. No third-party provider has access, no Cloud Act risk, no misuse. It's ideal for highly confidential information. Of course, you must manage security yourself (well-protected servers, etc.), but you are in control. Disadvantage: it requires technical resources (infrastructure, skills), and open-source models may be less performant or comprehensive than the cloud services of industry giants. Another major disadvantage: it's expensive, much more costly than an API call. However, for many cases (internal documents, prototyping), it's a solid choice if you have the skills.
🥈 Dedicated or self-hosted AI services (e.g., a GPT-4o instance on a private cloud, or a service provided by a European provider) – Here too, the aim is near-total control. For instance, some companies opt for dedicated model instances (OpenAI offers, via Azure, an "instance" where only the client sends data, with no mixing with others). Others go through European startups offering more privacy-respectful intermediary services. The idea is to avoid large shared environments. Disadvantage: the cost can be higher (premium service) than using accounts like OpenAI Enterprise, for example.
🥉 Enterprise offers from major providers (e.g., OpenAI Enterprise, Google Vertex AI, Anthropic via API) – This already provides a good level of trust. Your data is not used for training and remains isolated. Providers often offer tools to control retention, encrypt information, and monitor access. For example, OpenAI Enterprise encrypts data "at rest" and allows retention settings, and Anthropic's Claude Enterprise offers deletion options and a GDPR-compliant DPA. Limitations: data is still hosted by a third party (so the Cloud Act risk remains if that third party is US-based), and you need the size or financial means for these offers. But for a company using AI intensively, it's a secure path and often easier to deploy than pure open source.
💡 Consumer tools with privacy options enabled (e.g., ChatGPT with "Improve the model for everyone" disabled, Gemini with activity recording disabled) – A step below, we have consumer uses "protected" by user settings. If you use temporary chats on ChatGPT, your new conversations will not be used to train the model (OpenAI processes them only for moderation, for 30 days, then erases them). It's significantly better than nothing. Google similarly lets you disable the recording of your Gemini interactions. Limitations: it relies on trusting the provider (which still technically retains the data for a while and could change its policy), and in case of a bug or a forgotten setting on another device, you're back in "collection" mode. Moreover, this doesn't necessarily prevent a human reviewer from seeing your data if it is flagged for abuse. So this level is suitable for testing or non-sensitive purposes, not for regulated professional use.
⚠️ Standard consumer tools, without precaution (e.g., free ChatGPT with history, public Claude.ai, Gemini by default) – Highest confidentiality risk. Here your data is fully exploited by the service. For leisure use or innocuous questions, it's not a problem. But it should be ruled out for sensitive company data or personal data. You have no guarantee protecting you if, for example, confidential information is reused by the model or compromised in a security breach at the provider.
In summary, the more you move up to controlled solutions (local open-source, dedicated services, enterprise offers), the more your risk of exposure and/or data leakage diminishes. On the contrary, default consumer solutions, although powerful and attractive, should be used with the understanding that every provided data is potentially shared. For professional use, aim for at least level 4 (tools with privacy options enabled) and ideally level 3 or beyond for strategic data.
5. What do the laws say: GDPR, Cloud Act, AI Act... what are the obligations?
Summary: GDPR imposes strict rules on personal data, the Cloud Act exposes you to US laws, and the AI Act will strengthen transparency and control requirements.
The regulatory framework around AI is rapidly evolving. Three legal pillars deserve the attention of professionals using generative AI: the GDPR (personal data protection), the Cloud Act (a U.S. law with extraterritorial effects), and the AI Act (the European AI regulation now being phased in). Here is an overview of what they entail and their impact on your practices.

GDPR – Guardian of personal data in Europe
The GDPR (General Data Protection Regulation) has been in effect in the EU since 2018. It imposes strict rules when you process personal data (PD) – i.e., any info related to an identified or identifiable person. For example, a name, work email, customer number, even a recorded voice are PDs. If your AI prompts include such data (employees, clients, etc.), then you are processing personal data and must comply with the GDPR.
Key GDPR obligations applicable:
Legal basis & purpose: You need a valid reason to use these data. In a professional context, it could be the legitimate interest of the company (improving productivity through AI) – provided it does not disproportionately harm privacy. For very sensitive data (e.g., health, HR data), legitimate interest could be challenged; the explicit consent of the person might be needed. For example, if you want to use AI to analyze resumes, ensure you have informed candidates and have a legal basis, as you potentially transfer their data to an external tool. The Italian authority criticized OpenAI for lacking a legal basis for using personal data in ChatGPT.
Transfer outside the EU: Sending personal data to a service in the U.S. (or any country not deemed "adequate" by the EU) is an international transfer. It must be legally secured (standard contractual clauses – SCCs, or other mechanisms). With a free account on ChatGPT or Gemini, you do not have this guarantee (OpenAI/Google does not sign a contract with each individual user). In a business context, you should sign a DPA with the provider including SCCs. OpenAI offers a DPA for enterprise/API clients, but not for individual users. In short, using free ChatGPT on personal data exposes the company to GDPR non-compliance, as it amounts to a transfer without a strong legal basis or contractual guarantees. Many European data protection authorities examined the issue in 2023 (Italy temporarily blocked ChatGPT; France, Spain, and others opened investigations into AI's use of personal data).
Subcontracting (processor role): If the AI provider acts as a processor (handling personal data on your company's behalf, without using it for its own account), the GDPR requires signing a contract with specific clauses and ensuring the processor offers sufficient guarantees. For free ChatGPT, OpenAI isn't truly a processor (it uses the data for its own account), so this arrangement doesn't apply. However, with ChatGPT Enterprise, OpenAI positions itself as a processor and offers a contract accordingly. Hence, make sure you obtain a data processing agreement for regular professional use.
Minimization & security: GDPR enshrines the minimization principle – not processing more data than necessary. This aligns with our advice: only send the AI the minimum info required for desired results. Fewer personal data means less risk. Also, ensure appropriate security: e.g., using AI on a secure network, with a protected account (MFA), avoiding storing response copies with PDs in unsecured locations, etc. If a leak occurs via AI (imagine the provider gets hacked and your data is exposed), your company might have to notify CNIL as a data breach. Best to prevent this scenario by not providing too sensitive data or anonymizing them.
In summary, from a GDPR standpoint, any personal data sent to an external AI must be considered judiciously: am I allowed to do this? Have I informed the concerned person? Do I have a processing agreement with the provider? If not, avoid or anonymize. Note that anonymizing means removing any information identifying a person (name, email, but also overly specific context elements). For example, replacing “Dupont, 45, marketing director” with “X, experienced marketing executive”: precision is lost, but it exits the personal data realm.
Cloud Act – The long arm of US law
The CLOUD Act (Clarifying Lawful Overseas Use of Data Act) is a 2018 US federal law significantly impacting data hosted by US providers. Essentially, the Cloud Act allows US federal agencies (police, justice) to require an American company to provide electronic data it stores, even if this data is on foreign servers, as long as a valid warrant/legal document is issued. In short, if you use a US AI service (OpenAI, Google, Microsoft, Anthropic...), US authorities could, during an investigation, access data you entrusted to this service, without necessarily going through European legal channels.
Why is this a problem? Because it circumvents GDPR and European protections. A European company can find itself in a bind: on one side, required to protect its clients' data, on the other, compelled because its US provider must obey an extraterritorial law. The Cloud Act has faced heavy criticism in Europe for this potential conflict of laws. For instance, GDPR insists on not transferring personal data outside the EU without guarantee; the Cloud Act could force a transfer to US authorities, without prior notification. We then have a thorny sovereignty issue.
In practice, for AI use, the Cloud Act risk is somewhat theoretical if you're not in a sensitive sector. Your data would need to interest US justice (anti-terrorism, criminal investigation...). An HR exchange or a marketing draft is unlikely to be the target of a federal warrant! However, enterprises in strategic domains (defense, health, government data) should pay extreme attention. As a precaution, many European public organizations avoid using US cloud services for sensitive data.
What solutions? Currently, no simple legal workaround exists, except choosing providers not subject to the Cloud Act (e.g., European or local open-source AI solutions). Another option is encryption: if you encrypt data before sending it and the provider lacks the key, even if requested, the data would be unreadable. Yet in the context of an AI chatbot, encrypting the prompt is nonsensical (the AI needs to read it to respond). Homomorphic encryption or confidential computing, which would allow processing of encrypted data, is far from operational with large language models. So, using a US AI = accepting the Cloud Act risk.
Some cloud providers offer workarounds: e.g., Microsoft, with its "Azure OpenAI" service, allows hosting OpenAI models in a European data center and states it would contest illegitimate requests. Yet even Microsoft cannot fully oppose a federal warrant (the Cloud Act being binding).
In summary: the Cloud Act isn't a ban, but a risk factor. If dealing with highly confidential data (industrial secrets, large volumes of personal data, etc.), consider European or internally hosted solutions for AI processing, so you maintain control. For non-personal or less sensitive data, the Cloud Act is a low risk widely accepted in practice (most companies use US software like Office 365 despite it). The key is awareness: entrusting strategic data to a major US provider means potentially also entrusting it to the US government if requested.
AI Act – The forthcoming European AI regulation
The AI Act (Artificial Intelligence Act) is a highly anticipated European regulation adopted in 2024 (EU Regulation 2024/1689), with the first provisions (banned practices) applicable from early 2025 and most obligations applying from 2026. Its goal is to regulate AI usage to ensure safety, fundamental rights, and transparency. It's a substantial text, but for professional users of generative AI, here are the key points:
Classification by risk levels: The regulation distinguishes several AI system categories: minimal risk (no specific obligations), limited risk (transparency obligations, e.g., informing the user they're communicating with AI), high risk (stringent control, conformity assessment obligations, etc., as these AIs can significantly impact lives), and unacceptable risk (AI outright banned, such as mass biometric surveillance or social scoring). Generative AIs like ChatGPT are considered general-purpose AIs; they aren't banned, but if used in a risky context (e.g., recruitment, medical diagnosis), that context might be classified “high risk”.
Transparency toward users: For chatbot AIs (limited risk), the AI Act will require informing people that they are interacting with an AI, not a human. So, if you deploy a customer service chatbot on your site, a clear notification is necessary. Similarly, for generated content (text, image), there will be obligations in certain cases to indicate that it is AI-created, to prevent deception (e.g., mentioning that a report or draft was AI-generated when shared internally could become a widely encouraged practice).
Impact on generative AI: the AI Act places obligations on AI providers and users, particularly for general-purpose models like ChatGPT. Providers (OpenAI, Google, etc.) will have to meet stringent quality, technical documentation, and risk-management requirements and provide more information about training data, model limitations, etc. A specific chapter addresses generative or "foundation" models: providers will need to ensure a certain level of explainability and implement measures to avoid generating illegal content. For users, particularly those deploying high-risk AI systems (such as CV sorting or credit granting), the AI Act requires compliance with provider guidelines, reporting major incidents, and being able to explain AI-driven decisions, consistent with GDPR principles. This implies closer oversight and maintaining human supervision, especially given models still widely perceived as "black boxes".
In summary, the AI Act will hold both AI designers and users accountable. For standard use (email writing, creativity support), the impact will be minimal: maybe just mentioning tool usage if relevant, ensuring tool compliance (this will be the provider's concern). For advanced and critical use (AI making human-related decisions, such as hiring, assessment, diagnosis), the AI Act will involve strict framing: you can’t rely blindly on AI, and you'll need to use certified systems, retain human control and report incidents.
Practical advice regarding the AI Act: Start now to document your AI uses internally. List where you use it, for what purpose, and assess the risk. If it potentially falls into a “high risk” category, be proactive: define control procedures, prepare to justify system operation, etc. For example, if an HR department uses a GPT algorithm to assist in CV sorting, it should do so experimentally, in tandem with a human, measuring if biases occur, etc. When the AI Act comes into force, you’ll have already adopted a compliance approach.
Finally, remember that the AI Act enshrines the idea that AI must respect privacy by design – expect tools to evolve to natively incorporate anonymization, data purge options, etc. What today requires manual vigilance may be more automated tomorrow.
6. Best practices for protecting your data using AI
Summary: Using AI securely involves anonymizing your data, limiting shared information, adjusting privacy settings, and training teams.
After the theory, let's move to practice. As a professional, how can you concretely use generative AIs without compromising your data? Here are our concrete tips, in the form of practical boxes, checklists, and examples.

Examples of anonymized prompts
Anonymization involves removing or replacing any identifiable information in your data before submitting it to the AI. This considerably reduces risks. Let's explore some typical examples:
Case 1 – Human Resources (confidential):
Poor prompt (raw data): “Can you draft a termination letter for Mr. Jean Dupont, born on 03/05/1980, employed as an Accountant since 2015, indicating the reason as gross misconduct after embezzling €5,000?”
This prompt includes a name, birth date, job, and a specific accusation – a vast amount of sensitive info for the company and person!
Good prompt (anonymized): “Can you provide a template for a termination letter for gross misconduct (embezzlement) concerning a long-term employee in the accounting department? The tone should be formal and factual.”
In the anonymized version, the employee’s name is gone, so is the date, and the exact amount is omitted. We retain just what's necessary (gross misconduct, accounting domain, desired tone). AI can perfectly generate a letter template. You’ll then internally replace X's with the specifics (name, amount, date…). Thus, these personal details never transit through AI.
Case 2 – Marketing (customer data):
Poor prompt: “Generate an attractive case study from this client testimonial: “We, the company Trucmuche SARL, used product XYZ from 2020 to 2022 and increased our sales by 15%. Martin Durand, sales director, recommends this product.””
Here, the exact client company name, a precise percentage, and the director providing the testimonial are given. Problem: if the testimonial isn't public, we are disclosing private commercial info to AI.
Good prompt: “Generate a fictional but realistic case study for a client in industry X having used our product for 2 years, resulting in significant sales improvement. The testimony should appear authentic.”
In this version, we no longer include real names or exact figures. We stay general (2 years, significant improvement instead of 15%). AI will produce generic marketing text you can adjust with real figures outside the tool. This protects the client's secret while benefiting from AI's flair.
Case 3 – Technical support (internal log):
Poor prompt: “Diagnose the error in this log: User PAULINE_DSMITH failed login from IP 192.168.0.55 on 2025-03-20. Account ID 554433 locked.”
We've just provided a username (Pauline D. Smith), an internal IP, an account ID… potentially personal data (a named account) and technically sensitive information.
Good prompt: “A user encounters a login error with message “user failed login – account locked”. What could be the causes and how to resolve them?”
We’ve removed all unique identifiers. AI can explain generally what “account locked after login failure” means and how to deal with it, likely enough for support to troubleshoot, without exposing raw log data.
In general, always ask yourself: "Do I really need to provide this detail to get the AI's help?". Often, the answer is no. AI can operate with placeholders or abstract descriptions. You can use letters (X, Y) or fictional names if needed, or say "a person A, a company B". Likewise for numbers: an approximate percentage will often suffice. If you absolutely must handle sensitive text (e.g., you want a grammatical correction of an email containing internal information), consider substituting the confidential elements with XXXXXXXXX before submitting. Sure, it's extra effort, but that's the cost of security. Data that has been anonymized cannot harm you if it leaks.
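For teams that anonymize prompts regularly, part of this reflex can be scripted. Below is a minimal, illustrative Python sketch (not a feature of any tool mentioned above) that masks a few common identifier patterns before a prompt leaves your machine; the `KNOWN_NAMES` list and the regular expressions are assumptions to adapt to your own data.

```python
import re

# Hypothetical list of internal names to mask; in practice, feed it from your own directory.
KNOWN_NAMES = ["Jean Dupont", "Martin Durand", "PAULINE_DSMITH"]

# Simple patterns for common identifiers; extend them to match your own data.
PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[IP]": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "[DATE]": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
    "[AMOUNT]": re.compile(r"\b\d[\d\s.,]*\s?(?:€|EUR\b)"),
}

def anonymize(prompt: str) -> str:
    """Replace known names and common identifier patterns with neutral placeholders."""
    for name in KNOWN_NAMES:
        prompt = prompt.replace(name, "[PERSON]")
    for placeholder, pattern in PATTERNS.items():
        prompt = pattern.sub(placeholder, prompt)
    return prompt

if __name__ == "__main__":
    raw = ("Draft a termination letter for Jean Dupont, born on 03/05/1980, "
           "after embezzling 5 000 € (contact j.dupont@example.com, IP 192.168.0.55).")
    print(anonymize(raw))
    # -> Draft a termination letter for [PERSON], born on [DATE],
    #    after embezzling [AMOUNT] (contact [EMAIL], IP [IP]).
```

A filter like this complements, rather than replaces, human judgment: regex-based masking misses context-specific identifiers, so the masked prompt should still be reviewed before it is sent.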
Checklist: Responsible AI usage in everyday work
Here's a checklist of best practices to systematically adopt when you use AI professionally. Feel free to print and keep it handy!
✅ 1. Share the minimum data possible in your prompts – Apply the “Need to know” rule: provide only what's useful for the task. The more concise and generic the prompt, the less information you expose.
✅ 2. Avoid personal or sensitive data – Rule out names, addresses, ID numbers, intimate details, confidential financials, etc., unless absolutely necessary. If you still need to handle such data, anonymize it (cf. examples above) or use a local/private solution.
✅ 3. Enable temporary chat – On ChatGPT, activate the Temporary Chat option (top right of the chat window): these conversations are not used for training and are deleted after 30 days. On Google, pause your Gemini activity. This prevents (within the stated limits) your data from entering AI training datasets.
✅ 4. Regularly delete shared content – Clear important conversations from the service's history. For instance, after getting an AI answer, remove the chat from the list (OpenAI and Anthropic state that deleted conversations are erased within 30 days). Thus, even if the provider is compromised, your old data won't be there.
✅ 5. Be clear and precise in your queries – It might seem off-topic, but a vague prompt can lead to long dialogues where you end up providing more info. Conversely, a well-formulated prompt from the start avoids unnecessary back-and-forth. Fewer exchanges = less shared data. Plus, you'll save time!
✅ 6. Opt for AIs “less data-hungry” – I.e., those which, by design, require little from you or emphasize privacy. For example, a local spell checker (like LanguageTool offline) might suffice instead of a big online model for text correction. Likewise, if a smaller model can run internally for a given need, it's better than a giant external model.
✅ 7. Review terms of use – When in doubt, check the service policies. Are there data commitments? Privacy settings? Knowing the official stance helps adapt your usage. If the terms seem incompatible with your activity (e.g., inability to guarantee confidentiality), abstain.
✅ 8. Exercise your rights if necessary – GDPR grants access to your data. For example, you can request OpenAI or Google for a copy of personal data they stored about you. It's also a way to verify what they retain. You also have a right to erasure: OpenAI allows account and therefore data deletion. If you feel sensitive content lingers with a provider, don’t hesitate to use these mechanisms.
✅ 9. Train and sensitize your teams – If you manage a team or company, educate your colleagues on these best practices. A leak can come from one person who, out of ignorance, pastes an entire client file into a prompt. Set an internal AI usage policy, with do's & don'ts (e.g., "Never place client data containing names in ChatGPT"). Share anonymization examples so they see how to proceed.
✅ 10. Monitor tool and regulation evolution – What's true today will evolve. Subscribe to updates from your AI tools (they often email about changes). Likewise, stay informed of regulatory news via your legal team or DPO. For instance, if OpenAI offers a regional storage option in Europe tomorrow, it might remove some hurdles – good to know.
By following this checklist, you will drastically cut risks while benefiting from the power of AIs. The watchword is vigilance: each time you use AI, have a small mental “Secure by design” reflex. Over time, it will become natural.

To protect your data on ChatGPT, disable the "Improve the model for everyone" option in the data controls settings (accessible via the menu when clicking your profile photo).
Common risks to avoid
Despite everything, some errors or imprudences often recur. Here's a list of traps to absolutely avoid when using AIs in your work:
Do not test AI with real sensitive data: A developer wanting to “see what the AI will answer” might be tempted to provide it with an excerpt of the real database. Bad idea. For your tests, create dummy data instead of using real ones.
Avoid disclosing trade secrets: Out of enthusiasm, one might seek strategy advice from AI (“What to do to counter competitor Z's product Y?”). Formulate this generally. Do not give the entire confidential marketing plan to seek an opinion, for instance. Remember, anything you say might surface elsewhere.
Don't forget AI isn’t confidential: Chatting with ChatGPT can give the illusion of a private, almost intimate conversation, especially with the tone of responses. It's misleading. Behind, this isn't a discreet friend, it's a cloud service. Do not drop your guard assuming “it's just a machine”. This false sense of security is a danger pointed out by CNIL.
Don’t leave generated data lying around: When AI gives a response containing fragments of your initial data, be careful. For instance, if you gave a paragraph from a confidential report for a summary, its response is supposed to be an abstract summary... but check it doesn't include entire original sentences. If so, don’t copy-paste this response directly into an external email inadvertently. Clean or paraphrase it.
Do not forget overall compliance: AI is one tool among others. If you use it for personal data processing, don't forget your duties: record of processing, impact assessment if necessary, informing the people concerned... The GDPR doesn't stop applying because it's AI. For example, a university using ChatGPT to help sort applications should mention it in the information it provides to candidates (transparency).
Caution with illicit content: This guide focuses on confidentiality, but remember AI use involves other aspects. Avoid requesting or generating illegal, defamatory content, or violating intellectual property through AI. Besides direct legal risk, it could lead the provider to suspend your account (their usage policies ban abuse). A professional must ensure remaining in legal and ethical boundaries on all these fronts.
By avoiding these pitfalls, you further reduce risk surfaces. Many points involve common sense and good digital hygiene, but with AI enthusiasm, it’s easy to get carried away and forget fundamentals. Take time to reflect before each potentially problematic use.
7. Advanced solutions: local models, private cloud, security vs innovation arbitration
Summary: For critical uses, solutions like local AIs, private clouds, or dedicated instances reconcile innovation with strict data control.
Despite all precautions, you may have needs where online consumer AI tools can’t be used (too risky), or you wish to go further by integrating AI extensively in your processes while maintaining high security. What options are available? Let's review advanced solutions and how to arbitrate between security, cost, and innovation.
Opt for a local model (on-premise)
This is the most secure solution: run the AI on your own infrastructure, either on your premises or on your private cloud (e.g., your virtual machines at providers like OVH Cloud, Outscale, etc.). With the rise of open-source models, this is increasingly feasible. Meta led the way with LLaMA (available in different sizes, including Llama-3 70B, quite competitive). Projects like GPT4All, Alpaca, and Dolly also emerged, providing lighter versions that run even on a good laptop. (A minimal code sketch follows the pros and cons below.)
Advantages: You retain total control. Data doesn't leave, and you can even disconnect the machine running the AI from the internet for added assurance. You can customize the model (fine-tuning) with your internal data worry-free. For instance, training a small AI on your confidential documentation to answer business questions is possible locally, and it remains yours. No dependence on an external service (no unexpected downtime or imposed policy change). No costly subscription either, after the hardware investment – open-source models are free of usage fees.
Disadvantages: It requires technical expertise. Installation and optimization aren't always plug-and-play, although things are getting simpler (there are straightforward web interfaces, like the oobabooga text-generation UI, to manage a local chatbot). Above all, the computing power required for large models is high. GPT-4o, for example, isn't open source, but suppose you wanted an equivalent model – it would need dozens of GPUs and huge electricity bills for inference. Smaller open-source models fit more modest hardware, but at the cost of lower performance. You must accept that your in-house AI may be less "intelligent" or expert than an online GPT-4o. For some tasks this suffices, for others not. It's a trade-off: top-level AI vs total confidentiality. Many organizations opt for a hybrid approach (see later).
Another drawback is cost: hosting your own model is financially much heavier and requires large usage volumes to justify self-hosting over an API. Additionally, fine-tuning your model requires development skills and significant development time (particularly for model testing). This solution suits businesses where the AI model is central to the business and has a profound impact.
Recommended on-premise use cases: internal document processing, proprietary code assistance (you can code locally with an AI trained on your own codebase without exposing anything externally), etc. Ultra-sensitive sectors (defense, medical) will also almost always prefer on-prem or sovereign clouds for AI.
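As promised, here is a minimal sketch of local inference, assuming the Hugging Face `transformers` library and a machine able to load the chosen model. The model name is just one example of an open-weight model (downloading it may require accepting the publisher's license); in a real deployment you would also apply the model's chat template and tune the generation parameters.

```python
# Minimal local inference sketch: the prompt never leaves this machine.
# Assumes: pip install transformers torch accelerate, and enough RAM/GPU for the chosen model.
from transformers import pipeline

# Example open-weight model; any locally available open-source model can be substituted.
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"

generator = pipeline("text-generation", model=MODEL_NAME, device_map="auto")

prompt = "Summarize the key risks of sharing personal data with cloud AI tools."

# Generation runs entirely on local hardware; nothing is sent to an external API.
output = generator(prompt, max_new_tokens=200, do_sample=False)
print(output[0]["generated_text"])
```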

Use a private cloud or dedicated instance
If you lack internal resources to manage AI, seek a compromise via a private cloud or a dedicated instance with a provider. For example:
Microsoft, Google, and AWS clouds: Microsoft, Google, and Amazon offer AI model hosting on their servers. Microsoft, for example, can host OpenAI models (GPT-4o, etc.) in your own Azure instance. Other cloud providers offer equivalent services (Amazon with AWS Bedrock, Google with Vertex AI, etc.). You can choose the region (including Europe), your data remains isolated in your instance, and it is not used to train OpenAI's models or others. This option is billed by usage, but it comes turnkey and gives access to the power of the latest models without sending data to ChatGPT's public environment.
European hosts and startups: In Europe, solutions claiming sovereignty are emerging. They make it possible to use an open-source model without managing the infrastructure, under a local GDPR-compliant contract, with a provider that is not subject to the Cloud Act. You do need to evaluate the reliability and continuity of these actors, but a growing number of medium and large companies are taking this route.
Dedicated instances at OpenAI/Anthropic: Some major clients can negotiate with OpenAI for a dedicated model instance, potentially with additional training on their data and no external output. In such cases, the budget is more significant.
Advantages: You get many of the on-premise benefits (better confidentiality, choice of location, no mixing with other clients' data) without the technical hassle (it's managed by the provider). Scaling is often easy (adding capacity on demand), and you retain access to cutting-edge models (GPT-4o, etc.) through a managed service rather than more limited open-source models.
Disadvantages: Financially costly, as you pay for a premium service. You must also trust the chosen provider – read the contracts thoroughly and ensure they offer guarantees (encryption, possible audits, certifications). And while a private cloud reduces risks significantly, if the provider is US-based, the Cloud Act still applies: hosting data on Azure in Europe greatly mitigates the risk but does not eliminate it in theory.
Use cases: An SME wanting GPT-4o's capabilities without data-leak risk might opt for Azure OpenAI – allowing developers to integrate GPT-4o into internal applications via the Azure API, knowing the data doesn't leave the instance and isn't seen by OpenAI. Ideal for cases like legal assistance (analyzing internal legal documents), customer support (processing emails containing client information), etc., where model power is needed without using public ChatGPT.
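To make this concrete, here is a minimal sketch of calling a model through an Azure OpenAI deployment using the official `openai` Python SDK; the endpoint, API version, and deployment name are placeholders to replace with the values of your own Azure resource, and the usual precautions on what you send still apply.

```python
import os
from openai import AzureOpenAI

# Placeholders: endpoint, API version, and deployment name come from your own Azure OpenAI resource.
client = AzureOpenAI(
    azure_endpoint="https://your-resource-name.openai.azure.com",  # hypothetical endpoint
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="your-gpt4o-deployment",  # the deployment name you created, not the public model ID
    messages=[
        {"role": "system", "content": "You are an assistant for reviewing internal legal documents."},
        {"role": "user", "content": "Summarize the obligations section of this contract draft: ..."},
    ],
)
print(response.choices[0].message.content)
```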
Arbitrating between security, cost, and innovation
Every company must place its cursor between fully leveraging AI's advances (innovation, performance) and fully safeguarding confidentiality (security, compliance). Achieving 100% on both fronts is challenging – a reasonable compromise has to be found. Here are some arbitration tips:
Assess the sensitivity of use cases: Categorize planned AI uses into several categories (e.g., public data, non-personal internal data, non-sensitive personal data, highly sensitive data). For each category, set a tool policy. Example: "For public or non-sensitive data, ChatGPT Plus is allowed." "For personal data, the OpenAI API with a DPA is required." "For ultra-sensitive data, only on-prem models approved by IT are allowed." This way, innovation isn't blocked on low-criticality tasks, while confidential content is secured by other means.

Start small on secure solutions: If on-prem or dedicated cloud options tempt you but you hesitate due to cost or uncertainty, begin with a pilot project. For example, use a mid-sized open-source model (13B parameters) and let a team test it on fictitious data. Or subscribe to a managed service for a month to assess the integration. Measure the results: is it effective enough? What is the workload? This gives you arguments to decide on a larger investment and to convince management.
Benefit from both worlds (hybrid): Nothing prevents a mixed approach. Example: use public ChatGPT for generic tasks (brainstorming, creative generation, open-source code, information monitoring) and reserve local solutions for tasks involving private data (internal data analysis, confidential document generation). Even within a single workflow, the two can be combined: use public AI for the non-sensitive portions of the work and switch to private AI for the rest. It requires compartmentalizing tasks, but it is feasible. For example, when preparing a report, ask ChatGPT "give me a typical report outline on [general subject]", then, once you have the skeleton, use the local AI to fill it in coherently or improve the writing without exposing your data. A hybrid approach maximizes productivity while minimizing exposure (see the routing sketch after these tips).
Consider direct vs indirect costs: Using only free tools seems economical, but a data leak or a GDPR fine would cost far more! Conversely, investing heavily in ultra-secure solutions carries an opportunity cost: while you implement them, competitors may already be moving ahead with public AI. The key is finding the right balance. Sometimes an enterprise-grade subscription or contract pays for itself in peace of mind. Similarly, training staff in best practices averts costly incidents – a modest investment for a large potential gain.
Remain agile and market-aware: The ecosystem evolves rapidly. New, smaller, more efficient models emerge regularly, and commercial offerings diversify. Maintain active monitoring (tech blogs, peer feedback). Perhaps the ideal tool wasn't available six months ago but is now. For example, in late 2023 few platforms made it easy for non-experts to self-host large models; in 2024-2025, platforms with simple web interfaces have appeared. Avoid locking yourself into a single solution; adapt your strategy as innovation allows. Stay pragmatic.
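Here is the routing sketch mentioned in the hybrid approach above: a small dispatcher that sends prompts matching a naive sensitivity check to a local model and everything else to the approved public service. The detection heuristic and both client calls are simplified placeholders, assuming you would plug in your own anonymization rules and approved tools.

```python
import re

# Naive sensitivity heuristic: adapt the patterns and keywords to your own data classification policy.
SENSITIVE_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),      # email addresses
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),      # IP addresses
    re.compile(r"\b(confidential|salary|diagnosis|client file)\b", re.IGNORECASE),
]

def is_sensitive(prompt: str) -> bool:
    """Very rough check; a real setup would reuse your anonymization and classification rules."""
    return any(pattern.search(prompt) for pattern in SENSITIVE_PATTERNS)

def ask_local_model(prompt: str) -> str:
    # Placeholder: call your self-hosted model here (e.g., a local HTTP endpoint or pipeline).
    return f"[local model answer to: {prompt[:40]}...]"

def ask_public_model(prompt: str) -> str:
    # Placeholder: call the public service approved for non-sensitive tasks.
    return f"[public model answer to: {prompt[:40]}...]"

def route(prompt: str) -> str:
    """Send sensitive prompts to the in-house model, everything else to the public service."""
    return ask_local_model(prompt) if is_sensitive(prompt) else ask_public_model(prompt)

if __name__ == "__main__":
    print(route("Give me a typical report outline on market trends."))             # -> public model
    print(route("Summarize this confidential client file for j.doe@example.com"))  # -> local model
```

The same classification categories defined in your tool policy can feed this kind of router, so the "which tool for which data" rule is enforced technically rather than left to memory.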
In conclusion, it's possible to combine security and innovation with good planning. Many companies take a progressive approach: authorizing basic AI uses (controlled) initially, gaining deeper understanding of risks, and investing in robust solutions to scale AI use. This appears the most sensible path for most: exploring, mastering, then confidently deploying.
8. Final Summary: Adopting AI with Confidence
Summary: With the right tools, settings, and practices, it is entirely possible to use AI productively and in compliance, without compromising data security.
Generative AI is a tremendous tool for professionals, provided one remains in control of their data. This guide highlighted the essentials:

Understanding how it works: We now know prompts don't vanish into a magical black hole but are stored, analyzed, sometimes reused. Awareness is the first step toward acting as an informed user.
Informed tool choice: We've compared ChatGPT, Claude, Gemini, Mistral... Each solution has its confidentiality advantages and limits. The comparative table and ranking help identify which tools to favor according to the sensitivity of each case. In short: the more open-source/private the solution, the more secure; the more consumer/US-cloud, the more caution required.
Legal framework: GDPR imposes significant responsibilities – don’t forget personal data protection applies to AI too. The Cloud Act reminds jurisdiction and localization matter: using a US AI isn't neutral. The upcoming AI Act will refine rules, mainly for high-risk uses. Compliance isn't optional: integrate AI into governance like any data processing.
Everyday best practices: We detailed anonymizing methods, questions to ask before sending prompts, tool privacy settings adjustments, and errors to avoid. By applying these principles (checklist, etc.), you greatly reduce leak or breach risks. It's often free and easy to set up (e.g., unchecking a box, pondering for 10 extra seconds before speaking to the AI).
Advanced solutions exploration: For those wishing to integrate AI deeply while safeguarding their data, options exist: deploying internal models, private clouds, etc. This requires investment, but it is the price of peace of mind for some organizations. A wholesale transition isn't necessary – a hybrid, progressive approach aligned with your priorities is possible.
Ultimately, what strategy should you adopt for your needs? If you are a small structure with no particularly sensitive data, you can freely leverage tools like ChatGPT or Gemini to save time, while adhering to best practices (no disclosed secrets, etc.). Just monitor how the offers evolve – maybe a Pro offer with more guarantees will soon fit you. If you manage client, personal, or confidential data, be stricter: avoid public free services and turn to professional solutions (API with a contract, European tools, etc.), even if it means investing a little. Finally, for regulated or secrecy-bound sectors (legal, medical, defense, R&D...), consider sovereign solutions (open source or dedicated cloud) from the outset, as using consumer tools isn't worth the risk.
Roles like “AI Privacy Officer” or responsible AI contacts are also emerging in companies, akin to DPOs for data, tasked with auditing AI usage and advising teams. Don't hesitate to identify who can handle this subject internally – a profile combining tech enthusiasm and legal knowledge would be ideal.
Finally, on a positive note: yes, it’s possible to use AI securely! It just requires some discipline and good tools. Innovation and data protection aren't contradictory, quite the opposite: responsibly approaching AI will bolster client, partner, and employee trust in your projects. Those mastering both AI's power and data governance will gain a decisive competitive edge in the coming years.
By following this guide, you now have the keys to harness the potential of generative AIs while retaining control. It's your turn to play, test, learn – and build with AI, knowing where you place your data. Happy exploration, securely!
Author:
Biography
Mathilde Brousse, with 8 years of experience in data, AI, and strategic project management, holds a double Master of Science in Data Sciences & Business Analytics from CentraleSupélec and in Management from ESSEC Business School. Currently Head of Analytics & Data Science at Harvest Groupe, she leads innovative projects combining AI, data science, and business intelligence. She works with Side School as a trainer and AI and data expert.
AI tools used
ChatGPT In-Depth Analysis