This is how end-to-end application data anonymization works in SAP

Ben Ramhofer
Apr 27

SAP landscapes contain highly sensitive personal data: customer master data, employee information, contract data, and payment information. As soon as this production data is copied into test, training, or development environments, a massive GDPR risk arises. Article 25 of the GDPR requires privacy by design. However, many companies fail to implement end-to-end application data anonymization in SAP landscapes.

The consequence: Personal data in unprotected non-production systems. Access by developers, external service providers, and test teams. Often no viable legal basis under the GDPR. Risk of fines of up to €20 million or 4% of total worldwide annual turnover, whichever is higher (Art. 83 para. 5 GDPR).

This article explains why conventional approaches fail, what data protection risks arise without end-to-end anonymization, and how technical anonymization works for SAP test data management.

Why do many companies fail at data anonymization?

1. Manual masking is not scalable

Many companies attempt to manually mask personal data. SQL scripts are written to overwrite names, addresses, or email addresses in standard tables. This approach doesn't work for SAP landscapes with thousands of tables, custom Z-tables, and complex data relationships.

The result: Incomplete anonymization. Personal data remains in log files, archive tables, customizing tables, or free text fields. Incomplete anonymization means that the non-production system still contains personal data. Therefore, all GDPR obligations apply, from the legal basis and access controls to the obligation to report data breaches.
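The gap is easy to reproduce. The following is a minimal Python sketch with invented rows and column names, where `notes` stands in for any free-text field the script author did not know about. It masks the columns that were listed and leaves the real name in the free-text field.

```python
# Hypothetical customer rows; "notes" stands in for a free-text field
# that a hand-written masking script does not cover.
customers = [
    {"id": "12345", "name": "Max Mustermann", "email": "max@example.com",
     "notes": "Called Max Mustermann about overdue invoice"},
]

# Columns the script author remembered to mask.
MASKED_COLUMNS = ["name", "email"]


def naive_mask(rows, columns):
    """Overwrite the listed columns with placeholders; leave everything else."""
    masked = []
    for row in rows:
        new_row = dict(row)  # copy, so the source rows stay untouched
        for col in columns:
            new_row[col] = "XXXXX"
        masked.append(new_row)
    return masked


masked = naive_mask(customers, MASKED_COLUMNS)

# The original name is still present in the unlisted free-text column.
leaks = [r["id"] for r in masked if "Mustermann" in r["notes"]]
print(leaks)  # ['12345']
```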

2. Inconsistent rules across multiple systems

System landscapes rarely consist of a single system. ECC, S/4HANA, BW, CRM, Ariba, and external databases (Oracle, PostgreSQL, SQL Server) can all contain personal data. If anonymization rules are not applied consistently across all systems, gaps arise.

Example: Customer number 12345 is anonymized in SAP ECC, but remains unchanged in SAP BW. Cross-references enable re-identification.
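A minimal Python sketch of this cross-reference, using invented extracts: the ECC customer record was anonymized but kept its customer number, while the BW extract was not anonymized at all. A simple join on the unchanged number links the substitute name back to the real person.

```python
# Hypothetical extracts: ECC has been anonymized, BW has not.
ecc_customers = {"12345": {"name": "Lars Bergmann", "city": "Hamburg"}}
bw_sales = [
    {"customer": "12345", "customer_name": "Max Mustermann", "revenue": 1200},
]


def reidentify(anonymized, unprotected):
    """Join both extracts on the unchanged customer number."""
    hits = {}
    for row in unprotected:
        if row["customer"] in anonymized:
            substitute = anonymized[row["customer"]]["name"]
            hits[substitute] = row["customer_name"]
    return hits


print(reidentify(ecc_customers, bw_sales))
# {'Lars Bergmann': 'Max Mustermann'}
```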

3. Broken referential integrity

Anonymization alters data. If values are not handled correctly, referential integrity is broken. Transactions fail. Business processes do not run. The anonymized system is no longer usable.

Many companies face a choice: either functional test systems with real personal data or anonymized but unusable copies. Neither is a solution.
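The difference between broken and preserved referential integrity can be sketched in a few lines of Python. The rows are invented; `kunnr` borrows the SAP customer-number field name for readability only. Anonymizing each table independently with random IDs orphans the orders, while a single mapping reused for every table keeps the join intact. Note that such a mapping is itself a re-identification key, so this sketch discards it after the run.

```python
import secrets

# Hypothetical rows: orders reference customers via the customer number.
customers = [{"kunnr": "12345"}, {"kunnr": "67890"}]
orders = [{"order": "A1", "kunnr": "12345"}, {"order": "A2", "kunnr": "67890"}]


def random_id() -> str:
    """Random five-digit replacement ID."""
    return f"{secrets.randbelow(100_000):05d}"


def joinable(customer_rows, order_rows) -> bool:
    """True if every order still points to an existing customer."""
    keys = {c["kunnr"] for c in customer_rows}
    return all(o["kunnr"] in keys for o in order_rows)


# Broken approach: each table is anonymized independently.
broken_customers = [{"kunnr": random_id()} for _ in customers]
broken_orders = [{**o, "kunnr": random_id()} for o in orders]

# Consistent approach: one replacement per original value, reused everywhere.
mapping: dict[str, str] = {}


def consistent_id(old: str) -> str:
    if old not in mapping:
        new = random_id()
        # Avoid keeping the original value or reusing a replacement.
        while new == old or new in mapping.values():
            new = random_id()
        mapping[old] = new
    return mapping[old]


fixed_customers = [{"kunnr": consistent_id(c["kunnr"])} for c in customers]
fixed_orders = [{**o, "kunnr": consistent_id(o["kunnr"])} for o in orders]

print(joinable(broken_customers, broken_orders))  # almost always False
print(joinable(fixed_customers, fixed_orders))    # True

# The mapping links new IDs back to real ones; discard it after the run.
mapping.clear()
```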

4. Lack of automation

SAP systems are updated regularly. New data flows in. Test and development environments require recurring refreshes. Without automation, every anonymization process becomes a manual project. The effort is unsustainable.

What data privacy risks arise from a lack of end-to-end anonymization?

Data leaks due to extended access rights: Developers, test teams, and external service providers need access to non-production systems. Strict access controls apply in production environments, but often not in test and development environments. This means that personal data is accessible to a significantly larger group of people.

Legal basis under GDPR difficult to establish: The legal basis for processing genuine personal data in test and development environments is generally difficult to establish. In particular, a balancing of interests under Art. 6 para. 1 lit. f GDPR is likely to favor the data subjects in most cases if anonymized data can fulfill the purpose equally well.

Violation of Privacy by Design (Art. 25 GDPR): Article 25 GDPR obliges companies to implement appropriate technical and organizational measures that put data protection principles into effect. Copying production data into non-production systems without anonymization contradicts this principle.

Reporting obligations in the event of data breaches: If a security incident involving real personal data occurs in a test environment, the reporting obligation under Article 33 GDPR applies. Unless the breach is unlikely to result in a risk to the rights and freedoms of individuals, companies must report it to the supervisory authority within 72 hours of becoming aware of it. Data subjects must be informed in accordance with Article 34 GDPR if the data breach is likely to result in a high risk to their rights and freedoms. The reputational damage is considerable.

Effective anonymization resolves these risks: under Recital 26 of the GDPR, anonymized data does not fall within the scope of the GDPR. Pseudonymization (e.g., encryption with a key) is not sufficient, because such data still qualifies as personal data (Recitals 26 and 28 of the GDPR). AppSafe uses DSIT algorithms designed to meet the Recital 26 standard.

This is how end-to-end application data anonymization works, step by step.

End-to-end data anonymization in applications requires a systematic process. AppSafe™ from Maya Data Privacy implements this process as follows:

Step 1: Connection to source systems

AppSafe is deployed as a containerized solution (Docker) and connects directly to the source systems: SAP S/4HANA, SAP ECC, SAP HANA, SAP BW, Oracle, PostgreSQL, SQL Server, and all JDBC-accessible databases. Processing takes place within the source systems, and personal data is not stored outside them, a principle known as zero-storage architecture. This means no intermediate copies, no temporary files containing real data, and no additional storage risk.
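The general idea of in-system processing, transforming values with an UPDATE inside the database session instead of exporting them to files, can be sketched with Python's built-in `sqlite3` module. This is an assumption-laden stand-in: SQLite replaces SAP HANA or Oracle, and `fake_email` is a hypothetical substitution rule, not AppSafe's. It only shows the pattern, not how AppSafe connects to SAP.

```python
import sqlite3

# In-memory SQLite stands in for the source database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (kunnr TEXT PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("12345", "max@example.com"), ("67890", "erika@example.com")],
)


def fake_email(row_id: int) -> str:
    """Hypothetical substitute: synthetic address derived from the row id."""
    return f"user{row_id}@example.invalid"


# Register the rule in the database session, then update in place:
# the original values are never written to an export file.
conn.create_function("fake_email", 1, fake_email)
with conn:  # commits the inserts and the update together
    conn.execute("UPDATE customer SET email = fake_email(rowid)")

print(conn.execute("SELECT kunnr, email FROM customer ORDER BY kunnr").fetchall())
# [('12345', 'user1@example.invalid'), ('67890', 'user2@example.invalid')]
```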

Step 2: AI-powered PII detection

Personal data is not only found in standard tables. Customer-specific Z-tables, free text fields, log files, customizing tables: personally identifiable information (PII) can be contained everywhere.

AppSafe uses AI-powered PII detection to automatically scan all tables. The algorithm identifies names, addresses, email addresses, phone numbers, bank details, and other sensitive information, even in undocumented fields. For SAP administrators, this means the manual effort of PII identification in complex environments with thousands of tables is eliminated.
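AppSafe's AI-based detection is not public, so the sketch below swaps in a much simpler technique: regular-expression heuristics over sampled column values. A column is flagged when most of its values look like an email address or an IBAN. The table and field names (`ZZMAIL`, `ZZBANK`, `ZZSTATUS`) are hypothetical, and the IBANs are the usual published example numbers. Regex rules like these miss free-text PII and misfire on IDs, which is precisely why more capable detection is used in practice.

```python
import re

# Simple regex heuristics; a stand-in for AI-based detection, not a reproduction of it.
PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "iban": re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$"),
}


def detect_pii(columns: dict[str, list[str]], threshold: float = 0.8) -> dict[str, str]:
    """Flag a column when at least `threshold` of its non-empty values match a pattern."""
    findings = {}
    for name, values in columns.items():
        cleaned = [v.strip() for v in values if v and v.strip()]
        if not cleaned:
            continue
        for label, pattern in PATTERNS.items():
            share = sum(bool(pattern.match(v)) for v in cleaned) / len(cleaned)
            if share >= threshold:
                findings[name] = label
                break
    return findings


# Sampled values from a hypothetical custom Z-table.
sample = {
    "ZZMAIL": ["max@example.com", "erika@example.org"],
    "ZZBANK": ["DE89370400440532013000", "GB29NWBK60161331926819"],
    "ZZSTATUS": ["A", "B"],
}
print(detect_pii(sample))  # {'ZZMAIL': 'email', 'ZZBANK': 'iban'}
```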

Step 3: PET-based anonymization with DSIT algorithms

Following identification, anonymization is performed using DSIT (Data Safe Intelligent Transformation). DSIT employs Privacy-Enhancing Technologies (PET) to anonymize data in such a way that re-identification is no longer possible by any means reasonably likely to be used (Recital 26 GDPR).

Anonymization differs fundamentally from masking. Masking replaces sensitive values with placeholders (e.g., "XXXXX") or hides them during output, thereby rendering the data useless for testing, development, or analysis. DSIT, on the other hand, transforms data into realistic, consistent substitutes: The values are fictitious, but structurally and semantically plausible.

Example: “Max Mustermann” is not transformed into “XXXXX”, but into a synthetic, yet credible name like “Lars Bergmann”, including appropriately transformed address, bank details, and email address. Referential integrity and system functionality remain fully intact.
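DSIT itself is proprietary, so this minimal Python sketch uses a different, plainly named technique: a keyed hash (HMAC-SHA256) with a per-run secret picks substitutes from small, invented name pools, and the email address is derived from the substitute name so the record stays internally consistent. The same input yields the same output within a run. The pool contents, `RUN_KEY`, and `substitute_person` are hypothetical. Small pools mean many people share a substitute, which is fine for test data. Whether the result is anonymous in the sense of Recital 26 also depends on the other attributes left in the record, and the key must be discarded after the run.

```python
import hashlib
import hmac
import secrets

FIRST_NAMES = ["Lars", "Anna", "Jonas", "Mia", "Felix", "Lea"]
LAST_NAMES = ["Bergmann", "Schneider", "Vogel", "Keller", "Brandt", "Wolf"]

# Per-run secret; discard it after the run so the mapping cannot be recomputed.
RUN_KEY = secrets.token_bytes(32)


def _index(value: str, field: str, size: int) -> int:
    """Deterministically map a value to a pool index using the run key."""
    digest = hmac.new(RUN_KEY, f"{field}:{value}".encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % size


def substitute_person(first: str, last: str) -> dict[str, str]:
    """Return a plausible substitute name plus a matching email address."""
    new_first = FIRST_NAMES[_index(first, "first", len(FIRST_NAMES))]
    new_last = LAST_NAMES[_index(last, "last", len(LAST_NAMES))]
    email = f"{new_first}.{new_last}@example.com".lower()
    return {"first": new_first, "last": new_last, "email": email}


print(substitute_person("Max", "Mustermann"))
```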

Step 4: Fully functional anonymized copy

The result is a fully functional, anonymized copy of the production system. Users can log in. Transactions are processed. Business processes function end-to-end, from order-to-cash and procure-to-pay to hire-to-retire. The difference: All personal data has been anonymized to such an extent that re-identification, as defined in Recital 26 of the GDPR, is no longer possible using proportionate means.

Consistent anonymization across multiple SAP systems

SAP landscapes are heterogeneous. ECC and S/4HANA run in parallel. BW pulls data from multiple sources. CRM, Ariba, and external databases (Oracle, PostgreSQL) are integrated.

End-to-end application data anonymization in SAP requires system-wide consistency. The same anonymization rules must be defined centrally and applied to all systems.

AppSafe makes this possible through:

Central rule management: Anonymization rules are defined once and applied to all connected systems.
Cross-system consistency: Customer number 12345 is anonymized identically in SAP ECC, S/4HANA, BW, and Oracle. Re-identification through cross-references is prevented.
Preservation of referential integrity: Relationships between systems remain functional. Data flows between ECC and BW continue to function after anonymization.
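How central rules yield cross-system consistency can be sketched in Python, assuming (hypothetically) that rules are keyed by column name and that all systems in one refresh share a per-run HMAC key. Extracts from "ECC" and "BW" pass through the same rule set, so customer number 12345 receives the same 10-digit substitute in both. The rule format is invented for illustration and is not AppSafe's.

```python
import hashlib
import hmac
import secrets

# One run key shared by every connected system for this refresh, then discarded.
RUN_KEY = secrets.token_bytes(32)


def anon_customer_number(value: str) -> str:
    """Map a customer number to a 10-digit substitute, identically in every system.

    Collisions are unlikely but possible; a real run would have to check for them.
    """
    digest = hmac.new(RUN_KEY, value.encode(), hashlib.sha256).digest()
    return f"{int.from_bytes(digest[:8], 'big') % 10**10:010d}"


# Central rule set: column name -> transformation (hypothetical format).
RULES = {"KUNNR": anon_customer_number}


def apply_rules(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Apply the central rules to every matching column of every row."""
    return [
        {col: RULES[col](val) if col in RULES else val for col, val in row.items()}
        for row in rows
    ]


ecc_rows = [{"KUNNR": "12345", "LAND1": "DE"}]     # extract from "ECC"
bw_rows = [{"KUNNR": "12345", "REVENUE": "1200"}]  # extract from "BW"

ecc_anon = apply_rules(ecc_rows)
bw_anon = apply_rules(bw_rows)
print(ecc_anon[0]["KUNNR"] == bw_anon[0]["KUNNR"])  # True
```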

This consistency is crucial for complying with GDPR requirements. Under Recital 26, data counts as anonymous only if it has been "rendered anonymous in such a manner that the data subject is not or no longer identifiable." If the same customer is anonymized in one system but remains recognizable in another, this requirement is not met.

How technical anonymization supports GDPR compliance

Recital 26: Anonymized data is not covered by the GDPR

Recital 26 of the GDPR clarifies that anonymized data does not fall within the scope of the GDPR. The test is whether identification of the data subject is still reasonably likely, taking into account all objective factors, such as the costs and time required for identification and the technology available at the time of processing. Pseudonymization is insufficient, as pseudonymized data still qualifies as personal data (Recitals 26 and 28 of the GDPR). Likewise, simple masking methods that allow re-identification with reasonable effort do not meet the requirements for effective anonymization.
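The GDPR prescribes no formula for "reasonably likely". One widely used technical indicator, named here as an illustration rather than a legal test, is k-anonymity: the size of the smallest group of records sharing the same quasi-identifiers (attributes such as postal code, birth year, or sex that can identify someone in combination). The Python sketch below uses invented records. A result of k = 1 means at least one record can be singled out even though its name was replaced. A high k is not proof of anonymity on its own.

```python
from collections import Counter


def k_anonymity(rows: list[dict[str, str]], quasi_identifiers: list[str]) -> int:
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical anonymized test records: names replaced, other attributes kept.
rows = [
    {"name": "Lars Bergmann", "zip": "20095", "birth_year": "1980", "sex": "M"},
    {"name": "Jonas Vogel", "zip": "20095", "birth_year": "1980", "sex": "M"},
    {"name": "Anna Keller", "zip": "80331", "birth_year": "1975", "sex": "F"},
]

QI = ["zip", "birth_year", "sex"]
print(k_anonymity(rows, QI))  # 1 -> the third record can be singled out
```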

Article 25: Privacy by Design and Privacy by Default

Article 25 of the GDPR obliges companies to implement appropriate technical and organizational measures that effectively enforce data protection principles. End-to-end application data anonymization supports this requirement on the technical side.

Instead of relying solely on organizational controls (access rights, training, guidelines), the risk is addressed technically. Provided the anonymization is effective, a security incident in the test environment no longer affects any personal data.

Article 5 paragraph 1 letter c: Data minimization

Article 5 paragraph 1 letter c GDPR requires that personal data be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed".

For testing, training, and development purposes, real personal data is generally not required. Therefore, processing real data for these purposes is difficult to reconcile with the principle of data minimization (Art. 5 para. 1 lit. c GDPR).

Important: While data anonymization software can support compliance with the data minimization principle, it does not replace legal assessment by data protection officers or legal departments. Companies should have their DPOs validate the effectiveness of the anonymization measures they have implemented.

What you should consider when choosing data anonymization software

Not all data anonymization software is suitable for SAP test data management. Pay attention to the following criteria:

1. In-system processing instead of export-import

Legacy approaches export data, process it externally, and import it back. This approach doubles storage requirements, increases processing times, and expands the attack surface. Modern solutions like AppSafe process data in-system. No copies of personal data are stored outside the source systems (zero-storage architecture).

2. AI-powered PII detection

SAP landscapes contain thousands of tables. Manually identifying personally identifiable information (PII) is impractical. AI-powered PII detection automatically scans all tables, including custom Z-tables.

3. Cross-system consistency

Check if the solution supports multiple systems: SAP ECC, S/4HANA, BW, HANA DB, Oracle, PostgreSQL, SQL Server. Anonymization rules must be defined centrally and applied consistently.

4. Preservation of referential integrity

The anonymized copy must be fully functional. Users must be able to log in, execute transactions, and complete business processes. Automated integrity checks are essential.
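A minimal sketch of such an automated check in Python, again with SQLite as a stand-in for the target database and an invented two-table schema (`vbeln` and `kunnr` borrow SAP field names for readability). It counts orders that no longer point to an existing customer and original customer numbers that survived anonymization anywhere. In this example the orders table was missed, so both checks fail.

```python
import sqlite3


def integrity_report(conn: sqlite3.Connection, original_ids: list[str]) -> dict[str, int]:
    """Simple post-anonymization checks for a customer/orders schema."""
    orphaned = conn.execute(
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN customer c ON o.kunnr = c.kunnr "
        "WHERE c.kunnr IS NULL"
    ).fetchone()[0]
    current = {
        row[0]
        for row in conn.execute("SELECT kunnr FROM customer UNION SELECT kunnr FROM orders")
    }
    leftover = len(current & set(original_ids))
    return {"orphaned_orders": orphaned, "original_ids_left": leftover}


# Customers were anonymized, but order A2 still carries the original number.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (kunnr TEXT PRIMARY KEY);
    CREATE TABLE orders (vbeln TEXT PRIMARY KEY, kunnr TEXT);
    INSERT INTO customer VALUES ('90001'), ('90002');
    INSERT INTO orders VALUES ('A1', '90001'), ('A2', '12345');
    """
)

report = integrity_report(conn, ["12345", "67890"])
passed = all(value == 0 for value in report.values())
print(report, passed)  # {'orphaned_orders': 1, 'original_ids_left': 1} False
```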

5. SAP Certification

SAP-certified solutions have been validated by SAP. AppSafe is available through the SAP Store and is SAP-certified.

6. Containerized Deployment

Modern architectures rely on containerized solutions because they can be deployed quickly, scaled, and integrated into existing on-premises infrastructures.

7. Technically irreversible anonymization

Check whether the solution uses anonymization (irreversible) or pseudonymization (reversible). Only anonymization that rules out re-identification by proportionate means, as measured by the standard of Recital 26 GDPR, takes the data outside the scope of the GDPR.
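The distinction can be made concrete with a small Python sketch. The `Pseudonymizer` class is hypothetical. It replaces values with random tokens but keeps a lookup table, so anyone holding that table can reverse the step, which is exactly what makes it pseudonymization. Destroying the table removes this particular reversal path. Whether the remaining data set is anonymous still depends on the other attributes it contains.

```python
import secrets


class Pseudonymizer:
    """Replaces values with random tokens and KEEPS a lookup table (reversible)."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}
        self._reverse: dict[str, str] = {}

    def apply(self, value: str) -> str:
        if value not in self._forward:
            token = secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reverse(self, token: str) -> str:
        # Possible only because the lookup table still exists.
        return self._reverse[token]

    def destroy_mapping(self) -> None:
        # Removes this reversal path; it does not by itself make the data anonymous.
        self._forward.clear()
        self._reverse.clear()


p = Pseudonymizer()
token = p.apply("Max Mustermann")
print(p.reverse(token))  # Max Mustermann
p.destroy_mapping()
```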

Conclusion: End-to-end anonymization as a technical necessity

End-to-end application data anonymization in SAP is not an optional measure. Article 25 of the GDPR requires privacy by design. Article 5(1)(c) requires data minimization. Recital 26 offers a solution: data that has been anonymized to such an extent that identification is no longer likely according to general judgment is not subject to the GDPR.

Traditional approaches (manual masking, inconsistent rules, lack of automation) fail due to the complexity of modern SAP landscapes. Modern data anonymization software like AppSafe enables end-to-end, cross-system anonymization with AI-powered PII detection, PET-based DSIT algorithms, and anonymization within the corporate network.

The result: Fully functional test, training, and development environments without the privacy risks of real personal data. Faster SAP projects. Reduced risk of fines. Effective implementation of Privacy by Design.

Would you like to learn more about end-to-end data anonymization for your SAP landscape? Find out more about AppSafe in the SAP Store or contact Maya Data Privacy for a technical evaluation.

Legal Notice: This article is for informational purposes only and does not constitute legal advice. The assessment of the GDPR compliance of specific anonymization measures should be carried out by data protection officers or legal advisors. Maya Data Privacy accepts no liability for decisions made based on this article.