
Back of the napkin: How can synthetic data help enterprises negotiate cross-environment challenges?


People have made MILLIONS starting with a back-of-the-napkin sketch!  I can't prove this, but it's what I hear.

 

Here's a back-of-the-napkin sketch showing the constraints organisations face when operating across multiple environments, and how synthetic data can help them work around those constraints.

 

Synth_data_lower_env_generated.png

 

Consider a bank, or any other complex organisation operating under tight regulations and policies. Such organisations follow strict guidelines and restrictions on how they store and use customer data, especially data about customer identity and behaviour, to power their analytical solutions. They are usually required to keep production compartmentalised from other, "lower" environments. Lower environments are meant for development, testing and staging (a pre-production environment kept as close as possible to production). These environments are necessary to ensure compliance, and they vary in their security profile and data content.

 

Banks are constrained by regulation and internal policies regarding the data they can access in lower environments. However, to ensure that their proposed solutions undergo rigorous testing, they require data which is as similar to production data as possible.

 

Access to realistic, production-grade data makes solutions more robust. It lets the bank develop and test a solution in much the same way that customers will interact with solutions built on production data (credit data, behavioural and transactional data, demographics and so on) and gain value from them.

 

Organisations have conventionally worked around these challenges by drawing a sample of production data and then rigorously applying anonymisation, masking and data transformation, so that the sample is sufficiently anonymised and does not pose a data leakage risk. In doing so, organisations attach more importance to ensuring data is not compromised and may overcompensate in that direction, producing test/dev data that no longer resembles production data. Furthermore, this process requires considerable manual effort, which hinders productivity.

 

How can synthetic data help?

Synthetic data, generated through SAS Data Maker, enables organisations to create data that is very similar to production data but does not compromise customer privacy by leaking identifiable information (through either explicit or implicit identifiers). Synthetic data is generated using original (production) data as the source, which ensures useful patterns are retained. Data owners and data stewards, who have access to both source and synthetic data, are given the opportunity to evaluate a synthetic data generator (a machine learning model which generates synthetic data) on the basis of:

 

1. Similarity: How well do synthetic data patterns resemble the original?

2. Privacy Disclosure: How much risk does the synthetic data pose of identifying records in the original data?

3. Structure: How well are relationships and specific data patterns (such as sequential data) preserved?
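To make these three criteria concrete, here is a minimal Python sketch of one simple proxy for each. It is not how SAS Data Maker computes its evaluation; the function names, the choice of metrics (a Kolmogorov-Smirnov statistic, an exact-match rate and a correlation gap) and the sample records are all assumptions made for illustration.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(original, synthetic):
    """Similarity proxy: two-sample Kolmogorov-Smirnov statistic.
    0 means identical empirical distributions, 1 means fully separated."""
    a, b = sorted(original), sorted(synthetic)
    # Largest gap between the two empirical CDFs over every observed value.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def exact_match_rate(original_rows, synthetic_rows):
    """Privacy proxy: share of synthetic rows that copy an original row verbatim."""
    seen = set(original_rows)
    return sum(row in seen for row in synthetic_rows) / len(synthetic_rows)


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric columns."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_gap(original_rows, synthetic_rows):
    """Structure proxy: how far the column-0/column-1 correlation has drifted."""
    o0, o1 = zip(*original_rows)
    s0, s1 = zip(*synthetic_rows)
    return abs(pearson(o0, o1) - pearson(s0, s1))


# Hypothetical (income, monthly spend) records, invented for this example.
original = [(30, 10), (45, 14), (60, 21), (80, 25), (120, 40)]
synthetic = [(32, 11), (47, 15), (58, 19), (83, 27), (115, 38)]

report = {
    "similarity_ks_income": ks_statistic(
        [r[0] for r in original], [r[0] for r in synthetic]
    ),
    "privacy_exact_match_rate": exact_match_rate(original, synthetic),
    "structure_correlation_gap": correlation_gap(original, synthetic),
}
print(report)
```

Lower is better for all three numbers in this sketch. A real evaluation would cover every column, use stronger privacy measures than exact matches (implicit identifiers can leak without a verbatim copy), and handle sequential data explicitly.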

 

Once they are satisfied, data owners generate synthetic data of the desired volume and move it to lower environments for use there. The data made available to the lower environment is completely synthetic, and it can be transferred through workflows that depend on an assessment of the evaluation results, leading to proper governance around the process.
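As a rough illustration of a transfer workflow that depends on evaluation results, the Python sketch below blocks a transfer unless every score is within a limit. The score names, the `Thresholds` values and the `review_for_transfer` function are hypothetical; an actual policy would be set by the data owner, and this is not a SAS Data Maker API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits; lower scores are better for every criterion here.
    max_similarity_gap: float = 0.25
    max_privacy_risk: float = 0.0
    max_structure_gap: float = 0.05


def review_for_transfer(scores, limits=Thresholds()):
    """Return (approved, reasons) for a dict of evaluation scores."""
    checks = {
        "similarity": (scores["similarity"], limits.max_similarity_gap),
        "privacy": (scores["privacy"], limits.max_privacy_risk),
        "structure": (scores["structure"], limits.max_structure_gap),
    }
    # Collect one human-readable reason per failed criterion.
    reasons = [
        f"{name}: {value} exceeds limit {limit}"
        for name, (value, limit) in checks.items()
        if value > limit
    ]
    return (not reasons, reasons)


approved, reasons = review_for_transfer(
    {"similarity": 0.2, "privacy": 0.0, "structure": 0.003}
)
print("transfer approved" if approved else f"transfer blocked: {reasons}")
```

Returning the list of reasons alongside the decision keeps an audit trail of why a given batch was or was not released to a lower environment.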

 

To summarise,  synthetic data enables rapid and convenient access to realistic, production-like data in lower environments through secure and governable processes.  

 

Now, don't lose your millions by throwing away this napkin!

 
