Launching Differentially Private Synthesizers in SDV

DataCebo Team
June 25, 2025
Product

Today, we are launching powerful new functionality for our core product, the Synthetic Data Vault (SDV): the ability to train synthesizers (a.k.a. generative models) in a differentially private (DP) manner, along with a framework to validate the privacy-preservation capabilities of these synthesizers. With this new capability, our users can train a synthesizer with state-of-the-art privacy protection techniques and share it directly with end users or for downstream use cases. This breaks down data silos within enterprises and unlocks increased experimentation with AI, internal and external AI application testing, and training and testing AI models in a federated setting.

With SDV, our users build generative models from their own vast data stores and then use those models to sample as much new data as they want. This new data preserves the patterns of the real data but is entirely synthetic, with no one-to-one connection to real records. Our users rely on synthetic data to train AI models, develop and test software applications, and simulate scenarios.
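For readers new to SDV, here is a minimal sketch of that workflow using the publicly available single-table API; the file and column names are illustrative.

```python
import pandas as pd

from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

# Load your real tabular data (the file name is illustrative).
real_data = pd.read_csv('customers.csv')

# Describe the table so SDV knows each column's data type.
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=real_data)

# Train a generative model on the real data ...
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(real_data)

# ... then sample as much new, synthetic data as you want.
synthetic_data = synthesizer.sample(num_rows=10_000)
```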

What are we releasing?

Generative models have transformed how we interact with all kinds of data, whether image, language, or tabular. They have become immensely powerful at learning and emulating patterns from real data. When we considered introducing differential privacy to SDV, we had to think holistically about how to do so. In the process, we realized that generative models foundationally transform the classical differential privacy paradigm: sharing differentially private synthesizers, rather than just data, alleviates many of the burdens that data-disclosure paradigms traditionally face. In an accompanying blog post, Differential Privacy for AI-Generated Synthetic Data (Part 1), we share our thoughts about this powerful new functionality, how we built it, and how our users can make the most of it.

SDV Enterprise users can access the DP bundle and: 

  • Build differentially private synthesizers using our DPGCSynthesizer. Once built, a synthesizer can be shared with others and, in most cases, loaded in different environments to create synthetic data (see the sketch after this list). 
  • Use our evaluation framework to assess the privacy-preservation capabilities of a synthesizer. While these synthesizers are trained with theoretically sound privacy-preserving techniques, we wanted to work within a "trust-but-verify" framework by providing a way to empirically establish the privacy-preserving nature of the synthesizer. We call this SDV Verified; more on this in Part 2 of our blog. 
  • Perform privacy-quality (PQ) trade-off analysis using our PQ curves, then choose a point on the curve depending on your downstream use case. PQ curves are designed with calibration in mind: on one end is random data (fake data) that still conforms to the format and is private by definition, and on the other end is the real data, which we know is not private. A sketch of how such a curve can be swept follows the figure below. 
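As a sketch of the first capability: DPGCSynthesizer is part of SDV Enterprise, so its exact interface isn't public. The example below assumes it mirrors the public single-table synthesizer API; the import path and the `epsilon` parameter name are our assumptions, not confirmed details.

```python
import pandas as pd

from sdv.metadata import SingleTableMetadata

# DPGCSynthesizer ships with the SDV Enterprise DP bundle; this import
# path is a placeholder and the `epsilon` parameter is an assumption.
from sdv_enterprise import DPGCSynthesizer  # hypothetical import path

real_data = pd.read_csv('customers.csv')  # illustrative file name

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=real_data)

# Train with a differential privacy budget (smaller epsilon = more private).
dp_synthesizer = DPGCSynthesizer(metadata, epsilon=1.0)  # hypothetical parameter
dp_synthesizer.fit(real_data)

# The trained synthesizer is a single file that can be shared, assuming
# it supports save/load like the public SDV synthesizers do.
dp_synthesizer.save('dp_synthesizer.pkl')

# Elsewhere: a recipient loads it and generates synthetic data locally.
shared = DPGCSynthesizer.load('dp_synthesizer.pkl')
synthetic_data = shared.sample(num_rows=50_000)
```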

Figure: A privacy-quality trade-off curve showing the empirical differential privacy estimated using our framework. The curve shows existing SDV synthesizers, which use no privacy-enhancing techniques, alongside DPGC synthesizers with different epsilon values. There is an inherent trade-off between privacy and quality, and this curve captures it. More on this in Part 2 of our blog (coming soon). 
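One way to produce the quality axis of such a curve is to sweep epsilon and score each model's output with SDV's public quality report. This sketch continues the example above (reusing `real_data` and `metadata`); the matching privacy estimate per epsilon would come from the SDV Verified framework, which is part of SDV Enterprise and not shown here.

```python
from sdv.evaluation.single_table import evaluate_quality

from sdv_enterprise import DPGCSynthesizer  # hypothetical import path

# Sweep the privacy budget and measure synthetic data quality at each point.
quality_scores = {}
for epsilon in [0.1, 0.5, 1.0, 5.0, 10.0]:
    dp_synthesizer = DPGCSynthesizer(metadata, epsilon=epsilon)  # hypothetical
    dp_synthesizer.fit(real_data)
    synthetic_data = dp_synthesizer.sample(num_rows=len(real_data))

    # SDV's public quality report compares synthetic vs. real patterns.
    report = evaluate_quality(real_data, synthetic_data, metadata)
    quality_scores[epsilon] = report.get_score()  # 0.0 (poor) to 1.0 (best)
```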

Sharing a DP Synthesizer is a force multiplier for data and AI strategy  

Sharing a DP synthesizer not only changes the game for differential privacy — it's also a force multiplier for enterprise AI and data strategy. Here are a few scenarios in which this new tool has been useful to our beta customers. 

Leveraging AI for your enterprise requires significant data experimentation, and differentially private synthetic data shortens this process. AI brings a never-ending stream of applications to build, try, and test, but much of the necessary data resides in silos that must be brought together before a new idea can be tested. Typically, users create a virtual environment, bring the data together to experiment, and then terminate the environment afterwards. The biggest bottleneck is getting the data into the right environment while maintaining enough guardrails to protect it from any breach; many experiments get blocked before experimentation even begins, putting a significant damper on AI strategy. With SDV's DP synthesizers, which are just one file, you can instead load the synthesizer into the environment. Once it's there, you can create as much synthetic data as you'd like and use it to test the application. You can also be specific about what kind of data you want to generate (see the sketch below) and experiment extensively. 
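Targeted generation can use SDV's public conditional-sampling API, sketched here under the assumption that the enterprise synthesizer supports it like the public synthesizers do; the column name is illustrative.

```python
from sdv.sampling import Condition

from sdv_enterprise import DPGCSynthesizer  # hypothetical import path

# Load the shared one-file synthesizer inside the experimentation environment.
synthesizer = DPGCSynthesizer.load('dp_synthesizer.pkl')

# Generate as much data as needed, or be specific about what to generate.
# The 'region' column and its value are illustrative.
condition = Condition(num_rows=5_000, column_values={'region': 'EMEA'})
targeted_data = synthesizer.sample_from_conditions(conditions=[condition])
```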

Differentially private synthetic data can help you performance-test external AI applications before they are deployed in your environment. Our customers are always trying to maximize the performance of their vendors' applications. When updates or enhancements to these applications arrive, they have not been fully performance-tested on the customer's data, because that data is not available to the vendor; as a result, they run slower in the customer's environment. Ideally, vendors could test an application against the customer's own data and optimize its performance before deployment. However, our customers can't share large volumes of their data with a vendor. Instead, they create an SDV differentially private synthesizer and share it with the vendor, who can use it to create a large volume of synthetic data and test the application. The result is a next release that is faster and more reliable in the customer's environment.
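On the vendor side, generating a large test volume can look like the following sketch. The `batch_size` option is part of the public SDV `sample` API; we assume the enterprise synthesizer exposes the same option.

```python
from sdv_enterprise import DPGCSynthesizer  # hypothetical import path

# Vendor side: load the customer's shared synthesizer and generate a large
# volume of synthetic data for performance testing.
synthesizer = DPGCSynthesizer.load('dp_synthesizer.pkl')

# Sample in batches to keep memory usage predictable during load tests.
load_test_data = synthesizer.sample(num_rows=5_000_000, batch_size=100_000)
load_test_data.to_parquet('load_test_data.parquet')
```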

Training AI models in a federated manner is made possible by SDV’s DP synthesizers. 

In another example, consider a network of hospitals that want to contribute data for AI model development. Each hospital has only a few training examples. Using differential privacy, each hospital in the network can contribute private synthetic data to a central database that can then be used for research. Hospitals that use SDV Enterprise can easily achieve this with our Differential Privacy bundle, which includes synthesizers, transformers, and other tools for creating differentially private synthetic data. A sketch of this pattern follows.
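A minimal sketch of that pattern, under the same assumptions about the enterprise API; the per-hospital file names, shared schema, and epsilon value are illustrative.

```python
import pandas as pd

from sdv.metadata import SingleTableMetadata

from sdv_enterprise import DPGCSynthesizer  # hypothetical import path

# Each hospital trains a DP synthesizer locally on its own records, then
# contributes only synthetic rows to the shared research database.
hospital_files = ['hospital_a.csv', 'hospital_b.csv', 'hospital_c.csv']

contributions = []
for path in hospital_files:
    local_data = pd.read_csv(path)  # the real records never leave the hospital

    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(data=local_data)

    dp_synthesizer = DPGCSynthesizer(metadata, epsilon=1.0)  # hypothetical
    dp_synthesizer.fit(local_data)

    # Only differentially private synthetic rows are shared centrally.
    contributions.append(dp_synthesizer.sample(num_rows=10_000))

central_database = pd.concat(contributions, ignore_index=True)
```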
