Delivering reliable AI support without accessing customer data, enabling enterprises to develop, deploy and run generative models with confidence.
We’ve delivered our core product, the Synthetic Data Vault (SDV Enterprise Edition), to many enterprises. SDV Enterprise interacts with complex enterprise data, creates generative AI models, runs securely within an organization, and integrates seamlessly with existing pipelines. One recurring question we get is how we provide support. Specifically, enterprises that want to scale assess us across three criteria: Would we need access to their data to support them? Would they have to depend on our services to build models successfully? And how fast can we address their questions?

Building generative models for relational databases requires a full end-to-end pipeline—automated data understanding, lineage detection, preprocessing, transformation, modeling, and finally sampling, which retraces the pipeline to produce synthetic data. One of the key strengths of our product is that it encompasses this entire workflow, enabling seamless integration into enterprise processes. Throughout this pipeline, built-in guardrails, checks, and balances allow us to systematically diagnose and resolve issues a customer may encounter—without requiring direct access to their data. This ensures both operational efficiency and strict data privacy.
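To make the pipeline concrete, here is a minimal, stdlib-only sketch of the stages described above: metadata detection, model fitting, and sampling. The function names and the per-column Gaussian/frequency model are illustrative simplifications, not SDV Enterprise's actual API.

```python
import random
import statistics

# Illustrative end-to-end pipeline sketch. Names and stages are
# hypothetical simplifications, not SDV Enterprise's actual API.

def detect_metadata(rows):
    """Infer a column -> type mapping from sample rows (the 'data understanding' stage)."""
    sample = rows[0]
    return {col: ("numerical" if isinstance(val, (int, float)) else "categorical")
            for col, val in sample.items()}

def fit(rows, metadata):
    """Fit a deliberately simple per-column model: a Gaussian for numerical
    columns, empirical value frequencies for categorical ones."""
    model = {}
    for col, kind in metadata.items():
        values = [r[col] for r in rows]
        if kind == "numerical":
            model[col] = ("numerical", statistics.mean(values), statistics.stdev(values))
        else:
            model[col] = ("categorical", values)
    return model

def sample(model, num_rows):
    """Retrace the pipeline in reverse to emit synthetic rows."""
    out = []
    for _ in range(num_rows):
        row = {}
        for col, spec in model.items():
            if spec[0] == "numerical":
                row[col] = random.gauss(spec[1], spec[2])
            else:
                row[col] = random.choice(spec[1])
        out.append(row)
    return out

real = [{"age": 34, "plan": "pro"}, {"age": 41, "plan": "free"}, {"age": 29, "plan": "pro"}]
meta = detect_metadata(real)
synthetic = sample(fit(real, meta), num_rows=5)
```

Because every stage has a well-defined input and output, a failure anywhere in the chain can be localized to one stage without ever inspecting the underlying values.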
Key aspects of SDV Enterprise that play a role in our scalable support
Comprehensive, User-Centered Documentation. We maintain over 250 pages of documentation across SDV, SDMetrics, RDT, and SDGym—the largest known documentation set for generative modeling of tabular data. Users frequently praise it as clear and easy to use, and one paper even highlighted it as exemplary. Documentation is never an afterthought: we design software abstractions with ease of use in mind, which naturally produces clear, coherent documentation. We are also transparent about our design decisions—our user-facing Python API is openly discussed in GitHub issues. This clarity helps make our documentation both thorough and intuitive.

Right Abstractions for Robustness and Flexibility. Defining the right abstractions was key to SDV Enterprise’s design. The system is divided into distinct modules with clear roles, inputs, and outputs, each with its own repository and release cycle for modular development and maintenance. These abstractions provide dual benefits:
- Advanced users can target specific settings and workflows.
- We can quickly and accurately pinpoint issues when they arise.
To test the effectiveness of our abstractions, we categorize every incoming issue into one of four bins:
- Issues we know exist and already have a workaround.
- Issues we know exist and are simple to fix but haven't been addressed yet.
- Issues we don’t fully understand but can debug using the stack trace.
- Issues that surprise us entirely, of which we were previously unaware.
We firmly believe that when abstractions are correct, technical debt is minimal, and most issues fall into categories 1 or 2. In practice, even issues initially in category 3, once investigated, resolve into category 1 or 2. This approach ensures SDV remains robust, maintainable, and predictable.
Rigorous Testing for Reliable AI Software. Testing a software product that lets users build AI models—especially generative models—requires a specialized test bench. SDV undergoes multiple levels of testing, including 10+ automated regression bots. Because the software is probabilistic and must handle diverse datasets, extensive tests run at every merge, at every release, and on a monthly cadence. These tests evaluate the full workflow, from preprocessing to sampling, and measure quality via the synthetic data produced, ensuring robust results across datasets.
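A regression bot for probabilistic software cannot assert exact outputs; it asserts statistical closeness. The check below is a hypothetical example of that kind of quality gate—comparing a simple column statistic between real and synthetic data—and is not SDMetrics' actual scoring logic.

```python
import statistics

# Hypothetical quality gate of the kind a regression bot might run:
# the synthetic column's mean must land within a relative tolerance
# of the real column's mean. Illustrative only, not SDMetrics' API.

def mean_closeness(real_values, synthetic_values, tolerance=0.25):
    """Return True if the synthetic mean is within `tolerance`
    (relative) of the real mean."""
    real_mean = statistics.mean(real_values)
    synth_mean = statistics.mean(synthetic_values)
    if real_mean == 0:
        return abs(synth_mean) <= tolerance
    return abs(synth_mean - real_mean) / abs(real_mean) <= tolerance

real_ages = [29, 34, 41, 38, 30]
synthetic_ages = [31, 36, 40, 33, 35]
passes = mean_closeness(real_ages, synthetic_ages)  # close means -> passes
```

In practice such checks run per column and per metric, so a regression in any one stage of the pipeline shows up as a measurable quality drop rather than a silent failure.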
Open Core: Battle-Tested and Enterprise-Ready. Our commercial version is built on an open core battle-tested by thousands of users, giving us insight into the correctness of our abstractions and exposing the software to enterprise-like datasets. While we never access the data itself, feedback from the issues users encounter gives us valuable insight and helps the product achieve unmatched maturity. With more than 10,000 community contributions from users testing and deploying SDV and reporting back issues, this foundation ensures most customer-reported issues fall into categories 1 or 2, keeping the software robust and reliable in real-world enterprise environments.
Simulated Data: Safe and Effective Testing. We’ve developed innovative ways to simulate customer data for testing and debugging. If customers share metadata, our system generates simulated data that mirrors the structure and relationships of their real data, allowing us to test the product as if it were operating on the actual dataset. Since metadata contains only column names and relationships, not the data itself, it is non-sensitive; for extra assurance, it can also be anonymized. Even anonymized metadata is enough to reproduce issues and debug effectively, ensuring privacy while maintaining robust testing.
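The following sketch shows how structurally faithful data can be generated from metadata alone. The metadata schema here is a deliberate simplification, not SDV Enterprise's actual metadata format—the point is that column names, types, and foreign-key relationships suffice to produce a debuggable stand-in dataset, with no real values involved.

```python
import random

# Simplified, illustrative metadata: table -> columns and relationships.
# Not SDV Enterprise's actual metadata format.
METADATA = {
    "users": {"columns": {"user_id": "id", "age": "numerical"}},
    "sessions": {
        "columns": {"session_id": "id", "user_id": "foreign_key", "device": "categorical"},
        "foreign_keys": {"user_id": "users"},
    },
}

def simulate(metadata, rows_per_table=10, categories=("A", "B", "C")):
    """Generate random data whose structure (columns, key relationships)
    matches the metadata, without any real values."""
    tables = {}
    for name, spec in metadata.items():  # parents listed before children
        rows = []
        for i in range(rows_per_table):
            row = {}
            for col, kind in spec["columns"].items():
                if kind == "id":
                    row[col] = i
                elif kind == "foreign_key":
                    parent = spec["foreign_keys"][col]
                    # Pick a valid key from the parent table to preserve referential integrity.
                    row[col] = random.choice(tables[parent])[col]
                elif kind == "numerical":
                    row[col] = random.uniform(0, 100)
                else:
                    row[col] = random.choice(categories)
            rows.append(row)
        tables[name] = rows
    return tables

simulated = simulate(METADATA)
```

Because the simulated tables preserve referential integrity and column types, a bug triggered by a customer's schema will typically reproduce on this stand-in data just as it would on the real dataset.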


