
From Internal Script to Enterprise Feature: The DayZ Synthesizer

Plamen Valentinov Kolev
October 15, 2025
Product
Software testing


At DataCebo, our mission is to enable enterprises to create tabular synthetic data using the Synthetic Data Vault (SDV). Because our SDV Enterprise platform handles the most complex customer schemas, packed with intricate relationships, constraints, and conditions, we built an internal script to streamline our testing process. What started as a simple debugging tool has evolved into a powerful feature that's transforming how enterprises access data: the DayZSynthesizer.


The Spark: A Simple Script to Test Our Models

To test our enterprise platform with data similar to what our customers have, our team created a lightweight script to generate mock data. It was basic but effective:

  • Categorical columns? Generate random values from A–E.

  • Numerical columns? Generate integers from 0–100 or floats from 0–100.0.

  • Unique values? Generate a UUID.

  • Datetime columns? Generate random datetimes within a range.

  • Multiple tables? Generate data that preserves referential integrity.
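The rules in this list are simple enough to sketch in a few dozen lines of Python. The version below is a minimal illustration, not DataCebo's actual script: the metadata dictionary format, the `mock_value` and `mock_tables` helpers, and the 2020 to 2025 datetime window are all hypothetical choices made for this example. It assumes parent tables are listed before their children, so every foreign key can be sampled from primary keys that already exist.

```python
import random
import uuid
from datetime import datetime, timedelta

# Hypothetical metadata format for this sketch; it is NOT the SDV metadata schema.
# Tables are listed parents first so children can reference existing keys.
METADATA = {
    "users": {
        "primary_key": "user_id",
        "columns": {"user_id": "id", "tier": "categorical",
                    "age": "integer", "signup": "datetime"},
    },
    "orders": {
        "primary_key": "order_id",
        "foreign_keys": {"user_id": "users"},  # child column -> parent table
        "columns": {"order_id": "id", "user_id": "id", "amount": "float"},
    },
}

# Assumed window for random datetimes (an illustrative choice).
START = datetime(2020, 1, 1)
END = datetime(2025, 1, 1)


def mock_value(sdtype, rng):
    """Apply the basic per-type rules: A-E, 0-100, UUIDs, datetimes in a window."""
    if sdtype == "categorical":
        return rng.choice("ABCDE")
    if sdtype == "integer":
        return rng.randint(0, 100)
    if sdtype == "float":
        return rng.uniform(0, 100.0)
    if sdtype == "id":
        # Version-4-style UUID built from the seeded RNG so runs are reproducible.
        return str(uuid.UUID(int=rng.getrandbits(128), version=4))
    if sdtype == "datetime":
        span = (END - START).total_seconds()
        return START + timedelta(seconds=rng.uniform(0, span))
    raise ValueError(f"unsupported sdtype: {sdtype}")


def mock_tables(metadata, num_rows, seed=0):
    """Generate num_rows rows per table, sampling foreign keys from parent primary keys."""
    rng = random.Random(seed)
    tables = {}
    for name, spec in metadata.items():
        fks = spec.get("foreign_keys", {})
        # Collect the primary keys already generated for each referenced parent.
        parent_ids = {
            col: [row[metadata[parent]["primary_key"]] for row in tables[parent]]
            for col, parent in fks.items()
        }
        tables[name] = [
            {
                col: rng.choice(parent_ids[col]) if col in fks else mock_value(sdtype, rng)
                for col, sdtype in spec["columns"].items()
            }
            for _ in range(num_rows)
        ]
    return tables


tables = mock_tables(METADATA, num_rows=5)
print(tables["orders"][0])
```

Seeding a single `random.Random` instance keeps runs reproducible, which helps when the same mock dataset has to be regenerated to rerun a failing test. Sampling each child foreign key from the parent's generated primary keys is what gives the output its referential integrity.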

This script generates random data, which we use to test the generative modeling algorithms within SDV Enterprise. These algorithms power two kinds of synthesizers: the HSASynthesizer, which can handle an unlimited number of tables, as found in enterprise-grade datasets, and the HMASynthesizer, which handles 3–5 tables with a depth of less than 2. This testing tells us whether our enterprise product will work on the complex data most of our customers have. We also used the script's output to test our product on datasets of various sizes, and shared the resulting performance estimates (for example, HSA takes 10 seconds to run, while HMA takes 1 hour). This utility stayed behind the scenes until our customers took notice.


The Pivot: Customers Saw the Value

When we shared the performance estimates with one of our customers, we also shared the mock data. Interestingly, the customer was impressed by the mock data itself: it strictly followed their metadata. They asked, “If we provide a few statistical details, can you make it even more realistic?” By statistical details, they meant facts such as “a categorical column has 100 unique categories” or “a numerical column ranges from 0 to 300.” We immediately enhanced the script to incorporate these inputs, which produced more precise mock data and made our stress testing more powerful.
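Folding details like these into the generator mostly means replacing the fixed defaults with the values the customer supplies. The sketch below is a hypothetical illustration of that idea, not the actual enhanced script: the `HINTS` format and the `mock_categorical`, `mock_numerical`, and `mock_with_hints` helpers are invented for this example, and the fallback defaults simply mirror the original A–E and 0–100 rules.

```python
import random

# Hypothetical hint format: column -> statistical details supplied by the customer.
HINTS = {
    "product_type": {"sdtype": "categorical", "num_categories": 100},
    "price": {"sdtype": "numerical", "min": 0, "max": 300},
}


def mock_categorical(num_rows, num_categories, rng):
    """Return num_rows values drawn from exactly num_categories labels."""
    pool = [f"cat_{i}" for i in range(num_categories)]
    # Use each label once (when there are enough rows) so the stated number of
    # distinct categories actually appears, then fill the rest at random.
    head = pool[:num_rows]
    values = head + [rng.choice(pool) for _ in range(num_rows - len(head))]
    rng.shuffle(values)
    return values


def mock_numerical(num_rows, low, high, rng, integer=False):
    """Return num_rows values inside the stated [low, high] range."""
    if integer:
        return [rng.randint(low, high) for _ in range(num_rows)]
    return [rng.uniform(low, high) for _ in range(num_rows)]


def mock_with_hints(hints, num_rows, seed=0):
    """Generate one list per column, falling back to the basic defaults (5 categories, 0-100)."""
    rng = random.Random(seed)
    columns = {}
    for col, hint in hints.items():
        if hint["sdtype"] == "categorical":
            columns[col] = mock_categorical(num_rows, hint.get("num_categories", 5), rng)
        else:
            columns[col] = mock_numerical(num_rows, hint.get("min", 0), hint.get("max", 100),
                                          rng, integer=hint.get("integer", False))
    return columns


columns = mock_with_hints(HINTS, num_rows=1000)
print(len(set(columns["product_type"])), min(columns["price"]), max(columns["price"]))
```

Seeding every category once before filling the remaining rows at random is one way to make a hint like "100 unique categories" hold exactly rather than only on average; purely random draws could miss some labels when the row count is small.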

This upgraded mock data script has become a key part of our onboarding process. We use it to test our product and to build customer demos with mock data that structurally matches their real data, something customers want to see before they buy. Many AI products perform well on carefully crafted demo datasets but fail badly when deployed on customer data. Being able to generate mock data this close to our customers' real data reassured them (and us) that our products could handle their data.

On its own, this was a big boost for us. But to our surprise, customers didn’t just want the results—they wanted the script, so they could generate this mock data themselves!


The Insight: A Hidden Need

Why the demand? Many enterprise departments lack access to real data because of strict governance or long approval timelines. Building systems without data is like coding in the dark. Metadata and schema details are often accessible, but without the actual data, engineers hit a bottleneck. Our script removed that bottleneck by generating mock data that respected the real data's uniqueness, relationships, and even statistical distributions. It enabled teams to:

  • Test new systems using realistic placeholders.

  • Prototype new database columns without historical data.

  • Accelerate development cycles by weeks or even months.
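Before pointing a new system at mock data, a team might want to confirm that the data really does respect uniqueness and relationships. The check below is a hypothetical sketch for that purpose, not part of the DayZSynthesizer: the `check_mock_data` function and its argument formats (tables as lists of row dictionaries, foreign keys as `(child_table, child_column, parent_table)` tuples) are assumptions made for this example.

```python
def check_mock_data(tables, primary_keys, foreign_keys):
    """Return a list of problems found; an empty list means every check passed.

    tables:       table name -> list of row dicts
    primary_keys: table name -> primary-key column name
    foreign_keys: list of (child_table, child_column, parent_table) tuples
    """
    problems = []
    # Uniqueness: every primary key value must appear exactly once.
    for table, pk in primary_keys.items():
        ids = [row[pk] for row in tables[table]]
        if len(ids) != len(set(ids)):
            problems.append(f"{table}.{pk} has duplicate values")
    # Relationships: every foreign key value must match an existing parent key.
    for child, col, parent in foreign_keys:
        parent_ids = {row[primary_keys[parent]] for row in tables[parent]}
        orphans = [row[col] for row in tables[child] if row[col] not in parent_ids]
        if orphans:
            problems.append(f"{child}.{col} has {len(orphans)} orphaned value(s)")
    return problems


tables = {
    "users": [{"user_id": 1}, {"user_id": 2}],
    "orders": [{"order_id": 10, "user_id": 1}, {"order_id": 11, "user_id": 3}],
}
print(check_mock_data(tables, {"users": "user_id", "orders": "order_id"},
                      [("orders", "user_id", "users")]))
# prints ['orders.user_id has 1 orphaned value(s)']
```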

Our internal tool unexpectedly solved a critical enterprise challenge. Customers often received mock data before they had access to the real data used to train our synthesizers, which sparked curiosity about our 'magic script' and drove demand for this game-changing feature.


The Launch: DayZSynthesizer Is Born

Recognizing its potential, our team decided to turn this script into a polished feature. Enter the DayZSynthesizer (short for Day Zero Synthesizer), a tool that generates synthetic data from scratch using only metadata. No real data. No machine learning. Just instant, high-quality test data from day zero.

The DayZSynthesizer enables enterprises to:

  • Bypass data access delays: Generate realistic test data while awaiting approvals for anonymized datasets.

  • Accelerate innovation: Build and validate systems faster, even without real data.

  • Enhance flexibility: Test new schemas or columns with ease.


The Impact: A Strategic Win

What began as an internal tool is now a key feature of our enterprise offering. The DayZSynthesizer doesn't just complement our synthetic data platform; it unlocks new possibilities for our customers. Departments that once waited months for data can now prototype and iterate in days. For us, this is a testament to how listening to customers and embracing unexpected opportunities can drive innovation.
