In 2023, we updated many libraries in the SDV ecosystem to the Business Source License (BSL). In this article, we'll lay out more of the context behind this change and explain its impact on users.
Disclaimer: This article is not intended to provide legal advice. If you have questions about your specific usage, please consult with your legal department!
Keeping focus on our user-centric mission
For too long, systems that were supposed to enable widespread use of machine learning (generative or predictive models) have had the wrong orientation towards users, implying things like:
- It's your responsibility to learn how to use this new technology, even though it takes a lot of time.
- You should meet the technology where it is right now, or else you'll miss out. If the technology is not directly usable for your data, you should put in the effort to modify your data.
- If you can't measure ROI, don't worry. Just focus on new possibilities and your ROI will come later.
This attitude of core developers has led enterprise users to spend time learning concepts and trying things that do not solve their problems. (For proof, look at the sheer number of hyperparameter tuning libraries and autoML libraries that have spun out in the last decade, promising that they will increase ML adoption!)
When we open sourced the Synthetic Data Vault, we had a specific mission: To enable our users to solve their data access and availability problems. Whenever you provide feedback, we interpret it from your perspective: If our software doesn't make sense to you, then we're not doing it right. We revisit our abstractions, add new features and bring in more business context.
Generative models can seem magical in their ability to take in an input and produce an output, until enterprise datasets are brought to them. It takes many data-centric features to make them useful and produce a return on our users' investment of time. Sometimes these are simple, like changing the names of steps, and sometimes they are complex, like inventing a whole new set of modules. We created several data-centric modules in the SDV: constraints that let users program business rules, the first end-to-end system for reversible data transforms, and user-centric, interpretable metrics to measure success.
With our innovations in this space, we believe that synthetic data can provide real value to enterprises. However, as we pursue our mission, we are wary that others with marketing megaphones or business development mega teams could create noise using only parts of our software. This distracts the market, undermines the need for data-centric modules to create usable generative models, and ultimately delays broader progress. Without a view into our roadmap or our active conversations with users, it is hard for anyone else to capture what will serve the user best.
For this reason, in 2023, we switched to a Business Source License (BSL). In the rest of this article, we'll walk through our decision to switch to the license and some scenarios that explain what the license means for our users.
Choosing the BSL
BSL version 1.1 was created by MariaDB Corporation in 2017 to strike a balance between making code publicly available and allowing the businesses that develop the code to grow sustainably. While there are many licenses we could have chosen from, the BSL directly addressed our core concerns without adding unnecessary restrictions for our users.
Our research showed that licenses typically fall into one of 3 categories:
- Permissive licenses allow most kinds of usage. An example is the MIT License, used in previous versions of the SDV.
- Copyleft licenses require that users release their code as open source under the same license, in a viral fashion. Examples are the (A)GPL and the SSPL, which are used in some other open source libraries (nowhere within the SDV).
- Business licenses allow usage based on the overall goal of the project. An example is the BSL.
We considered what these licenses would imply for a variety of SDV usage scenarios, as shown below.
As the name suggests, a permissive license such as the MIT License permits all usage, which leaves the SDV vulnerable to the kind of partial reuse that distracts from our mission. Meanwhile, a copyleft license would enforce a virality clause that would force users to publish any code they write. This would create an unnecessary burden for those just exploring or learning about the SDV, as it would prevent users from applying it to their own private datasets. For us, the BSL ultimately finds the right balance between protecting our mission and enabling critical usage.
How does the BSL work?
The BSL is an eventual open source license: it limits the amount of time our software stays in a restricted state. Four years after its release, each version of the software becomes available under the permissive, open source MIT License. Note that each of the SDV releases starts a new 4-year clock, as illustrated below.
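To make the per-release clock concrete, here is a minimal Python sketch of the date arithmetic. It is an illustration only, not part of the SDV or the license text: the function names and the release dates are hypothetical, and it assumes the change takes effect on the calendar anniversary four years after a release. The license itself is the authority on the exact change date.

```python
from datetime import date

# From the article: each release converts to the MIT License after four years.
BSL_TERM_YEARS = 4


def change_date(release_date: date, years: int = BSL_TERM_YEARS) -> date:
    """Return the (assumed) date on which a release converts to MIT.

    Hypothetical helper: assumes the conversion happens on the calendar
    anniversary of the release date.
    """
    target_year = release_date.year + years
    try:
        return release_date.replace(year=target_year)
    except ValueError:
        # Feb 29 release whose target year is not a leap year: use Feb 28.
        return release_date.replace(year=target_year, day=28)


def is_permissive(release_date: date, today: date) -> bool:
    """True once the release's own four-year clock has run out."""
    return today >= change_date(release_date)


# Hypothetical release dates, chosen only to show that each release
# carries its own independent clock.
releases = {
    "release A": date(2023, 1, 15),
    "release B": date(2024, 6, 1),
}

check_day = date(2027, 3, 1)
for name, released in releases.items():
    status = "MIT" if is_permissive(released, check_day) else "BSL"
    print(f"{name}: converts on {change_date(released)}, {status} on {check_day}")
```

The point of the sketch is that an older release can already be under the MIT License while a newer release of the same library is still under the BSL.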
While we have applied the BSL to most libraries in the SDV ecosystem, we have kept the SDMetrics library under the permissive MIT license. We feel that no matter how you are creating synthetic data, a standardized library for independent evaluation benefits the entire synthetic data community. See the table below for a full list of libraries.
Can I still use the SDV library for my project?
While a library is under the BSL, the allowable usage depends on the goal of your project. Using the library for non-production work is always allowed, but extra restrictions apply if the project is meant for production. This allows a majority of our users to continue using the SDV, as illustrated below.
For further clarification for your particular usage, we encourage you to read the license and consult your legal department.
What if my project's usage is restricted by the BSL?
If the BSL prevents your usage of the SDV, you may purchase a license from DataCebo that will allow you to continue with your project.
To get started, contact us to inquire about the SDV Enterprise and tell us more about your project. We're eager to work with you towards a solution!