The most important open source demographic that no one thinks about.

Defining an Open Source User for 2023 and beyond

Code contributors are an essential part of an open source (OS) project. But in our experience, making code contributors the sole focus of an open source project ends up disenfranchising another large segment of important people: a library's users. This segment, we have found, is more critical to our success, providing indispensable feedback, finding use cases and helping us to improve our open source and the product that relies on it. In this first article (in a series), we synthesize key attributes that we use to identify a user.

Traditionally, open source libraries have been centered around software development, as collaborating on code is vital for maintaining complex software. It has become customary to use the number of unique code contributors as a core metric of a given library's success. Metrics of success drive how the overall ecosystem is maintained, including how software is designed (APIs), the audience for which usage guides are developed, the types of demos that are built, and how communications are handled. To bring users into the fold, all of these need to be revisited with users in mind.

At the same time, open source is proving to be a successful model for startups. As the core maintainers of the Synthetic Data Vault project — the world's largest open source library for modeling and generating tabular synthetic data — we are constantly striving to realize the benefits of this model firsthand. For us, open source has been vital to building a trusted and usable machine learning system. With this and subsequent articles we are synthesizing our current thinking about open source, and some key lessons we have learned on our way to this point.

Who is a user?

Our definition of a user is: Anyone who attempts to use our open source library to solve their problem. Generally, users:

…are goal-oriented. A user comes to our library with a specific project that they're working on.
…have limited time. A user often has a deadline for their project. They may not have time to learn the nitty-gritty details of our software, or engage in deeper conversations about its development.
…have different expertise. A user is probably coming to our library to help with a project in their own domain, whether that's healthcare, clean energy or something else. They might not have the same knowledge base as a professional software developer would (although they also might; more on this later).

While this definition may seem straightforward, these attributes have become the cornerstone of how we maintain and communicate about our software, and how we develop APIs. They have also inspired the main question we use to measure our progress: Is our library making a material difference in users' projects? In subsequent articles, we plan to share how we applied these strategies to build the largest open source user community around synthetic data, and what we have learned in the process.

What changes are we making to set up an OS for success?

Charting this path with a laser-sharp focus on the “user” has required us to address some commonly asked questions up front, both for our team and externally. Here are just a few.

Users are developers too and probably more critical for our success

Just because a user isn't interested in learning the internal details of our software doesn't mean we can automatically categorize them as not a developer. They may be experts in other fields and may be developing software there. In addition, they are still using our Python API to help them with their project, and therefore they are developing software. To expand and serve our user base, we focus our efforts on what we want to achieve with our open source strategy, rather than creating different strategies based on the perceived skill level of who is using our library. As a result, we want every API we publish to be understandable and usable by everyone. We also want our communications to respect the fact that users don't have much time. In 2023, we believe that everyone is a developer, or at least we would like to serve everyone and make them part of the software movement.

User-friendly APIs are game changers

One question we asked ourselves was "shouldn't user-friendliness be delegated to graphical user interfaces (GUIs)?" GUIs lock in a straight, stepwise process to successful project completion, while code provides the flexibility to try things slightly differently. When pioneering users feel restricted by that stepwise process for their specific project or use case, they turn to code instead. Creating a user-friendly API that lets users apply our open source to their project in a transparent way, and that provides access to different metrics and progress states at different stages, gives our users a great chance at succeeding. It also helps us to efficiently discover more pathways and, most importantly, more use cases. This makes our open source essentially a low-code version of what goes into the product.

GitHub stars are not enough

GitHub star histories are regularly used to indicate an exponential growth curve for a library. They are often considered a leading indicator of a need in the market that the library may be targeting, or of the top of the funnel for an open source, and there are now well-developed strategies for growing stars over time. Used effectively, we find these strategies to be a good marketing tool, well suited to widening the top of the funnel. We ourselves use them from time to time, as they increase reach and can bring in more users. But we find that they should be balanced with feature development, carefully listening to users, and measuring how often folks are downloading and using the library and raising issues. Star growth should be followed by growth in downloads and issues raised by users.
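As a rough illustration of that last point, here is a minimal Python sketch of one way to check whether star growth is being matched by growth in downloads and issues. It is not how we track this internally; the `PeriodCounts` schema, the function names and the `tolerance` threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class PeriodCounts:
    """Cumulative counts at the end of one reporting period (hypothetical schema)."""
    stars: int
    downloads: int
    issues: int


def growth(previous: int, current: int) -> float:
    """Relative growth between two periods; 0.0 when there is no baseline."""
    if previous <= 0:
        return 0.0
    return (current - previous) / previous


def stars_outpacing_usage(
    before: PeriodCounts, after: PeriodCounts, tolerance: float = 0.5
) -> bool:
    """Return True when stars grew but usage did not keep up.

    `tolerance` is an assumed threshold: if the weaker of the two usage
    signals (downloads, issues) grew by less than this fraction of the
    star growth, the period is flagged.
    """
    star_growth = growth(before.stars, after.stars)
    usage_growth = min(
        growth(before.downloads, after.downloads),
        growth(before.issues, after.issues),
    )
    return star_growth > 0 and usage_growth < tolerance * star_growth
```

Calling `stars_outpacing_usage(before, after)` with counts from two consecutive periods returns `True` when a jump in stars has not yet turned into comparable growth in downloads and issues, which is a cue to look more closely at whether new visitors are becoming users.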

We look forward to discussing our experiences with open sourcing in 2023 and beyond. In the articles that follow, we will share some more of our strategies and measures for engagement. We welcome any thoughts, comments, suggestions and questions below.
