The Most Important Open Source Demographic That No One Thinks About

23 January, 2023

Kalyan Veeramachaneni

Defining an Open Source User for 2023 and beyond

Code contributors are an essential part of an open source (OS) project. But in our experience, making code contributors the sole focus of an open source project ends up disenfranchising another large segment of important people: a library's users. This segment, we have found, is also critical to our success, providing indispensable feedback, finding use cases and helping us to improve our open source and the product that relies on it. In this article, we synthesize key attributes that we use to identify a user.

Traditionally, open source libraries have been centered around software development, as collaborating on code is vital for maintaining complex software. It has become customary to use the number of unique code contributors as a core metric of a given library's success. Metrics of success drive how the overall ecosystem is maintained, including how software is designed (APIs), the audience for which usage guides are developed, the types of demos that are built, and how communications are handled. To bring users into the fold, all of these need to be revisited with users in mind.

Open source is proving to be a successful model for startups. As the core maintainers of the Synthetic Data Vault project — the world's largest open source library for modeling and generating tabular synthetic data — we are constantly striving to realize the benefits of this model firsthand. For us, open source has been vital to building a trusted and usable machine learning system. With this and subsequent articles we are synthesizing our current thinking about open source, and some key lessons we have learned on our way to this point.

Who is a user?

Our definition of a user is: Anyone who attempts to use our open source library to solve their problem. Generally, users:

  • …are goal-oriented. A user comes to our library with a specific project that they're working on.
  • …have limited time. A user often has a deadline for their project. They may not have time to learn the nitty-gritty details of our software, or engage in deeper conversations about its development.
  • …have different expertise. A user is probably coming to our library to help with a project in their own domain, whether that's healthcare, clean energy or something else. They might not have the same knowledge base as a professional software developer would (although they also might; more on this later).

While this definition may seem straightforward, these attributes have become the cornerstone of how we maintain and communicate about our software, and how we develop APIs. They have also inspired the main question we use to measure our progress: Is our library making a material difference in users' projects? In subsequent articles, we plan to share how we applied these strategies to build the largest open source user community around synthetic data, and what we have learned in the process.

What changes are we making to set up an open source project for success?

Charting this path with a laser-sharp focus on the “user” has required us to address some commonly asked questions up front, both for our team and externally. Here are just a few.

Non-devs are critical

Just because a user isn't interested in learning the internal details of our software doesn't mean we can automatically categorize them as not a developer. They may be experts in other fields and may be developing software there. In addition, they are still using our Python API to help them with their project, and therefore they are developing software. To expand our user base, we focus our efforts on what we want to achieve with our open source strategy, rather than creating different strategies based on the perceived skill level of who is using our library. As a result, we want every API we publish to be understandable and usable by everyone. In 2023, we believe that everyone is a developer, or at least, we would like to serve everyone at their level so they can be part of the software movement.

User friendly APIs are game changers

Graphical user interfaces lay out a fixed, stepwise path to successful project completion, while code provides the flexibility to try things slightly differently. When pioneering users feel restricted by that stepwise process for their specific project or use case, they turn to code instead. Creating a user-friendly API that lets users apply our open source to their project in a transparent way, and provides access to different metrics and progress states at different stages, gives our users a great chance at succeeding. It also helps us to efficiently discover more pathways, and most importantly, more use cases. This makes our open source essentially a low-code version of what goes into the product.
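To make the idea of "metrics and progress states at different stages" concrete, here is a minimal sketch of what such an API could look like. It is purely illustrative: `ToySynthesizer`, `Stage`, and the metric names are hypothetical and are not part of SDV or any real library, and the "model" is a deliberately trivial stand-in.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Progress states a user can inspect at any time (hypothetical)."""
    CREATED = "created"
    FITTED = "fitted"
    SAMPLED = "sampled"


@dataclass
class ToySynthesizer:
    """Hypothetical example class, not taken from any real library."""
    stage: Stage = Stage.CREATED
    metrics: dict = field(default_factory=dict)

    def fit(self, values):
        values = list(values)
        if not values:
            raise ValueError("fit() needs at least one value.")
        # Record simple metrics so the user can see what happened at this stage.
        self.metrics["rows_seen"] = len(values)
        self.metrics["mean"] = sum(values) / len(values)
        self.stage = Stage.FITTED
        return self

    def sample(self, num_rows):
        if self.stage is Stage.CREATED:
            raise RuntimeError("Call fit() before sample().")
        # Trivial stand-in for a model: repeat the mean learned in fit().
        rows = [self.metrics["mean"]] * num_rows
        self.metrics["rows_sampled"] = num_rows
        self.stage = Stage.SAMPLED
        return rows
```

A user with a deadline can call `fit` and `sample` in two lines, while a user who wants to deviate from the default path can check `stage` and `metrics` between steps and decide what to do next.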

GitHub stars are not enough

GitHub star histories are regularly used to indicate an exponential growth curve for a library. They are often considered a leading indicator for a need in the market that the library may be targeting, or the top of the funnel for an open source project, and there are now well-developed strategies for growing stars over time. Used effectively, these strategies are a good marketing tool and a well-intentioned way to widen the top of the funnel. We ourselves use them from time to time, as they increase reach and can bring in more users. But we find that they should be balanced with feature development, carefully listening to users, and measuring how often folks are downloading and using the library and raising issues. Star growth should be followed by growth in downloads and issues raised by users.
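One simple way to check whether stars are being followed by real usage is to compare relative growth across the three signals over the same period. The sketch below is an assumption about how such a check might be written, not a description of our actual tooling; the function names and the `tolerance` threshold are hypothetical, and the counts would come from whatever sources you already track.

```python
def growth(series):
    """Relative growth from the first to the last value of a series."""
    first, last = series[0], series[-1]
    if first == 0:
        raise ValueError("Cannot compute relative growth from a zero baseline.")
    return (last - first) / first


def stars_outpacing_usage(stars, downloads, issues, tolerance=0.5):
    """Return True when usage signals lag well behind star growth.

    `tolerance` is an arbitrary, hypothetical threshold: usage is considered
    lagging if its weakest signal grew by less than `tolerance` times the
    growth in stars over the same period.
    """
    star_growth = growth(stars)
    usage_growth = min(growth(downloads), growth(issues))
    return usage_growth < tolerance * star_growth
```

For example, if stars doubled over a quarter while downloads and issues barely moved, this check returns `True`, which is a signal to shift attention from reach toward feature development and listening to users.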

We look forward to discussing our experiences with open sourcing in 2023 and beyond. In the articles that follow, we will share some more of our strategies and measures for engagement. We welcome any thoughts, comments, suggestions and questions below.
