Is a given open source tool popular? Is it used outside of Western countries? Do people actually use a tool after downloading it once?
For most open source tools, answers to these basic questions remain out of reach.
What is Open Source?
“Open source software is code that is designed to be publicly accessible—anyone can see, modify, and distribute the code as they see fit. Open source software is developed in a decentralized and collaborative way, relying on peer review and community production. Open source software is often cheaper, more flexible, and has more longevity than its proprietary peers because it is developed by communities rather than a single author or company. Open source has become a movement and a way of working that reaches beyond software production. The open source movement uses the values and decentralized production model of open source software to find new ways to solve problems in their communities and industries.”
definition via https://www.redhat.com/en/topics/open-source/what-is-open-source
For a long time, developers of open source tools have faced a hard trade-off:
- build analytics into the tool to understand user behavior and feed that back into product improvements, at the risk of alienating the many open source users who deeply value their privacy, OR
- leave out most analytics so that users keep their faith in the tool as truly privacy-protecting, at the cost of knowing little about how the tool is being used or how it might be improved.
Most open source tool developers opt for the latter.
As recently as this year, for example, the team behind SOAP, an app that helps civil society organizations build better security policies, explained that they collect very little in the way of website statistics that could show how widely their tool is used:
“Due to the nature of [the tool], protecting the user’s privacy was a key concern during its development. As a result, no invasive tracking or analytics are used and only the web hosting platform’s basic statistics (e.g. visits, page loads) are available to indicate activity on the website.”
On the one hand, we should be pleased that so many open source developers prize and prioritize the privacy of their users!
Yet on the other, the status quo hamstrings open source tools by preventing developers from gaining a deeper understanding of their users that could in turn inform future directions for development. For example, if developers knew that their tool was widely used in Arabic-speaking countries, they might realize the need for more documentation and tool resources in Arabic.
Furthermore, under the current paradigm, funders in the open source space have no reliable data that reflect the impact of their investments. Open source tools without meaningful analytics cannot report on usage statistics that might shed light on a grant’s impact.
Making Privacy-Protecting Analytics a Reality
Since 2017, developers of open source tools at the Guardian Project and OKthanks have been working on a solution to this problem. In partnership with Internews, we’ve created a toolkit that open source developers can use to implement privacy-protecting analytics.
Clean Insights is a methodology and a set of software development kits (SDKs) and consent-interface resources for developers. It provides user insights through an approach to analytics that respects and maintains user privacy.
Rather than tracking everything a user does, Clean Insights provides a means for developers to receive answers to specific questions they have about their users. The Clean Insights approach generates these answers by minimizing data collection, aggregating data at the source, and diluting the attributes of users by modifying the scale or order of magnitude (e.g. a region rather than a city, or a month rather than a week).
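To make these three ideas concrete, here is a minimal sketch in Python. It does not use the Clean Insights SDK; the `LocalAggregator` class, the `dilute` function, and the city-to-region table are hypothetical names invented for illustration. The sketch assumes an app that wants to answer one question ("how often is a feature used, per region and month?") while keeping only running counts on the device.

```python
from __future__ import annotations

from collections import Counter
from datetime import date

# Hypothetical lookup table; a real app would supply its own coarse mapping.
CITY_TO_REGION = {
    "Cairo": "North Africa",
    "Tunis": "North Africa",
    "Amman": "Middle East",
}


def dilute(city: str, day: date) -> tuple[str, str]:
    """Coarsen precise attributes: city becomes region, day becomes month."""
    region = CITY_TO_REGION.get(city, "Other")
    month = f"{day.year:04d}-{day.month:02d}"
    return region, month


class LocalAggregator:
    """Keeps only running counts on the device; raw events are never stored."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, feature: str, city: str, day: date) -> None:
        # Only the coarse (feature, region, month) key survives this call.
        region, month = dilute(city, day)
        self.counts[(feature, region, month)] += 1

    def report(self) -> list[dict]:
        """Return the aggregated rows that could be sent onward, then reset."""
        rows = [
            {"feature": f, "region": r, "month": m, "count": n}
            for (f, r, m), n in sorted(self.counts.items())
        ]
        self.counts.clear()
        return rows


agg = LocalAggregator()
agg.record("export", "Cairo", date(2021, 3, 2))
agg.record("export", "Tunis", date(2021, 3, 30))
agg.record("export", "Amman", date(2021, 4, 1))
for row in agg.report():
    print(row)
```

In this sketch, the two North Africa events collapse into a single row with a count of two, so the report no longer reveals which city or which day either event came from.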
On top of these technical protections, Clean Insights provides developers with resources on how to ask users for their consent before running analytics, including a means of withdrawing consent at any time. We further encourage developers to publish the scope of their data collection and any algorithms used, and to explain both clearly for a non-technical audience.
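One way to picture consent that can be withdrawn at any time is a small gate that sits in front of every measurement. The sketch below is an illustration only: `ConsentGate` and its methods are hypothetical names, not part of the Clean Insights SDKs, and the choice to discard unsent counts on withdrawal is an assumption made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConsentGate:
    """Hypothetical gate: nothing is counted unless the user has said yes."""

    consented: bool = False
    pending: dict[str, int] = field(default_factory=dict)

    def grant(self) -> None:
        """Called when the user agrees in the consent dialog."""
        self.consented = True

    def withdraw(self) -> None:
        """Called when the user opts out; unsent counts are discarded too."""
        self.consented = False
        self.pending.clear()

    def record(self, question: str) -> bool:
        """Count one occurrence if consent is active; otherwise do nothing."""
        if not self.consented:
            return False
        self.pending[question] = self.pending.get(question, 0) + 1
        return True


gate = ConsentGate()
gate.record("opened_help")   # ignored: no consent yet
gate.grant()
gate.record("opened_help")   # counted
gate.withdraw()              # the pending count is dropped
```

Keeping the check inside `record` means no other part of the app has to remember to ask about consent before measuring something.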
The Clean Insights toolkit supports applications on mobile, desktop, and the web. SDKs are available for JavaScript, Python, Rust, Android, and iOS. The Guardian Project has also made free server-side storage available to make it as easy as possible for developers to implement Clean Insights and start collecting meaningful data.
Spreading the Word about Clean Insights
Now that there is a usable tool for responsibly and respectfully collecting user data, the next hurdle to clear is ensuring that every open source developer knows about this option! Please help us spread the word about Clean Insights.
Our vision is for every open source developer to be able to generate meaningful insights into how their tools are being used and how large their user base really is. At the same time, we hope that when developers prioritize informative consent modules and clearly explain how user data will be protected, users of open source tools will choose to share their data to benefit other users and the tool developers themselves.
With Clean Insights in widespread use, funders will be able to ask for and receive meaningful data that helps reveal the true impact of their investments in the open source community.
If you’re an open source developer and want help integrating Clean Insights, contact Nathan Freitas at the Guardian Project.