title: Paradigm shifts for the decentralized Web
Most Web applications today follow the adage “your data for my services”. They justify this deal from both a technical perspective (how could we provide services without your data?) and a business perspective (how could we earn money without your data?). Decentralizing the Web means that people gain the ability to store their data wherever they want, while still getting the services they need. This requires major changes in the way we develop applications, as we migrate from a closed back-end database to the open Web as our data source. In this post, I discuss three paradigm shifts a decentralized Web brings, demonstrating that decentralization is about much more than just controlling our own data. It is a fundamental rethinking of the relation between data and applications, which, if done right, will accelerate creativity and innovation for years to come.
The movement to (re-)decentralize the Web is sometimes dismissively regarded as a modern-day hippie reaction to the ever-increasing power of technology giants such as Facebook and Google. And while the David versus Goliath way of thinking is definitely present among decentralists, there are many more advantages to a world in which people and organisations regain the ability to store their data wherever they want, without missing out on the enormous potential and diversity the Web has to offer.
Ultimately, decentralization is about choice: we will choose where we store our data, whom we give access to which parts of that data, which services we want on top of it, and how we pay for them. Nowadays, we are instead forced to accept package deals we cannot customize. For example, Facebook shows us their social feed featuring our friends, paid for by advertising, but only if we store our personal data on Facebook. Within Facebook, we cannot see Twitter posts from others (unless they opt to copy their data to Facebook as well). Such deals are common, as virtually all Web applications we interact with behave this way. Instead of arguing about the appropriateness or ethics of this situation, I’ll sketch the far-reaching effects of a decentralized approach to Web development.
End users become data owners. This is the best-known aspect of decentralization: we store our data in places of our choice, which improves privacy and control.
Apps become views. As apps become decoupled from data, they start acting as interchangeable views rather than the single gateway to that data.
Interfaces become queries. Data will be distributed across highly diverse interfaces, so sustainable apps need declarative contracts instead of custom data requests.
The basis of decentralization is that people choose where they store their data. Instead of having to pick between a handful of providers such as Google or Facebook, in a decentralized world, there will be many options to pick from, and we are free to create our own. This idea brings us back to the original vision for the Web, where anyone has their own website or blog and publishes their thoughts there, rather than in a single stream owned by one company.
To a certain extent, we already have that choice: since its inception, the Web’s decentralized architecture has allowed anyone to have their own space. However, we want the convenience of the single stream without the central control that currently comes with that. We want to continue enjoying the same types of services that nowadays are only available on centralized platforms. So the important question is: can applications on top of decentralized data behave the same way as centralized apps? For example, can we still generate a friend list and news feed like Facebook does—even if our friends’ data is stored on different servers?
At one end of the spectrum, centralized solutions store all the personal data they use themselves: Twitter and Facebook are single data hubs for millions and billions of users, respectively. In contrast, the decentralized microblogging network Mastodon lets anyone set up their own Twitter clone, counting around 1.5 million users spread across 2,400 servers. A couple of thousand people share a server, and the application can read content from people on other servers as well. The Solid platform takes this further, introducing the concept of one data pod per person. Such a data pod is a simple data storage location on a server, equipped with highly granular access control, so anyone can decide exactly which people and apps can access which parts of their data. Applications become clients of these servers, sourcing data from multiple data pods. Solid eventually envisages a world of multiple data pods per person: one at home for personal data, one at the office for sensitive work files, one at school to track study material, etc. In this post, I assume such a high degree of decentralization. Note that the platform names along the above axis indicate envisaged uses: it is theoretically possible to use Mastodon or Solid in different ways, and other platforms exist.
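The idea of a pod with fine-grained access control, read by apps that act as clients of several pods, can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions: the `DataPod` class, its `grant` and `read` methods, and the choice to key permissions on an (agent, app) pair are hypothetical and do not follow Solid's actual access-control specifications. No HTTP is involved; each pod is just an object.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataPod:
    """Hypothetical in-memory pod: resources plus fine-grained access rules."""
    owner: str
    resources: dict[str, str] = field(default_factory=dict)
    # acl[path][(agent, app)] is the set of modes that agent may use via that app
    acl: dict[str, dict[tuple[str, str], set[str]]] = field(default_factory=dict)

    def grant(self, path: str, agent: str, app: str, *modes: str) -> None:
        """Allow a specific person, using a specific app, to access one resource."""
        self.acl.setdefault(path, {}).setdefault((agent, app), set()).update(modes)

    def read(self, path: str, agent: str, app: str) -> str:
        # The owner may always read; anyone else needs an explicit "read" grant.
        modes = self.acl.get(path, {}).get((agent, app), set())
        if agent != self.owner and "read" not in modes:
            raise PermissionError(f"{agent} via {app} may not read {path}")
        return self.resources[path]


def collect(pods: list[DataPod], path: str, agent: str, app: str) -> list[str]:
    """An app as a client of many pods: gather what this agent may read."""
    results = []
    for pod in pods:
        try:
            results.append(pod.read(path, agent, app))
        except (PermissionError, KeyError):
            continue  # no access, or the pod lacks this resource
    return results


alice = DataPod("alice", {"/profile/name": "Alice"})
bob = DataPod("bob", {"/profile/name": "Bob"})
alice.grant("/profile/name", "carol", "feed-app", "read")
print(collect([alice, bob], "/profile/name", "carol", "feed-app"))  # ['Alice']
```

Carol's feed app sees Alice's name but not Bob's, because only Alice granted read access to that particular combination of person and app.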
In a fully decentralized social network, every single part of an interaction, which today would be stored in its entirety on Facebook, could reside in a different data pod. Consider this social media post, where an author states his professional opinion on an online news article. Literally every piece of data can be in a different data pod:
In the fully decentralized way of thinking, everything you post is stored on your own website or server. An app collects all posts from people I am following from different servers, and displays them in a feed for me. When I like your post, this “like” is stored on my server. This action triggers my server to send a notification to yours, so you can decide whether to display this like or not—for instance, by storing a copy of it. Every comment or like on any item is stored at a location chosen by the poster, together with its associated permissions to read, write, or delete it.
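The like-and-notify flow described above can be sketched with two in-memory servers. The `Server` class, the `like` function, and the record fields (`id`, `type`, `actor`, `object`) are illustrative assumptions, loosely inspired by inbox-based notification designs rather than taken from any specific protocol. The sketch only shows where each piece of data ends up: the like on the liker's server, a notification in the author's inbox, and a copy wherever the author chooses to keep one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical personal server: the owner's items plus an inbox."""
    owner: str
    items: dict[str, dict] = field(default_factory=dict)
    inbox: list[dict] = field(default_factory=list)
    shown_likes: dict[str, list[dict]] = field(default_factory=dict)

    def receive(self, notification: dict) -> None:
        self.inbox.append(notification)

    def accept(self, notification: dict) -> None:
        # The author decides to display the like, here by storing a copy of it.
        self.shown_likes.setdefault(notification["object"], []).append(dict(notification))


def like(liker: Server, author: Server, post_id: str) -> dict:
    """Store a like on the liker's server, then notify the author's server."""
    like_id = f"like-{len(liker.items) + 1}"
    record = {"id": f"{liker.owner}/{like_id}", "type": "Like",
              "actor": liker.owner, "object": f"{author.owner}/{post_id}"}
    liker.items[like_id] = record   # the like itself lives on the liker's server
    author.receive(dict(record))    # the author's server only receives a notification
    return record


me, you = Server("me"), Server("you")
you.items["post-1"] = {"text": "My professional opinion on this article"}
like(me, you, "post-1")
for note in you.inbox:  # the author reviews incoming notifications
    you.accept(note)
print(len(me.items), len(you.shown_likes["you/post-1"]))  # 1 1
```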
This paradigm of storing everything in a place we control is fundamentally different from the centralized one, and has several beneficial consequences for users. It improves privacy, since you can say whatever you want about anything without having to disclose this to Facebook or anyone else. This positively impacts freedom of speech and counters censorship (with all of the associated consequences and debates). The flexible access control can be used in any way imaginable: even individual likes or comments could be made visible only to certain people, groups, or applications, and you can change those permissions at any time. All this is what it means to truly be a data owner.
Unlike with centralized platforms, trust is not derived from a single party. For instance, if Facebook shows that my post has 124 likes, we believe this because Facebook says so (and frankly, we have no objective reason to doubt that). In a decentralized scenario, I could prove that number by linking to the individual likes that are stored on other servers, which form a provenance trail. And if those links break for any reason (for instance, if people retract their like), I can still prove they once liked it, provided my app made a copy of their digitally signed like on my post. This mechanism can replace networks that are largely based on authority, such as LinkedIn, where people build a reputation from the people they are connected to. We can essentially replace LinkedIn with an address book, where somebody is a connection if they also have you in their contact list.
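Both mechanisms in this paragraph can be illustrated with Python's standard library: keeping verifiable copies of likes, and deriving connections from mutual address-book entries. One loud caveat: a real provenance trail would use public-key signatures (for instance Ed25519), so that anyone can verify a like without knowing the liker's secret. Python's standard library has no public-key signature API, so this sketch uses an HMAC as a stand-in, which assumes the verifier holds each liker's key. All names (`sign`, `verify`, `provable_likes`, `connections`) and keys are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import json


def sign(like: dict, key: bytes) -> dict:
    """Attach a MAC over the canonical JSON of the like (stand-in for a real signature)."""
    payload = json.dumps(like, sort_keys=True).encode()
    return {"like": like, "sig": hmac.new(key, payload, hashlib.sha256).hexdigest()}


def verify(signed: dict, key: bytes) -> bool:
    payload = json.dumps(signed["like"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["sig"])


def provable_likes(copies: list[dict], keys: dict[str, bytes]) -> int:
    """Count kept copies that still verify, even if the originals were retracted."""
    return sum(1 for c in copies
               if c["like"]["actor"] in keys and verify(c, keys[c["like"]["actor"]]))


def connections(me: str, books: dict[str, set[str]]) -> set[str]:
    """A connection is anyone in my address book who also lists me in theirs."""
    return {p for p in books.get(me, set()) if me in books.get(p, set())}


keys = {"bob": b"bob-secret", "carol": b"carol-secret"}  # hypothetical keys
copies = [sign({"actor": "bob", "object": "post-1"}, keys["bob"]),
          sign({"actor": "carol", "object": "post-1"}, keys["carol"])]
forged = {"like": {"actor": "carol", "object": "post-2"}, "sig": copies[1]["sig"]}
print(provable_likes(copies + [forged], keys))  # 2: the altered copy fails verification
print(connections("me", {"me": {"bob", "carol"}, "bob": {"me"}, "carol": set()}))  # {'bob'}
```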
The main challenge with full decentralization of data is scalability. In the Mastodon scenario, there are still relatively few servers for many users. In the Solid scenario, there might even be more data pods than users. In the end, decentralization will go hand in hand with dynamic data replication, which will need to be balanced carefully with fine-grained access control possibilities in order to guarantee data privacy.
By breaking the tight coupling between data and applications, decentralization questions and alters the very nature of an application. While the second paradigm shift comes as a direct consequence of the first paradigm shift we discussed above, it is equally crucial in its own way.
Basically, the competitive advantage of many of today’s popular centralized platforms is their data silo, and the fact that their service depends entirely on access to that data. Conceptually speaking, the service offered by Facebook, Twitter, and LinkedIn is fairly simple and could easily be replicated by others. Yet a major reason why people appreciate these platforms is their data: Facebook is engaging because our friends’ data is there, Twitter has all of the world’s tweets and direct messages, and LinkedIn showcases our broad networks. In fact, these platforms have become inseparable from their data: we use “Facebook” to refer to both the application and the data that drives that application. The result is that nearly every Web app today asks you for more and more data, again and again, leading to dangling data on duplicate and inconsistent profiles we can no longer manage. And of course, this comes with significant privacy concerns.
In contrast, decentralized Web applications decouple data and applications: you enter data only once, in your own data pod. Instead of maintaining credentials with each app, you log in through your data pod and give apps permission to read or write specific parts of your data. The Web’s ecosystem thereby evolves from bundled data+service packages into applications as interchangeable views, wherein each Web app provides consistent visualizations, interactions, and processing over your personal data pod. Furthermore, those apps let you interact with any other data pods you have access to, such as those of your friends. Applications ask rather than own, and they are able to reuse data created by other apps, avoiding vendor lock-in.
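A minimal sketch of apps as interchangeable views, assuming a single pod modeled as a Python dictionary. The three functions stand for hypothetical apps: two read the same contact list for different purposes, and a third writes to it. None of them keeps its own copy of the data, so a contact entered through one app immediately appears in the others.

```python
from __future__ import annotations

# One hypothetical pod: the profile and contacts are entered once and stored once.
pod: dict = {
    "profile": {"name": "Alice", "email": "alice@example.org"},
    "contacts": [
        {"name": "Bob", "email": "bob@example.org", "tags": ["friend"]},
        {"name": "Carol", "email": "carol@example.org", "tags": ["colleague"]},
    ],
}


def social_view(data: dict) -> list[str]:
    """Hypothetical social app: renders the names of friends."""
    return [c["name"] for c in data["contacts"] if "friend" in c["tags"]]


def scheduling_view(data: dict) -> list[str]:
    """Hypothetical scheduling app: needs colleagues' e-mail addresses."""
    return [c["email"] for c in data["contacts"] if "colleague" in c["tags"]]


def add_contact(data: dict, name: str, email: str, tags: list[str]) -> None:
    """Hypothetical address-book app: writes to the same pod instead of its own copy."""
    data["contacts"].append({"name": name, "email": email, "tags": tags})


add_contact(pod, "Dave", "dave@example.org", ["friend", "colleague"])
print(social_view(pod))      # ['Bob', 'Dave']
print(scheduling_view(pod))  # ['carol@example.org', 'dave@example.org']
```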
In this ecosystem, Facebook’s friend feed becomes a view over your contact list in your data pod, combined with the latest messages your contacts have posted in their data pods. Decentralized LinkedIn and Doodle could be granted access to your address book, so your list of colleagues would always be in sync for meeting requests (because there would actually only be one list instead of multiple). Decentralized Doodle and Facebook could both be granted access to your calendar, where Doodle can only see when you are available, and Facebook can only add events. Any change in one view is directly reflected in another because they share the same storage.
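The calendar example can be made concrete with per-app permission modes on a single shared resource. In this sketch, the mode names `availability` and `append`, the `Calendar` class, and the integer hours are assumptions chosen for illustration; `doodle` and `facebook` are merely labels for hypothetical decentralized versions of those apps. The first may only see when you are busy, the second may only add events, and both operate on the same list of events.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Calendar:
    """Hypothetical calendar resource in a pod, with per-app permission modes."""
    events: list[tuple[int, int, str]] = field(default_factory=list)  # (start, end, title)
    grants: dict[str, set[str]] = field(default_factory=dict)          # app -> modes

    def _require(self, app: str, mode: str) -> None:
        if mode not in self.grants.get(app, set()):
            raise PermissionError(f"{app} lacks the '{mode}' permission")

    def busy_slots(self, app: str) -> list[tuple[int, int]]:
        """Free/busy view: exposes times only, never event titles."""
        self._require(app, "availability")
        return sorted((start, end) for start, end, _ in self.events)

    def add_event(self, app: str, start: int, end: int, title: str) -> None:
        """Append-only access: the app can add events but not read existing ones."""
        self._require(app, "append")
        self.events.append((start, end, title))


cal = Calendar(events=[(9, 10, "Dentist")],
               grants={"doodle": {"availability"}, "facebook": {"append"}})
cal.add_event("facebook", 14, 16, "Birthday party")
print(cal.busy_slots("doodle"))  # [(9, 10), (14, 16)]
try:
    cal.busy_slots("facebook")
except PermissionError as err:
    print(err)  # facebook lacks the 'availability' permission
```

The event added through one app shows up in the other app's free/busy view without any synchronization step, because both read the same resource.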
Importantly, this disentanglement of data and services creates separate markets for data and applications. Each of these two markets comes with its own competitive forces that stimulate creativity and innovation at a higher rate, since the ability to provide a service no longer depends on ownership of data.
On the application market, whoever can make a more user-friendly social feed than Facebook, or show a better network overview than LinkedIn, is able to attract people solely on the quality of their service. Moreover, people can choose the application that serves them best, and can switch between applications at any time, since all apps are views over their personal data pod. Instead of entering your name and e-mail address over and over again, you log in with your data pod to give access to these pieces of data, and you can revoke this permission at any moment. Integration also becomes simple: if an existing application lacks specific functionality, you can easily write a small app that provides a new view on the same data.
On the data market, different options emerge as well. Depending on your requirements, you might prefer different storage providers. The most technologically advanced of us could decide to host their own server, possibly based on existing software packages. For personal purposes, people might select providers similar to Dropbox—the difference being that their choice only depends on storage aspects and not on application functionality. More expensive plans could for instance provide additional backup or security options. For professional purposes, an even wider range of solutions could exist, ranging from on-site storage to cloud-hosted packages. Universities could provide their students with storage for anything related to their education, and governments could do the same for citizens’ official documents. Specialized software solutions could emerge for law offices or hospitals, where sensitive data is treated appropriately according to data retention policies. Currently, such use cases require people to accept whatever application comes with their desired storage option.
The key to a healthy ecosystem is the independence of these two markets, realized through a noncommittal relationship between apps and data. Since no such separation currently exists, new innovative application platforms have trouble emerging because they don’t have the data, while existing platforms lack incentives to innovate adequately because they already possess the data anyway. This competition argument closely resembles the Net Neutrality debate, which strives to maintain the separation of the content and connectivity markets. Indeed, we can regard a fully decentralized approach as a way to realize platform neutrality, where applications and storage solutions become interchangeable, just like websites and Internet providers.
The current generation of Web applications communicates with servers through a highly specific sequence of steps that are hard-coded into the application logic. These steps contain specific requests to a Web API, a (typically custom) interface exposed by the server. If applications become views over many different kinds of data pods, an important question is what interface these data pods need to expose.
It seems unrealistic to hope that all of these data pods would have the same Web API (be it Linked Data Platform, SPARQL, or GraphQL). Not only would this require an unprecedented standardization effort, but such a standard could never cover all cases. Given that we aim for competition on the data market as well, different kinds of data pods are expected to provide different kinds of interfaces with varying expressivity. On top of this, on a decentralized Web, the data needed by applications will be scattered across multiple data pods. So even if all pods had the same interface, apps would still need to route requests to the right pods and combine their data.
This indicates that decentralized apps shouldn’t bind directly to concrete Web APIs, because this would limit them to specific data pods at a specific point in time. If their interfaces evolve, or if we want to access different data pods, apps would need to be reprogrammed. Clearly, such a fragile contract between the app and data markets would form a major bottleneck to sustainable growth and scalability. Instead of hard-coding a specific sequence of requests, the application logic should express, in a higher-level language, which operation it wants to perform on the data.
Therefore, I believe that decentralized Web applications should exclusively use declarative queries to view and update data on our pods, so their expression of the intended data operation remains constant—even if interfaces are different. Rather than directly interacting with pod interfaces, queries are processed by a client-side library, which translates these queries into concrete HTTP requests against one or multiple data pods. This means that, rather than a horizontal interface orientation or a vertical interface that directly accesses the Web API, decentralized Web applications need a vertical interface orientation with an internal horizontal separation.
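To illustrate the translation step, the following sketch shows a hypothetical client-side library function that turns one declarative query into concrete HTTP requests, adapted to each pod's interface. Several assumptions apply: the query is a list of triple patterns; each pod is described by a small dictionary stating its interface type; SPARQL endpoints receive the whole query through a `query` URL parameter; Triple Pattern Fragments interfaces are assumed to accept `subject`, `predicate`, and `object` parameters (real TPF servers advertise their URL template through hypermedia controls); and a pod without query support is read document by document. The function only plans requests and does not send them; all URLs are placeholders.

```python
from __future__ import annotations

from urllib.parse import urlencode

Pattern = tuple[str, str, str]  # a triple pattern; terms starting with "?" are variables


def to_sparql(patterns: list[Pattern]) -> str:
    """Serialize the declarative query as a SPARQL SELECT query."""
    body = " . ".join(" ".join(t if t.startswith("?") else f"<{t}>" for t in p)
                      for p in patterns)
    return f"SELECT * WHERE {{ {body} }}"


def plan_requests(patterns: list[Pattern], pods: list[dict]) -> list[tuple[str, str]]:
    """Translate one query into concrete (method, URL) pairs for each pod's interface."""
    requests: list[tuple[str, str]] = []
    for pod in pods:
        kind = pod["interface"]
        if kind == "sparql":
            # The whole query travels in one request, via a 'query' URL parameter.
            url = f"{pod['endpoint']}?{urlencode({'query': to_sparql(patterns)})}"
            requests.append(("GET", url))
        elif kind == "tpf":
            # One request per triple pattern; variables are simply left out.
            for s, p, o in patterns:
                params = {k: v for k, v in (("subject", s), ("predicate", p), ("object", o))
                          if not v.startswith("?")}
                requests.append(("GET", f"{pod['endpoint']}?{urlencode(params)}"))
        elif kind == "ldp":
            # No query support: fetch the documents and filter them on the client.
            requests.extend(("GET", doc) for doc in pod["documents"])
        else:
            raise ValueError(f"unsupported interface: {kind}")
    return requests


query = [("https://alice.example/profile#me", "http://xmlns.com/foaf/0.1/knows", "?friend")]
pods = [
    {"interface": "sparql", "endpoint": "https://pod1.example/sparql"},
    {"interface": "tpf", "endpoint": "https://pod2.example/fragments"},
    {"interface": "ldp", "documents": ["https://pod3.example/contacts.ttl"]},
]
for method, url in plan_requests(query, pods):
    print(method, url)
```

The application itself only states the triple pattern; swapping a SPARQL pod for one that offers only document access changes the planned requests, not the query.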
By abstracting all of an application’s operations as declarative queries, we enable an independent evolution of apps and server-side interfaces. At design time, apps only bind to slowly changing high-level queries instead of rapidly moving and changing low-level interfaces, so they don’t need to commit to a specific data pod. At runtime, the client-side query engine library—which can be shared across many applications—is responsible for interfacing with the concrete Web APIs of the relevant data pods for a given user. This also enables transparent data replication and aggregation, which will be necessary to speed up data collection across many pods.
While limiting the dependencies of applications to queries facilitates their development and improves their sustainability, it implies a complex, cross-API query engine. I envision that multiple implementations of such a query library would compete, and eventually replace the API-specific client-side libraries that are symptomatic of tight coupling between clients, services, and their underlying data. A possible direction to realize this in a scalable way is to split monolithic Web APIs into API features, which can be reused across data pods. These pods could then opt to provide different kinds of capabilities, such as Linked Data Platform, (subsets of) SPARQL and GraphQL, or Triple Pattern Fragments, depending on the service level chosen by their users. In the ideal case, a data pod supports all queries required by the application, so the library can send them straight through; in other cases, it splits queries into multiple requests.
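The choice between sending a query straight through and splitting it can be sketched with in-memory pods that advertise API features. The feature names `bgp` (the pod evaluates a whole basic graph pattern) and `tpf` (the pod returns the triples matching a single pattern) are hypothetical labels, and the request counter stands in for HTTP round trips. When the pod cannot evaluate the full query, the client fetches one fragment per pattern and joins the results itself, using the simplest possible nested-loop join; a production engine would order patterns and bind variables far more cleverly.

```python
from __future__ import annotations

from typing import Callable, Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def evaluate(patterns: list[Triple], fetch: Callable[[Triple], list[Triple]]) -> list[Binding]:
    """Nested-loop join: bind each pattern in turn against the triples fetched for it."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        candidates = fetch(pattern)
        bindings = [b2 for b in bindings for t in candidates
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


class Pod:
    """Hypothetical pod offering a set of API features, e.g. {"tpf"} or {"tpf", "bgp"}."""

    def __init__(self, triples: list[Triple], features: set[str]):
        self.triples, self.features, self.requests = triples, features, 0

    def _select(self, pattern: Triple) -> list[Triple]:
        return [t for t in self.triples if match(pattern, t, {}) is not None]

    def fragment(self, pattern: Triple) -> list[Triple]:
        """'tpf' feature: one request returns all triples matching one pattern."""
        self.requests += 1
        return self._select(pattern)

    def query(self, patterns: list[Triple]) -> list[Binding]:
        """'bgp' feature: one request evaluates the whole query on the server."""
        self.requests += 1
        return evaluate(patterns, self._select)


def execute(patterns: list[Triple], pod: Pod) -> list[Binding]:
    """Client-side library: send the query straight through if possible, else split it."""
    if "bgp" in pod.features:
        return pod.query(patterns)
    return evaluate(patterns, pod.fragment)


triples = [("me", "knows", "bob"), ("me", "knows", "carol"),
           ("bob", "name", "Bob"), ("carol", "name", "Carol")]
query = [("me", "knows", "?f"), ("?f", "name", "?n")]
for pod in (Pod(triples, {"tpf", "bgp"}), Pod(triples, {"tpf"})):
    print(execute(query, pod), pod.requests)
```

Both pods return the same bindings; only the number of requests differs (one versus two), which is the trade-off such a query library has to manage across pods with different capabilities.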
The combination of decentralization and query execution also confronts us with a temporally different way of interacting with data. In traditional Web applications, the procedure is typically “send query → wait for execution to complete → act on all results”. In a decentralized setting, we know that data collection will take time, so applications should be prepared to do more useful things instead of just waiting. The procedure becomes “send query → act on each incoming result”, processing every piece of incoming data in a streaming way. In general, completeness should never be assumed, given that the Web is an open world. This is an additional indication of how radically the relation between data and applications will change.
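The “act on each incoming result” procedure maps naturally onto a generator that yields results as individual pods respond. In this sketch, `fetch_from_pod` is a hypothetical stand-in for an HTTP request, with `time.sleep` simulating network latency; only `ThreadPoolExecutor` and `as_completed` come from Python's standard library. The consumer can render each result immediately instead of waiting for the slowest pod.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterator


def fetch_from_pod(pod: str, delay: float, items: list[str]) -> list[str]:
    """Hypothetical pod request: simulated network latency, then a batch of results."""
    time.sleep(delay)
    return [f"{pod}:{item}" for item in items]


def stream_results(sources: dict[str, tuple[float, list[str]]]) -> Iterator[str]:
    """Yield results as soon as any pod answers, instead of waiting for all of them."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(fetch_from_pod, pod, delay, items)
                   for pod, (delay, items) in sources.items()]
        for future in as_completed(futures):
            yield from future.result()


sources = {"slow-pod": (0.3, ["post-1", "post-2"]), "fast-pod": (0.0, ["post-3"])}
for result in stream_results(sources):
    print("render", result)  # the fast pod's result typically appears first
```

Because completeness should not be assumed, an application could also pass a `timeout` to `as_completed`, which raises a `TimeoutError` if pods are still outstanding after that many seconds, and then keep showing the partial results it has already received.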
Each of the above paradigm shifts shows that decentralizing the Web is about reorganizing power. First, people gain the power to control their own data and privacy. Second, new applications and data solutions gain competitive power through the resulting decoupling of apps and data. Third, the expressive power of applications improves by depending on transferable queries instead of low-level interfaces.
What I describe in this blog post is slowly but steadily happening, and was in fact inspired by prototypes that currently exist. The decentralized editor dokieli and its annotation functionalities convincingly demonstrate that every atomic piece of data can be stored in a different place. Spending the summer at MIT’s Decentralized Information Group revealed the possibilities of simple server-side data stores such as Solid for advanced client-side applications. It’s there that I saw for the first time how data can drive everything seamlessly—while apps become simple views. Mashlib proves how everyday needs can be addressed with small applications, since these can tap into existing data instead of needing to duplicate input functionality to ask for basic details over and over. Finally, our work on Linked Data Fragments—and its Solid plugin—aims to grow decentralized querying to a Web scale.
A question many people have is whether decentralization is realistic in real-world scenarios. On the one hand, I’m inclined to answer that, in any case, the Web cannot possibly become more centralized than it already is today. Facebook has already become the main gateway for such an immense number of people that the only logical direction forward is toward less centralization. My conviction is based on more than just gut feeling, since I see several parties come to similar conclusions. On the other hand, I have experienced the enormous potential of many aspects of the decentralized vision. The idea doesn’t need to start solely with enthusiastic technophiles, but can grow from concrete industry needs. The notion of a private, on-premise data pod appeals to sectors such as finance, law, and healthcare, which have a promising market for high-security data pods and transparent information access through apps as views. From a digital society perspective, personal data pods address the problems we are facing with an increasing number of parties asking for consistent access to specific parts of our personal data. I also liked Jim Hendler quoting Marvin Minsky that we’ll know computers are truly becoming intelligent when they won’t ask for the same info twice ever again. Approaching decentralization the right way will enable exactly that.
The final question is who will pay for all of this. The good news is that we’ll have a choice there too. Bundled package deals such as Facebook and Twitter only offer the ad-based payment option with its infamous consequences. In a decentralized world, we can choose our data and app providers independently, and decide for each how we are willing to pay. The bad news is that this means not everything is going to be “free”, as it seemingly appears now. However, increased competition—on two separate markets—should lead to fair prices. And if we really want free options, we could even imagine paying with our personal data, giving selected parts away in exchange for ads. That’s of course how social media are implicitly supported now, but the main difference will be that we decide which data can be used for advertising purposes and which cannot. This proves once more that, at its core, decentralization starts with us taking back control of our data, as a source for a new generation of innovative Web applications.