title: Slow Data
slug: slow-data
date: 2016-10-04
lang: en
chapo: OpenData cannot succeed with the current one-shot approach; it has to be a continuous process.

> In our search for answers to a problem which appears if not intractable then complex, is the speed of the media’s technology – and the politicians’ willing participation in the 24/7 news cycle – obscuring rather than illuminating the issues?
>
> Are we simplifying the arguments if only by default, by not investigating them fully, or by appealing to an emotional response rather than an explanatory one?
>
> […]
>
> But it does not mean we are covering the news more deeply or more analytically. We may be generating heat. But are we really delivering light?
>
> […]
>
> We may think we are absorbing more information. In fact we are simply giving in to the temptation of the easy over the hard, the quick over the slow.
> 
> <cite>*[BBC Radio Director Helen Boaden resigns, criticising state of journalism](http://www.independent.co.uk/news/media/tv-radio/bbc-radio-director-helen-boaden-to-announce-resignation-at-prix-italia-preview-in-lampedusa-a7337181.html)* ([cache](/david/cache/1a2734bb5dedbf20dae4fdfde4531769/))</cite>

The idea of slow journalism is not new (see [The Slow Media Manifesto](http://en.slow-media.net/manifesto) ([cache](/david/cache/f9724957bafe95afa037369fca13ce7a/))) and I recently discovered that it can be [applied to data too](http://beautifuldata.net/2015/02/slow-data/) ([cache](/david/cache/3eaf45b44f04c098054f9c759486c1ca/)). For quite a long time actually:

> Data is growing in *volume*, as it always has, but only a *small* amount of it is useful. Data is being generating and transmitted at an increasing *velocity*, but the race is not necessarily for the swift; *slow* and steady will win the information race. Data is branching out in ever-greater *variety*, but only a few of these new choices are *sure*. Small, slow, and sure should be our focus if we want to use data more effectively to create a better world.
>
> <cite>*[The Slow Data Movement: My Hope for 2013](http://www.perceptualedge.com/blog/?p=1460)* ([cache](/david/cache/b5c1c067a928f48722ddf2f5b3ae187e/))</cite>

As a member of a team building an [OpenData portal](http://udata.readthedocs.io/en/latest/), these are questions we’re discussing on a regular basis. I wondered what would happen if I had to build something new from scratch. A few months ago, I [ran that experiment](https://github.com/davidbgk/justdata) using [Riot](http://riotjs.com/) and [Falcon](https://falconframework.org/) (ultimately not published because I don’t want to maintain it). The goal was to play with technical concepts from these frameworks and to deal with the complexity of serving data from various sources and of varying quality. My [budget](/david/blog/2016/simplicite-defaut/) was quite constrained: fewer than ten evenings. After a while, I realized how hard the task was. Not (only) from a user experience point of view, but because current data is so messy that you can’t easily pick out some datasets, even manually, and make them shine.

*Maybe what we need the most is a Chief Data Editor, not a [Chief Data Officer](https://fr.wikipedia.org/wiki/Chief_Data_Officer).* Someone in charge of refining the data, telling its story and ultimately caring about it. And when I say someone, it is actually a whole team that is needed, given how ambitious the task is. Indexing data submissions is only stage 1 of what could be achieved with OpenData, and we have experienced how limited its externalities are. **[Raw data yesterday](http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide), curated data tomorrow?**

What if *hackathons* were not gigantic [buzzword bingo](https://en.wikipedia.org/wiki/Buzzword_bingo) sprints? Maybe we can turn these events into marathons. Put together a team for a week that focuses on a single dataset, not necessarily full-time. The goal is to deliver a usable version at the end of the week and to celebrate what has been accomplished. Turn the shiny investor/mentor crap demo into an explanation of the dead-ends encountered and the tools used for the clean-up, one that can be useful to the whole community. *Curathons*, really?!

Another option is to improve data directly at the source. Data is, in a way, a static API and as such [a conversation too](/david/blog/2016/specifications-apis/)! Both producers and consumers of the data would benefit from more communication about how they actually (re)use it, why they are blocked, which technical/political challenges stand in the way of a better version and so on. **OpenData cannot succeed with the current one-shot approach; it has to be a continuous process.**

It takes way more time to understand the actual reasons behind the lack of reuse, and maybe it would lead to fewer datasets being released at the end of the day. But hopefully of better quality. And quality matters to lower barriers to (re)adoption. Giving thousands of datasets to a couple of geeks does not produce the same results as giving a hundred reusable datasets to millions of citizens. Don’t get me wrong, we desperately need geeks to make them reusable in the first place…