Repository with sources and generator of https://larlet.fr/david/


title: Slow Data
slug: slow-data
date: 2016-10-04
lang: en
chapo: The OpenData cannot succeed with the current one-shot approach, it has to be a continuous process.

In our search for answers to a problem which appears if not intractable then complex, is the speed of the media’s technology – and the politicians’ willing participation in the 24/7 news cycle – obscuring rather than illuminating the issues?

Are we simplifying the arguments if only by default, by not investigating them fully, or by appealing to an emotional response rather than an explanatory one?

[…]

But it does not mean we are covering the news more deeply or more analytically. We may be generating heat. But are we really delivering light?

[…]

We may think we are absorbing more information. In fact we are simply giving in to the temptation of the easy over the hard, the quick over the slow.

*BBC Radio Director Helen Boaden resigns, criticising state of journalism* (cache)

The idea of slow journalism is not new (see The Slow Media Manifesto (cache)) and I recently discovered that it can be applied to data too (cache). For quite a long time actually:

Data is growing in volume, as it always has, but only a small amount of it is useful. Data is being generated and transmitted at an increasing velocity, but the race is not necessarily for the swift; slow and steady will win the information race. Data is branching out in ever-greater variety, but only a few of these new choices are sure. Small, slow, and sure should be our focus if we want to use data more effectively to create a better world.

*The Slow Data Movement: My Hope for 2013* (cache)

I am part of a team building an OpenData portal, and these are questions we discuss on a regular basis. I wondered what would happen if I had to build something new from scratch. A few months ago, I ran that experiment using Riot and Falcon (eventually not published because I don’t want to maintain it). The goal was to play with technical concepts from these frameworks and to deal with the complexity of serving data from sources of varying quality. My budget was quite constrained: fewer than ten evenings. After a while, I realized how hard the task was. Not (only) from a user experience point of view, but because current data is so messy that you can’t easily pick a few datasets, even manually, and make them shine.

Maybe what we need the most is a Chief Data Editor, not a Chief Data Officer. Someone in charge of refining, storytelling and finally caring about the data. And when I say someone, a whole team is actually needed given how ambitious the task is. Indexing data submissions is only stage 1 of what could be achieved with OpenData, and we have experienced how limited its externalities are. Raw data yesterday, curated data tomorrow?

What if hackathons were not gigantic buzzword bingo sprints? Maybe we can turn these events into marathons. Put together a team that focuses on a single dataset for a week, not necessarily full-time. The goal is to deliver a usable version at the end of the week and to celebrate what has been accomplished. Turn the shiny investor/mentor crap demo into a useful explanation of the dead-ends and of the tools used for the clean-up, one that benefits the whole community. Curathons, really?!

Another option is to improve data directly at the source. Data is somehow a static API and as such a conversation too! Both producers and consumers of the data would benefit from more communication on how the data is actually (re)used, why consumers are blocked, what the technical/political challenges to providing a better version are, and so on. OpenData cannot succeed with the current one-shot approach; it has to be a continuous process.

It takes way more time to understand the actual issues behind the lack of reuse, and maybe it would lead to fewer datasets being released at the end of the day. But hopefully of better quality. And quality matters to lower barriers to (re)adoption. Giving thousands of datasets to a couple of geeks does not produce the same results as giving a hundred reusable datasets to millions of citizens. Don’t get me wrong, we desperately need geeks to make them reusable in the first place…