  1. title: Datensparsamkeit
  2. url: http://martinfowler.com/bliki/Datensparsamkeit.html
  3. hash_url: 2d7e6b8ac2e7f33001a34f1884f00d6c
  4. <p>Datensparsamkeit is a German word that's difficult to translate
  5. properly into English. It's an attitude to how we capture and store
  6. data, saying that we should only handle data that we really
  7. need.</p>
  8. <img src="http://martinfowler.com/bliki/images/datensparsamkeit/sketch.png"/>
  9. <p>These days there's a lot of hype around the idea of Big Data -
  10. and with it the notion that we should capture and store every bit of
  11. data we can get our hands on. We might not have an immediate use for
  12. the contacts our users store in their address books, but we'll ask
  13. for them anyway in case they come in useful later. We'll record every
  14. click on our website and squirrel it away in case we want to trawl
  15. it later. We set up our smartphone app to ask for location information so
  16. if we come up with some way to use that data later, we can. After
  17. all, storage is cheap - so why not?</p>
  18. <p>The problem with the "capture-it-all" approach is that it raises
  19. serious questions of privacy. Even if we trust ourselves to not
  20. abuse the data we collect, each data store represents a target for
  21. criminals or government surveillance agencies. This issue is
  22. particularly fraught in Germany, which has seen successive regimes
  23. where governments have carried out extensive surveillance of their
  24. citizens in order to control them. Germany consequently has strong
  25. data privacy laws. </p>
  26. <p>Datensparsamkeit <a href="#footnote-pronunciation">[1]</a> is a concept from these privacy laws that is an
  27. opposite philosophy to "capture-all-the-things". A translation isn't
  28. straightforward (which is why I've retained the German word) but
  29. loosely you might translate it as something like "data austerity",
  30. "data minimization", "data parsimony", or "data frugality". It means
  31. that you should always ask yourself why you are capturing or storing
  32. data, and look to handle only the minimum amount of data you need
  33. for your purpose.</p>
  34. <p>An example of this is tracking users on your web site to
  35. determine how many unique visitors you have. If the same person
  36. accesses several pages within a few hours, you want to count that as
  37. one visit. If they visit several times a month, you still only want
  38. to count them as a single visitor. One way to do this is to log
  39. IP addresses and count each IP address as a single person <a href="#footnote-nat">[2]</a>. But an IP address is very revealing, and could be
  40. used for much more than counting visitors. Datensparsamkeit suggests
  41. that you shouldn't store the IP address directly; perhaps instead
  42. you should hash it and only store the hash.</p>
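<p>A minimal sketch of the hashing idea in Python, under some assumptions of my own: the function names and the <code>SECRET_KEY</code> value are hypothetical, and I've swapped the plain hash for a keyed one (HMAC-SHA256), since the IPv4 address space is small enough that an unkeyed hash could be reversed by trying every address.</p>

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption for this sketch); in practice it
# would be generated randomly and stored separately from the logs.
SECRET_KEY = b"replace-with-a-random-secret"


def pseudonymize_ip(ip_address: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest of the IP address as a hex string."""
    return hmac.new(key, ip_address.encode("utf-8"), hashlib.sha256).hexdigest()


def count_unique_visitors(ip_addresses) -> int:
    """Count distinct visitors while only ever storing hashes, not raw addresses."""
    seen = {pseudonymize_ip(ip) for ip in ip_addresses}
    return len(seen)


# Two requests from the same address count as one visitor.
requests = ["203.0.113.7", "203.0.113.7", "198.51.100.23"]
print(count_unique_visitors(requests))  # 2
```

<p>The same address always yields the same digest, so counting still works, but the stored value no longer reveals the address to someone who doesn't hold the key.</p>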
  43. <p>A similar example involving IP addresses is using them to infer
  44. demographic information such as region and country. You can get most
  45. of this information and practice datensparsamkeit by just logging the first
  46. three octets of the IP address.</p>
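<p>As a rough illustration, here is one way the truncation might look in Python using the standard <code>ipaddress</code> module. The function name <code>truncate_ipv4</code> is hypothetical, and the sketch assumes IPv4 addresses only.</p>

```python
import ipaddress


def truncate_ipv4(ip_address: str) -> str:
    """Keep the first three octets of an IPv4 address and zero the last one."""
    # strict=False lets us pass a host address; the /24 mask clears the last octet.
    network = ipaddress.IPv4Network(f"{ip_address}/24", strict=False)
    return str(network.network_address)


# Log the truncated value instead of the full address.
print(truncate_ipv4("203.0.113.77"))  # 203.0.113.0
```

<p>Every address in the same /24 block maps to the same logged value, which is usually enough for region and country lookups.</p>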
  47. <p>Datensparsamkeit isn't just about bad people stealing data; it's
  48. also about your relationship with the primary company itself.
  49. The default attitude at the moment is that any data you generate is
  50. not just freely usable by the capturer but furthermore becomes their
  51. valuable commercial property. Privacy advocates,
  52. including me, think this assumption needs to be changed. Companies
  53. should only capture what they need and the burden of demonstrating
  54. need should fall on them. In addition, of course, they must be
  55. completely transparent about what they capture, what they store, and
  56. who they share their data with. Any breaches of data security must
  57. be immediately publicized (instead of covered up, which is the
  58. current default).</p>
  59. <p>Even if you don't share my views on personal control of our own
  60. data, the risks of security breaches mean that datensparsamkeit is a
  61. wise course of action. If you hold data that you don't need, and
  62. someone steals it and causes damage, shouldn't you be liable for
  63. that damage? Even if there's no legal liability the publicity will
  64. have serious consequences - and thus there is risk for anyone who
  65. doesn't practice datensparsamkeit.</p>
  66. <div class="acknowledgements">
  67. <h2>Acknowledgements</h2>
  68. <p><a href="http://erik.doernenburg.com/">Erik Dörnenburg</a>
  69. introduced me to Datensparsamkeit. The meme "… all the things"
  70. seems to have been around forever (at least a decade) so I'm glad
  71. Korny Sietsma taught me that <a href="http://hyperboleandahalf.blogspot.com/2010/06/this-is-why-ill-never-be-adult.html">it started in 2010</a>.
  72. </p></div>
  73. <h2>Notes</h2>
  74. <div class = 'footnote-list-item' id = 'footnote-pronunciation'>
  75. <p><span class = 'num'>1: </span>
  76. Here's some <a href = 'http://www.forvo.com/word/datensparsamkeit/'>help on pronunciation</a></p>
  77. </div>
  78. <div class = 'footnote-list-item' id = 'footnote-nat'>
  79. <p><span class = 'num'>2: </span>
  80. I realize that with Network Address Translation, things are
  81. rather more involved than this, but I wanted a simple example.
  82. </p>
  83. </div>