A place to cache linked articles (think custom and personal wayback machine)

5 years ago

  1. <!doctype html><!-- This is a valid HTML5 document. -->
  2. <!-- Screen readers, SEO, extensions and so on. -->
  3. <html lang=fr>
  4. <!-- Has to be within the first 1024 bytes, hence before the <title>
  5. See: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset -->
  6. <meta charset=utf-8>
  7. <!-- Why no `X-UA-Compatible` meta: https://stackoverflow.com/a/6771584 -->
  8. <!-- The viewport meta is quite crowded and we are responsible for that.
  9. See: https://codepen.io/tigt/post/meta-viewport-for-2015 -->
  10. <meta name=viewport content="width=device-width,minimum-scale=1,initial-scale=1,shrink-to-fit=no">
  11. <!-- Required to make a valid HTML5 document. -->
  12. <title>Forget privacy: you're terrible at targeting anyway (archive) — David Larlet</title>
  13. <!-- Generated from https://realfavicongenerator.net/ such a mess. -->
  14. <link rel="apple-touch-icon" sizes="180x180" href="/static/david/icons/apple-touch-icon.png">
  15. <link rel="icon" type="image/png" sizes="32x32" href="/static/david/icons/favicon-32x32.png">
  16. <link rel="icon" type="image/png" sizes="16x16" href="/static/david/icons/favicon-16x16.png">
  17. <link rel="manifest" href="/manifest.json">
  18. <link rel="mask-icon" href="/static/david/icons/safari-pinned-tab.svg" color="#5bbad5">
  19. <link rel="shortcut icon" href="/static/david/icons/favicon.ico">
  20. <meta name="apple-mobile-web-app-title" content="David Larlet">
  21. <meta name="application-name" content="David Larlet">
  22. <meta name="msapplication-TileColor" content="#da532c">
  23. <meta name="msapplication-config" content="/static/david/icons/browserconfig.xml">
  24. <meta name="theme-color" content="#f0f0ea">
  25. <!-- That good ol' feed, subscribe :p. -->
  26. <link rel=alternate type="application/atom+xml" title=Feed href="/david/log/">
  27. <meta name="robots" content="noindex, nofollow">
  28. <meta content="origin-when-cross-origin" name="referrer">
  29. <!-- Canonical URL for SEO purposes -->
  30. <link rel="canonical" href="https://apenwarr.ca/log/20190201">
  31. <style>
  32. /* http://meyerweb.com/eric/tools/css/reset/ */
  33. html, body, div, span,
  34. h1, h2, h3, h4, h5, h6, p, blockquote, pre,
  35. a, abbr, address, big, cite, code,
  36. del, dfn, em, img, ins,
  37. small, strike, strong, tt, var,
  38. dl, dt, dd, ol, ul, li,
  39. fieldset, form, label, legend,
  40. table, caption, tbody, tfoot, thead, tr, th, td,
  41. article, aside, canvas, details, embed,
  42. figure, figcaption, footer, header, hgroup,
  43. menu, nav, output, ruby, section, summary,
  44. time, mark, audio, video {
  45. margin: 0;
  46. padding: 0;
  47. border: 0;
  48. font-size: 100%;
  49. font: inherit;
  50. vertical-align: baseline;
  51. }
  52. /* HTML5 display-role reset for older browsers */
  53. article, aside, details, figcaption, figure,
  54. footer, header, hgroup, menu, nav, section { display: block; }
  55. body { line-height: 1; }
  56. blockquote, q { quotes: none; }
  57. blockquote:before, blockquote:after,
  58. q:before, q:after {
  59. content: '';
  60. content: none;
  61. }
  62. table {
  63. border-collapse: collapse;
  64. border-spacing: 0;
  65. }
  66. /* http://practicaltypography.com/equity.html */
  67. /* https://calendar.perfplanet.com/2016/no-font-face-bulletproof-syntax/ */
  68. /* https://www.filamentgroup.com/lab/js-web-fonts.html */
  69. @font-face {
  70. font-family: 'EquityTextB';
  71. src: url('/static/david/css/fonts/Equity-Text-B-Regular-webfont.woff2') format('woff2'),
  72. url('/static/david/css/fonts/Equity-Text-B-Regular-webfont.woff') format('woff');
  73. font-weight: 300;
  74. font-style: normal;
  75. font-display: swap;
  76. }
  77. @font-face {
  78. font-family: 'EquityTextB';
  79. src: url('/static/david/css/fonts/Equity-Text-B-Italic-webfont.woff2') format('woff2'),
  80. url('/static/david/css/fonts/Equity-Text-B-Italic-webfont.woff') format('woff');
  81. font-weight: 300;
  82. font-style: italic;
  83. font-display: swap;
  84. }
  85. @font-face {
  86. font-family: 'EquityTextB';
  87. src: url('/static/david/css/fonts/Equity-Text-B-Bold-webfont.woff2') format('woff2'),
  88. url('/static/david/css/fonts/Equity-Text-B-Bold-webfont.woff') format('woff');
  89. font-weight: 700;
  90. font-style: normal;
  91. font-display: swap;
  92. }
  93. @font-face {
  94. font-family: 'ConcourseT3';
  95. src: url('/static/david/css/fonts/concourse_t3_regular-webfont-20190806.woff2') format('woff2'),
  96. url('/static/david/css/fonts/concourse_t3_regular-webfont-20190806.woff') format('woff');
  97. font-weight: 300;
  98. font-style: normal;
  99. font-display: swap;
  100. }
  101. /* http://practice.typekit.com/lesson/caring-about-opentype-features/ */
  102. body {
  103. /* http://www.cssfontstack.com/ Palatino 99% Win 86% Mac */
  104. font-family: "EquityTextB", Palatino, serif;
  105. background-color: #f0f0ea;
  106. color: #07486c;
  107. font-kerning: normal;
  108. -moz-osx-font-smoothing: grayscale;
  109. -webkit-font-smoothing: subpixel-antialiased;
  110. text-rendering: optimizeLegibility;
  111. font-variant-ligatures: common-ligatures contextual;
  112. font-feature-settings: "kern", "liga", "clig", "calt";
  113. }
  114. pre, code, kbd, samp, var, tt {
  115. font-family: 'TriplicateT4c', monospace;
  116. }
  117. em {
  118. font-style: italic;
  119. color: #323a45;
  120. }
  121. strong {
  122. font-weight: bold;
  123. color: black;
  124. }
  125. nav {
  126. background-color: #323a45;
  127. color: #f0f0ea;
  128. display: flex;
  129. justify-content: space-around;
  130. padding: 1rem .5rem;
  131. }
  132. nav:last-child {
  133. border-bottom: 1vh solid #2d7474;
  134. }
  135. nav a {
  136. color: #f0f0ea;
  137. }
  138. nav abbr {
  139. border-bottom: 1px dotted white;
  140. }
  141. h1 {
  142. border-top: 1vh solid #2d7474;
  143. border-bottom: .2vh dotted #2d7474;
  144. background-color: #e3e1e1;
  145. color: #323a45;
  146. text-align: center;
  147. padding: 5rem 0 4rem 0;
  148. width: 100%;
  149. font-family: 'ConcourseT3';
  150. display: flex;
  151. flex-direction: column;
  152. }
  153. h1.single {
  154. padding-bottom: 10rem;
  155. }
  156. h1 span {
  157. position: absolute;
  158. top: 1vh;
  159. left: 20%;
  160. line-height: 0;
  161. }
  162. h1 span a {
  163. line-height: 1.7;
  164. padding: 1rem 1.2rem .6rem 1.2rem;
  165. border-radius: 0 0 6% 6%;
  166. background: #2d7474;
  167. font-size: 1.3rem;
  168. color: white;
  169. text-decoration: none;
  170. }
  171. h2 {
  172. margin: 4rem 0 1rem;
  173. border-top: .2vh solid #2d7474;
  174. padding-top: 1vh;
  175. }
  176. h3 {
  177. text-align: center;
  178. margin: 3rem 0 .75em;
  179. }
  180. hr {
  181. height: .4rem;
  182. width: .4rem;
  183. border-radius: .4rem;
  184. background: #07486c;
  185. margin: 2.5rem auto;
  186. }
  187. time {
  188. display: block;
  189. margin-left: 0 !important;
  190. }
  191. ul, ol {
  192. margin: 2rem;
  193. }
  194. ul {
  195. list-style-type: square;
  196. }
  197. a {
  198. text-decoration-skip-ink: auto;
  199. text-decoration-thickness: 0.05em;
  200. text-underline-offset: 0.09em;
  201. }
  202. article {
  203. max-width: 50rem;
  204. display: flex;
  205. flex-direction: column;
  206. margin: 2rem auto;
  207. }
  208. article.single {
  209. border-top: .2vh dotted #2d7474;
  210. margin: -6rem auto 1rem auto;
  211. background: #f0f0ea;
  212. padding: 2rem;
  213. }
  214. article p:last-child {
  215. margin-bottom: 1rem;
  216. }
  217. p {
  218. padding: 0 .5rem;
  219. margin-left: 3rem;
  220. }
  221. p + p,
  222. figure + p {
  223. margin-top: 2rem;
  224. }
  225. blockquote {
  226. background-color: #e3e1e1;
  227. border-left: .5vw solid #2d7474;
  228. display: flex;
  229. flex-direction: column;
  230. align-items: center;
  231. padding: 1rem;
  232. margin: 1.5rem;
  233. }
  234. blockquote cite {
  235. font-style: italic;
  236. }
  237. blockquote p {
  238. margin-left: 0;
  239. }
  240. figure {
  241. border-top: .2vh solid #2d7474;
  242. background-color: #e3e1e1;
  243. text-align: center;
  244. padding: 1.5rem 0;
  245. margin: 1rem 0 0;
  246. font-size: 1.5rem;
  247. width: 100%;
  248. }
  249. figure img {
  250. max-width: 250px;
  251. max-height: 250px;
  252. border: .5vw solid #323a45;
  253. padding: 1px;
  254. }
  255. figcaption {
  256. padding: 1rem;
  257. line-height: 1.4;
  258. }
  259. aside {
  260. display: flex;
  261. flex-direction: column;
  262. background-color: #e3e1e1;
  263. padding: 1rem 0;
  264. border-bottom: .2vh solid #07486c;
  265. }
  266. aside p {
  267. max-width: 50rem;
  268. margin: 0 auto;
  269. }
  270. /* https://fvsch.com/code/css-locks/ */
  271. p, li, pre, code, kbd, samp, var, tt, time, details, figcaption {
  272. font-size: 1rem;
  273. line-height: calc( 1.5em + 0.2 * 1rem );
  274. }
  275. h1 {
  276. font-size: 1.9rem;
  277. line-height: calc( 1.2em + 0.2 * 1rem );
  278. }
  279. h2 {
  280. font-size: 1.5rem;
  281. line-height: calc( 1.3em + 0.2 * 1rem );
  282. }
  283. h3 {
  284. font-size: 1.35rem;
  285. line-height: calc( 1.4em + 0.2 * 1rem );
  286. }
  287. @media (min-width: 20em) {
  288. /* The (100vw - 20rem) / (50 - 20) part
  289. resolves to 0-1rem, depending on the
  290. viewport width (between 20em and 50em). */
  291. p, li, pre, code, kbd, samp, var, tt, time, details, figcaption {
  292. font-size: calc( 1rem + .6 * (100vw - 20rem) / (50 - 20) );
  293. line-height: calc( 1.5em + 0.2 * (100vw - 50rem) / (20 - 50) );
  294. margin-left: 0;
  295. }
  296. h1 {
  297. font-size: calc( 1.9rem + 1.5 * (100vw - 20rem) / (50 - 20) );
  298. line-height: calc( 1.2em + 0.2 * (100vw - 50rem) / (20 - 50) );
  299. }
  300. h2 {
  301. font-size: calc( 1.5rem + 1.5 * (100vw - 20rem) / (50 - 20) );
  302. line-height: calc( 1.3em + 0.2 * (100vw - 50rem) / (20 - 50) );
  303. }
  304. h3 {
  305. font-size: calc( 1.35rem + 1.5 * (100vw - 20rem) / (50 - 20) );
  306. line-height: calc( 1.4em + 0.2 * (100vw - 50rem) / (20 - 50) );
  307. }
  308. }
  309. @media (min-width: 50em) {
  310. /* The right part of the addition *must* be a
  311. rem value. In this example we *could* change
  312. the whole declaration to font-size:2.5rem,
  313. but if our baseline value was not expressed
  314. in rem we would have to use calc. */
  315. p, li, pre, code, kbd, samp, var, tt, time, details, figcaption {
  316. font-size: calc( 1rem + .6 * 1rem );
  317. line-height: 1.5em;
  318. }
  319. p, li, pre, details {
  320. margin-left: 3rem;
  321. }
  322. h1 {
  323. font-size: calc( 1.9rem + 1.5 * 1rem );
  324. line-height: 1.2em;
  325. }
  326. h2 {
  327. font-size: calc( 1.5rem + 1.5 * 1rem );
  328. line-height: 1.3em;
  329. }
  330. h3 {
  331. font-size: calc( 1.35rem + 1.5 * 1rem );
  332. line-height: 1.4em;
  333. }
  334. figure img {
  335. max-width: 500px;
  336. max-height: 500px;
  337. }
  338. }
  339. figure.unsquared {
  340. margin-bottom: 1.5rem;
  341. }
  342. figure.unsquared img {
  343. height: inherit;
  344. }
  345. @media print {
  346. body { font-size: 100%; }
  347. a:after { content: " (" attr(href) ")"; }
  348. a, a:link, a:visited, a:after {
  349. text-decoration: underline;
  350. text-shadow: none !important;
  351. background-image: none !important;
  352. background: white;
  353. color: black;
  354. }
  355. abbr[title] { border-bottom: 0; }
  356. abbr[title]:after { content: " (" attr(title) ")"; }
  357. img { page-break-inside: avoid; }
  358. @page { margin: 2cm .5cm; }
  359. h1, h2, h3 { page-break-after: avoid; }
  360. p { orphans: 3; widows: 3; }
  361. img {
  362. max-width: 250px !important;
  363. max-height: 250px !important;
  364. }
  365. nav, aside { display: none; }
  366. }
  367. ul.with_columns {
  368. column-count: 1;
  369. }
  370. @media (min-width: 20em) {
  371. ul.with_columns {
  372. column-count: 2;
  373. }
  374. }
  375. @media (min-width: 50em) {
  376. ul.with_columns {
  377. column-count: 3;
  378. }
  379. }
  380. ul.with_two_columns {
  381. column-count: 1;
  382. }
  383. @media (min-width: 20em) {
  384. ul.with_two_columns {
  385. column-count: 1;
  386. }
  387. }
  388. @media (min-width: 50em) {
  389. ul.with_two_columns {
  390. column-count: 2;
  391. }
  392. }
  393. .gallery {
  394. display: flex;
  395. flex-wrap: wrap;
  396. justify-content: space-around;
  397. }
  398. .gallery figure img {
  399. margin-left: 1rem;
  400. margin-right: 1rem;
  401. }
  402. .gallery figure figcaption {
  403. font-family: 'ConcourseT3';
  404. }
  405. footer {
  406. font-family: 'ConcourseT3';
  407. display: flex;
  408. flex-direction: column;
  409. border-top: 3px solid white;
  410. padding: 4rem 0;
  411. background-color: #07486c;
  412. color: white;
  413. }
  414. footer > * {
  415. max-width: 50rem;
  416. margin: 0 auto;
  417. }
  418. footer a {
  419. color: #f1c40f;
  420. }
  421. footer .avatar {
  422. width: 200px;
  423. height: 200px;
  424. border-radius: 50%;
  425. float: left;
  426. -webkit-shape-outside: circle();
  427. shape-outside: circle();
  428. margin-right: 2rem;
  429. padding: 2px 5px 5px 2px;
  430. background: white;
  431. border-left: 1px solid #f1c40f;
  432. border-top: 1px solid #f1c40f;
  433. border-right: 5px solid #f1c40f;
  434. border-bottom: 5px solid #f1c40f;
  435. }
  436. </style>
  437. <h1>
  438. <span><a id="jumper" href="#jumpto" title="Un peu perdu ?">?</a></span>
  439. Forget privacy: you're terrible at targeting anyway (archive)
  440. <time>Pour la pérennité des contenus liés. Non-indexé, retrait sur simple email.</time>
  441. </h1>
  442. <section>
  443. <article>
  444. <h3><a href="https://apenwarr.ca/log/20190201">Source originale du contenu</a></h3>
  445. <p><b>Forget privacy: you're terrible at targeting anyway</b></p>
  446. <p>I don't mind letting your programs see my private data as long as I get
  447. something useful in exchange. But that's not what happens.</p>
  448. <p>A former co-worker told me once: "Everyone loves collecting data,
  449. but nobody loves analyzing it later." This claim is almost shocking, but
  450. people who have been involved in data collection and analysis have all seen
  451. it. It starts with a brilliant idea: we'll collect information about
  452. every click someone makes on every page in our app! And we'll track how
  453. long they hesitate over a particular choice! And how often they use the
  454. back button! How many seconds they watch our intro video before they abort!
  455. How many times they reshare our social media post!</p>
  456. <p>And then they do track all that. Tracking it all is easy. Add some log
  457. events, dump them into a database, off we go.</p>
  458. <p>But then what? Well, after that, we have to analyze it. And as someone who
  459. has <a href="/log/20160328">analyzed</a> a <a href="/log/20171213">lot</a> of <a href="/log/20180918">data</a>
  460. about various things, let me tell you: being a data analyst is difficult
  461. and mostly unrewarding (except financially).</p>
  462. <p>See, the problem is there's almost no way to know if you're right. (It's
  463. also not clear what the definition of "right" is, which I'll get to in a bit.)
  464. There are almost never any easy conclusions, just hard ones, and the hard
  465. ones are error prone. What analysts don't talk about is how many incorrect
  466. charts (and therefore conclusions) get made on the way to making correct
  467. ones. Or ones we think are correct. A good chart is so incredibly
  468. persuasive that it almost doesn't even matter if it's right, as long as what
  469. you want is to persuade someone... which is probably why newspapers,
  470. magazines, and lobbyists publish so many misleading charts.</p>
  471. <p>But let's leave errors aside for the moment. Let's assume, very
  472. unrealistically, that we as a profession are good at analyzing things. What
  473. then?</p>
  474. <p>Well, then, let's get rich on targeted ads and personalized recommendation
  475. algorithms. It's what everyone else does!</p>
  476. <p>Or do they?</p>
  477. <p>The state of personalized recommendations is surprisingly terrible. At this
  478. point, the top recommendation is always a clickbait rage-creating
  479. article about movie stars or whatever Trump did or didn't do in the last 6
  480. hours. Or if not an article, then a video or documentary. That's not what I
  481. want to read or to watch, but I sometimes get sucked in anyway, and then
  482. it's recommendation apocalypse time, because the algorithm now thinks I
  483. <em>like</em> reading about Trump, and now <em>everything</em> is Trump. Never give
  484. positive feedback to an AI.</p>
  485. <p>This is, by the way, the dirty secret of the machine learning movement:
  486. almost everything produced by ML could have been produced, more cheaply,
  487. using a very dumb heuristic you coded up by hand, because mostly the ML is
  488. trained by feeding it examples of what humans did while following a very
  489. dumb heuristic. There's no magic here. If you use ML to teach a computer
  490. how to sort through resumes, it will recommend you interview people with
  491. male, white-sounding names, because it turns out that's <a href="https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazonscraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G">what your HR
  492. department already
  493. does</a>.
  494. If you ask it what video a person like you wants to see next, it will
  495. recommend some political propaganda crap, because 50% of the time 90% of the
  496. people <em>do</em> watch that next, because they can't help themselves, and that's
  497. a pretty good success rate.</p>
  498. <p>(Side note: there really are some excellent uses of ML out there, for things
  499. traditional algorithms are bad at, like image processing or winning at
  500. strategy games. That's wonderful, but chances are good that <em>your</em> pet ML
  501. application is an expensive replacement for a dumb heuristic.)</p>
  502. <p>Someone who works on web search once told me that they already have an
  503. algorithm that guarantees the maximum click-through rate for any web search:
  504. just return a page full of porn links. (Someone else said you can reverse
  505. this to make a porn detector: any link which has a high click-through
  506. rate, regardless of which query it's answering, is probably porn.)</p>
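<p>The reversed heuristic can be sketched in a few lines. This is only an illustration of the idea as described: the log format, the function name <code>query_independent_links</code>, and both thresholds are assumptions made up for the example, not anything a real search engine is known to use.</p>

```python
from collections import defaultdict

def query_independent_links(click_log, min_queries=3, min_ctr=0.5):
    """Return links whose click-through rate is high on every query they appear for.

    click_log: iterable of (query, link, impressions, clicks) tuples.
    min_queries and min_ctr are illustrative assumptions, not tuned values.
    """
    per_link = defaultdict(dict)
    for query, link, impressions, clicks in click_log:
        shown, clicked = per_link[link].get(query, (0, 0))
        per_link[link][query] = (shown + impressions, clicked + clicks)

    flagged = []
    for link, by_query in per_link.items():
        # Count the distinct queries on which this link clears the CTR bar.
        hot = sum(
            1 for shown, clicked in by_query.values()
            if shown > 0 and clicked / shown >= min_ctr
        )
        # "Regardless of query": enough distinct queries, and no query
        # on which the link performs poorly.
        if hot >= min_queries and hot == len(by_query):
            flagged.append(link)
    return sorted(flagged)
```

<p>A link that is clicked heavily for "weather", "tax forms" and "python docs" alike gets flagged, while a link that only does well on the one query it actually answers does not.</p>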
  507. <p>Now, the thing is, legitimate-seeming businesses can't just give you porn
  508. links all the time, because that's Not Safe For Work, so the job of most
  509. modern recommendation algorithms is to return the closest thing to porn that
  510. is still Safe For Work. In other words, celebrities (ideally attractive
  511. ones, or at least controversial ones), or politics, or both. They walk that
  512. line as closely as they can, because that's the local maximum for their
  513. profitability. Sometimes they accidentally cross that line, and then have
  514. to apologize or pay a token fine, and then go back to what they were doing.</p>
  515. <p>This makes me sad, but okay, it's just math. And maybe human nature. And
  516. maybe capitalism. Whatever. I might not like it, but I understand it.</p>
  517. <p>My complaint is that none of the above had <em>anything</em> to do with hoarding
  518. my personal information.</p>
  519. <p><b>The hottest recommendations have nothing to do with me</b></p>
  520. <p>Let's be clear: the best targeted ads I will ever see are the ones I get from
  521. a search engine when it serves an ad for exactly the thing I was searching
  522. for. Everybody wins: I find what I wanted, the vendor helps me buy their
  523. thing, and the search engine gets paid for connecting us. I don't know
  524. anybody who complains about this sort of ad. It's a good ad.</p>
  525. <p>And it, too, had nothing to do with my personal information!</p>
  526. <p>Google was serving targeted search ads decades ago, before it ever occurred
  527. to them to ask me to log in. Even today you can still use every search
  528. engine web site without logging in. They all still serve ads targeted to
  529. your search keyword. It's an excellent business.</p>
  530. <p>There's another kind of ad that works well on me. I play video games
  531. sometimes, and I use Steam, and sometimes I browse through games on Steam
  532. and star the ones I'm considering buying. Later, when those games go on
  533. sale, Steam emails me to tell me they are on sale, and sometimes then I buy
  534. them. Again, everybody wins: I got a game I wanted (at a discount!), the
  535. game maker gets paid, and Steam gets paid for connecting us. And I can
  536. disable the emails if I want, but I don't want, because they are good ads.</p>
  537. <p>But nobody had to profile me to make that happen! Steam has my account, and
  538. I <em>told</em> it what games I wanted and then it sold me <em>those</em> games. That's
  539. not profiling, that's just remembering a list that I explicitly
  540. handed to you.</p>
  541. <p>Amazon shows a box that suggests I might want to re-buy certain kinds of
  542. consumable products that I've bought in the past. This is useful too, and
  543. requires no profiling other than remembering the transactions we've had with
  544. each other in the past, which they kinda have to do anyway. And again,
  545. everybody wins.</p>
  546. <p>Now, Amazon also recommends products <em>like</em> the ones I've bought before, or
  547. looked at before. That's, say, 20% useful. If I just bought a computer
  548. monitor, and you know I did because I bought it from you, then you might as
  549. well stop selling them to me. But for a few days after I buy any
  550. electronics they also keep offering to sell me USB cables, and
  551. they're probably right. So okay, 20% useful targeting is better than 0%
  552. useful. I give Amazon some credit for building a useful profile of me,
  553. although it's specifically a profile of stuff I did on <em>their</em> site and
  554. which they keep to themselves. That doesn't seem too invasive. Nobody is
  555. surprised that Amazon remembers what I bought or browsed on their
  556. site.</p>
  557. <p>Worse is when (non-Amazon) vendors get the idea that I might want something.
  558. (They get this idea because I visited their web site and looked at it.)
  559. So their advertising partner chases me around the web trying to sell me the
  560. same thing. They do that, even if I <em>already</em> bought it. Ironically, this
  561. is because of a half-hearted attempt to <em>protect</em> my privacy. The vendor
  562. doesn't give information about me or my transactions to their advertising
  563. partner (because there's an excellent chance it would land them in legal
  564. trouble eventually), so the advertising partner doesn't know that I bought
  565. it. All they know (because of the advertising partner's tracker gadget on
  566. the vendor's web site) is that I looked at it, so they keep advertising it
  567. to me just in case.</p>
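<p>A small sketch of that information gap, with hypothetical function names and product IDs: the ad partner's tracker only ever sees page views, so the list it can build is the first one, while the list I would prefer needs purchase records that stay with the vendor.</p>

```python
def retargeting_list(tracker_views):
    """What the ad partner can compute: every product the visitor looked at."""
    return sorted(set(tracker_views))

def ads_the_visitor_would_prefer(tracker_views, vendor_purchases):
    """What it could compute only if the vendor shared its purchase records."""
    return sorted(set(tracker_views) - set(vendor_purchases))
```

<p>With views of a monitor and a keyboard and a purchase of the monitor, the first function still returns both products, which is exactly the ad that follows you around after you have already bought the thing.</p>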
  568. <p>But okay, now we're starting to get somewhere interesting. The advertiser
  569. has a tracker that it places on multiple sites and tracks me around. So it
  570. doesn't know what I bought, but it does know what I looked at, probably over
  571. a long period of time, across many sites.</p>
  572. <p>Using this information, its painstakingly trained AI makes conclusions about
  573. which other things I might want to look at, based on...</p>
  574. <p>...well, based on what? People similar to me? Things my Facebook friends
  575. like to look at? Some complicated matrix-driven formula humans can't
  576. possibly comprehend, but which is 10% better?</p>
  577. <p>Probably not. Probably what it does is infer my gender, age, income level,
  578. and marital status. After that, it sells me cars and gadgets if I'm a guy,
  579. and fashion if I'm a woman. Not because all guys like cars and gadgets, but
  580. because some very uncreative human got into the loop and said "please sell
  581. my car mostly to men" and "please sell my fashion items mostly to women."
  582. Maybe the AI infers the wrong demographic information (I know Google has
  583. mine wrong) but it doesn't really matter, because it's usually mostly right,
  584. which is better than 0% right, and advertisers get some mostly
  585. demographically targeted ads, which is better than 0% targeted ads.</p>
  586. <p>You <em>know</em> this is how it works, right? It has to be. You can infer it
  587. from how bad the ads are. Anyone can, in a few seconds, think of some stuff
  588. they really want to buy which The Algorithm has failed to offer them, all
  589. while Outbrain makes zillions of dollars sending links about car insurance
  590. to non-car-owning Manhattanites. It might as well be a 1990s late-night TV
  591. infomercial, where all they knew for sure about my demographic profile is
  592. that I was still awake.</p>
  593. <p>You tracked me everywhere I go, logging it forever, begging for someone to
  594. steal your database, desperately fearing that some new EU privacy regulation
  595. might destroy your business... for <em>this</em>?</p>
  596. <p><b>Statistical Astrology</b></p>
  597. <p>Of course, it's not really as simple as that. There is not just one
  598. advertising company tracking me across every web site I visit. There are...
  599. many advertising companies tracking me across every web site I visit. Some
  600. of them don't even do advertising, they just do tracking, and they sell that
  601. tracking data to advertisers who supposedly use it to do better targeting.</p>
  602. <p>This whole ecosystem is amazing. Let's look at online news web sites. Why
  603. do they load so slowly nowadays? Trackers. No, not ads - trackers. They
  604. only have a few ads, which mostly don't take that long to load. But they
  605. have a <em>lot</em> of trackers, because each tracker will pay them a tiny bit of
  606. money to be allowed to track each page view. If you're a giant publisher
  607. teetering on the edge of bankruptcy and you have 25 trackers on your web site
  608. already, but tracker company #26 calls you and says they'll pay you $50k a
  609. year if you add their tracker too, are you going to say no? Your page runs
  610. like sludge already, so making it 1/25th more sludgy won't change anything,
  611. but that $50k might.</p>
  612. <p>("Ad blockers" remove annoying ads, but they also speed up the web, mostly
  613. because they remove trackers. Embarrassingly, the trackers themselves don't
  614. even need to cause a slowdown, but they always do, because their developers
  615. are invariably idiots who each need to load thousands of lines of JavaScript
  616. to do what could be done in two. But that's another story.)</p>
  617. <p>Then the ad sellers, and ad networks, buy the tracking data from all the
  618. trackers. The more tracking data they have, the better they can target ads,
  619. right? I guess.</p>
  620. <p>The brilliant bit here is that each of the trackers has a bit of data about
  621. you, but not all of it, because not every tracker is on every web site. But
  622. on the other hand, cross-referencing individuals between trackers is kinda
  623. hard, because none of them wants to give away their secret sauce. So each
  624. ad seller tries their best to cross-reference the data from all the tracker
  625. data they buy, but it mostly doesn't work. Let's say there are 25 trackers
  626. each tracking a million users, probably with a ton of overlap. In a sane
  627. world we'd guess that there are, at most, a few million distinct users. But
  628. in an insane world where you can't <em>prove</em> if there's an overlap, it could be
  629. as many as 25 million distinct users! The more tracker data your ad network
  630. buys, the more information you have! Probably! And that means better
  631. targeting! Maybe! And so you should buy ads from our network instead of
  632. the other network with less data! I guess!</p>
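<p>The arithmetic behind that claim, as a minimal sketch. The function names are invented for the example. The point is that without a way to match users across trackers, all you can state are the bounds, and the upper bound is the one that ends up in the sales pitch.</p>

```python
def distinct_user_bounds(tracker_sizes):
    """Bounds on distinct users when overlap between trackers is unknown."""
    lower = max(tracker_sizes)  # every smaller audience sits inside the largest one
    upper = sum(tracker_sizes)  # no two trackers share a single user
    return lower, upper

def distinct_users(tracker_audiences):
    """Exact count, possible only when user IDs can be matched across trackers."""
    return len(set().union(*tracker_audiences))
```

<p>For 25 trackers with a million users each, <code>distinct_user_bounds([1_000_000] * 25)</code> gives a range from 1 million to 25 million. The exact count needs the cross-referencing that mostly doesn't work.</p>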
  633. <p>None of this works. They are still trying to sell me car insurance for my
  634. subway ride.</p>
  635. <p><b>It's not just ads</b></p>
  636. <p>That's a lot about profiling for ad targeting, which obviously doesn't work,
  637. if anyone would just stop and look at it. But there are way too many people
  638. incentivized to believe otherwise. Meanwhile, if you care about your
  639. privacy, all that matters is they're still collecting your personal
  640. information whether it works or not.</p>
  641. <p>What about content recommendation algorithms though? Do those work?</p>
  642. <p>Obviously not. I mean, have you tried them? Seriously.</p>
  643. <p>That's not quite fair. There are a few things that work. <a href="https://www.theserverside.com/feature/How-Pandora-built-a-better-recommendation-engine">Pandora's
  644. music
  645. recommendations</a>
  646. are surprisingly good, but they are doing it in a very non-obvious way. The
  647. obvious way is to take the playlist of all the songs your users listen to,
  648. blast it all into an ML training dataset, and then use that to produce a new
  649. playlist for new users based on... uh... their... profile? Well, they
  650. don't have a profile yet because they just joined. Perhaps based on the
  651. first few songs they select manually? Maybe, but they probably started
  652. with either a really popular song, which tells you nothing, or a really
  653. obscure song to test the thoroughness of your library, which tells you less
  654. than nothing.</p>
  655. <p>(I'm pretty sure this is how Mixcloud works. After each mix, it tries to
  656. find the "most similar" mix to continue with. Usually this is someone
  657. else's upload of the exact same mix. Then the "most similar" mix to that
  658. one is the first one, so it does that. Great job, machine learning, keep it
  659. up.)</p>
  660. <p>That leads us to the "random song followed by thumbs up/down" system that
  661. everyone uses. But everyone sucks, except Pandora. Why? Apparently
  662. because Pandora spent a lot of time hand-coding a bunch of music
  663. characteristics and writing a "real algorithm" (as opposed to ML) that tries
  664. to generate playlists based on the right combinations of those
  665. characteristics.</p>
  666. <p>In that sense, Pandora isn't pure ML. It often converges on a playlist
  667. you'll like within one or two thumbs up/down operations, because you're
  668. navigating through a multidimensional interconnected network of songs that
  669. people encoded the hard way, not a massive matrix of mediocre playlists
  670. scraped from average people who put no effort into generating those
  671. playlists in the first place. Pandora is bad at a lot of things (especially
  672. "availability in Canada") but their music recommendations are top notch.</p>
  673. <p>Just one catch. If Pandora can figure out a good playlist based
  674. on a starter song and one or two thumbs up/down clicks, then... I guess it's
  675. not profiling you. They didn't need your personal information either.</p>
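<p>To make that concrete, here is a toy version of "starter song plus thumbs up/down" over hand-coded characteristics. It is not Pandora's actual algorithm, which I only know by reputation. The song names, the three traits, their scores, and the update rule are all assumptions made up for the sketch. Notice what the functions take as input: a seed song and feedback, and nothing about who is listening.</p>

```python
import math

# Hypothetical hand-coded traits per song, each scored from 0.0 to 1.0.
SONGS = {
    "seed":    {"tempo": 0.8, "acoustic": 0.1, "vocals": 0.9},
    "close_a": {"tempo": 0.7, "acoustic": 0.2, "vocals": 0.8},
    "close_b": {"tempo": 0.9, "acoustic": 0.1, "vocals": 0.7},
    "far":     {"tempo": 0.1, "acoustic": 0.9, "vocals": 0.2},
}

def distance(a, b):
    """Euclidean distance between two trait dictionaries with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

def next_song(taste, played):
    """Pick the unplayed song whose traits are closest to the current taste point."""
    candidates = [name for name in SONGS if name not in played]
    return min(candidates, key=lambda name: distance(taste, SONGS[name]))

def apply_feedback(taste, song, liked, rate=0.5):
    """Move the taste point toward a liked song, away from a disliked one."""
    sign = 1.0 if liked else -1.0
    return {
        k: min(1.0, max(0.0, taste[k] + sign * rate * (SONGS[song][k] - taste[k])))
        for k in taste
    }
```

<p>Starting from the traits of the seed song, the first pick is the nearest neighbour, and a single thumbs down is enough to steer the next pick somewhere else.</p>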
  676. <p><b>Netflix</b></p>
  677. <p>While we're here, I just want to rant about Netflix, which is an odd case
  678. of starting off with a really good recommendation algorithm
  679. and then making it worse on purpose.</p>
  680. <p>Once upon a time, there was the <a href="https://journals.sagepub.com/doi/full/10.1177/1461444814538646">Netflix
  681. prize</a>,
  682. which granted $1 million to the best team that could predict people's movie
  683. ratings, based on their past ratings, with better accuracy than Netflix
  684. could themselves. (This not-so-shockingly resulted in a <a href="https://www.cs.cornell.edu/~shmat/netflix-faq.html">privacy
  685. fiasco</a> when it turned
  686. out you could de-anonymize the data set that they publicly released, oops.
  687. Well, that's what you get when you long-term store people's personal
  688. information in a database.)</p>
  689. <p>Netflix believed their business depended on a good
  690. recommendation algorithm. It was already pretty good: I remember using
  691. Netflix around 10 years ago and getting several recommendations for things I
  692. would never have discovered, but which I turned out to like.
  693. That hasn't happened to me on Netflix in a long, long time.</p>
  694. <p>As the story goes, once upon a time Netflix was a DVD-by-mail service.
  695. DVD-by-mail is really slow, so it was absolutely essential that at least one
  696. of this week's DVDs was good enough to entertain you for
  697. your Friday night movie. Too many Fridays with only bad movies, and
  698. you'd surely unsubscribe. A good recommendation system was key. (I guess
  699. there was also some interesting math around trying to make sure to rent out
  700. as much of the inventory as possible each week, since having a zillion
  701. copies of the most recent blockbuster, which would be popular this month and
  702. then die out next month, was not really viable.)</p>
  703. <p>Eventually though, Netflix moved online, and the cost of a bad
  704. recommendation was much less: just stop watching and switch to a new movie.
  705. Moreover, it was perfectly fine if everyone watched the same blockbuster.
  706. In fact, it was better, because they could cache it at your ISP and caches
  707. always work better if people are boring and average.</p>
  708. <p>Worse, as the story goes, Netflix noticed a pattern: the more hours people
  709. watch, the less likely they are to cancel. (This makes sense: the more
  710. hours you spend on Netflix, the more you feel like you "need" it.) And with
  711. new people trying the service at a fixed or proportional rate, higher retention
  712. translates directly to faster growth.</p>
  713. <p>Hearing this story was also how I learned the word
  714. "<a href="https://en.wikipedia.org/wiki/Satisficing">satisficing</a>," which
  715. essentially means searching through sludge not for the best option, but for
  716. a good enough option. Nowadays Netflix isn't about finding the best movie,
  717. it's about satisficing. If it has the choice between an award-winning movie
  718. that you 80% might like or 20% might hate, and a mainstream movie that's 0%
  719. special but you 99% won't hate, it will recommend the second one every time.
  720. Outliers are bad for business.</p>
  721. <p>The thing is, you don't need a risky, privacy-invading profile to recommend
  722. a mainstream movie. Mainstream movies are specially designed to be
  723. inoffensive to just about everyone. My Netflix
  724. recommendations screen is no longer "Recommended for you," it's "New
  725. Releases," and then "Trending Now," and "Watch it again."</p>
  726. <p>As promised, Netflix paid out their $1 million prize to buy the winning
  727. recommendation algorithm, which was even better than their old one. But
  728. <a href="https://medium.com/netflix-techblog/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429">they didn't use it</a>, they threw it away.</p>
  729. <p>Some very expensive A/B testers determined that this is what makes me watch
  730. the most hours of mindless TV. Their revenues keep going up. And they
  731. don't even need to invade my privacy to do it.</p>
  732. <p>Who am I to say they're wrong?</p>
  733. </article>
  734. </section>
  735. <nav id="jumpto">
  736. <p>
  737. <a href="/david/blog/">Blog home</a> |
  738. <a href="https://apenwarr.ca/log/20190201">Original source</a> |
  739. <a href="/david/stream/2019/">Stream home</a>
  740. </p>
  741. </nav>
  742. <footer>
  743. <div>
  744. <img src="/static/david/david-larlet-avatar.jpg" alt="David Larlet" loading="lazy" class="avatar" width="200" height="200">
  745. <p>
  746. Bonjour/Hi!
  747. I am <a href="/david/" title="Public profile">David&nbsp;Larlet</a>, I currently live in Montréal and I have been feeding this space for 15 years. <br>
  748. If you enjoyed this read, feel free to keep exploring, for example through the <a href="/david/blog/" title="Kind experiments">bimonthly reflections</a>, the <a href="/david/stream/2019/" title="(Dis)jointed thoughts">weekly link roundup</a>, or by subscribing to the <a href="/david/log/" title="Subscribe to posts via RSS">RSS feed</a> (<a href="/david/blog/2019/flux-rss/" title="Wait, what is an RSS feed?">so 2005</a>).
  749. </p>
  750. <p>
  751. I am interested in the place I can have in this world: as a human, as a member of a family, and as a partner in a cooperative. From time to time, I also do <a href="https://github.com/davidbgk" title="Mostly on Github but elsewhere too">technical stuff</a>. And even more rarely, <a href="/david/talks/" title="These days I would rather leave the floor to others">I talk about it</a>.
  752. </p>
  753. <p>
  754. Here are a few selected articles:
  755. <a href="/david/blog/2019/faire-equipe/" title="Read the full article">Faire équipe</a>,
  756. <a href="/david/blog/2018/bivouac-automnal/" title="Read the full article">Bivouac automnal</a>,
  757. <a href="/david/blog/2018/commodite-effondrement/" title="Read the full article">Commodité et effondrement</a>,
  758. <a href="/david/blog/2017/donnees-communs/" title="Read the full article">Des données aux communs</a>,
  759. <a href="/david/blog/2016/accompagner-enfant/" title="Read the full article">Accompagner un enfant</a>,
  760. <a href="/david/blog/2016/senior-developer/" title="Read the full article">Senior developer</a>,
  761. <a href="/david/blog/2016/illusion-sociale/" title="Read the full article">L’illusion sociale</a>,
  762. <a href="/david/blog/2016/instantane-scopyleft/" title="Read the full article">Instantané Scopyleft</a>,
  763. <a href="/david/blog/2016/enseigner-web/" title="Read the full article">Enseigner le Web</a>,
  764. <a href="/david/blog/2016/simplicite-defaut/" title="Read the full article">Simplicité par défaut</a>,
  765. <a href="/david/blog/2016/minimalisme-esthetique/" title="Read the full article">Minimalisme et esthétique</a>,
  766. <a href="/david/blog/2014/un-web-omni-present/" title="Read the full article">Un web omni-présent</a>,
  767. <a href="/david/blog/2014/manifeste-developpeur/" title="Read the full article">Manifeste de développeur</a>,
  768. <a href="/david/blog/2013/confort-convivialite/" title="Read the full article">Confort et convivialité</a>,
  769. <a href="/david/blog/2013/testament-numerique/" title="Read the full article">Testament numérique</a>,
  770. and <a href="/david/blog/" title="Browse the archives">many more…</a>
  771. </p>
  772. <p>
  773. We can <a href="mailto:david%40larlet.fr" title="Send an email">talk by email</a>. If you would like us to work together, you should start with the <a href="http://larlet.com">profile dedicated to my professional activity</a> and/or contact <a href="http://scopyleft.fr/">scopyleft</a> directly, the <abbr title="Société coopérative et participative">SCOP</abbr> I have been part of for six years. Beforehand, I recommend reading <a href="/david/blog/2018/cout-site/" title="Warning: what follows may shock you">how much a website costs</a> and why I tend to favor <a href="/david/pro/devis/" title="Let's talk about it!">not asking for a quote</a>.
  774. </p>
  775. <p>
  776. I do not track your browsing, but my
  777. <abbr title="Alwaysdata, 62 rue Tiquetonne 75002 Paris, +33.184162340">hosting provider</abbr>
  778. keeps access logs.
  779. </p>
  780. </div>
  781. </footer>
  782. <script type="text/javascript">
  783. ;(_ => {
  784. const jumper = document.getElementById('jumper')
  785. jumper?.addEventListener('click', e => {
  786. e.preventDefault()
  787. const anchor = e.target.getAttribute('href')
  788. const targetEl = document.getElementById(anchor.substring(1))
  789. targetEl?.scrollIntoView({behavior: 'smooth'})
  790. })
  791. })()
  792. </script>