<!doctype html><!-- This is a valid HTML5 document. -->
<!-- Screen readers, SEO, extensions and so on. -->
<html lang="fr">
<!-- Has to be within the first 1024 bytes, hence before the `title` element
     See: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset -->
<meta charset="utf-8">
<!-- Why no `X-UA-Compatible` meta: https://stackoverflow.com/a/6771584 -->
<!-- The viewport meta is quite crowded and we are responsible for that.
     See: https://codepen.io/tigt/post/meta-viewport-for-2015 -->
<meta name="viewport" content="width=device-width,initial-scale=1">
<!-- Required to make a valid HTML5 document. -->
<title>GPT experiments: voice assistant (archive) — David Larlet</title>
<meta name="description" content="Publication cached in order to keep a record of it.">
<!-- That good ol' feed, subscribe :). -->
<link rel="alternate" type="application/atom+xml" title="Feed" href="/david/log/">
<!-- Generated from https://realfavicongenerator.net/ such a mess. -->
<link rel="apple-touch-icon" sizes="180x180" href="/static/david/icons2/apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="/static/david/icons2/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/static/david/icons2/favicon-16x16.png">
<link rel="manifest" href="/static/david/icons2/site.webmanifest">
<link rel="mask-icon" href="/static/david/icons2/safari-pinned-tab.svg" color="#07486c">
<link rel="shortcut icon" href="/static/david/icons2/favicon.ico">
<meta name="msapplication-TileColor" content="#f7f7f7">
<meta name="msapplication-config" content="/static/david/icons2/browserconfig.xml">
<meta name="theme-color" content="#f7f7f7" media="(prefers-color-scheme: light)">
<meta name="theme-color" content="#272727" media="(prefers-color-scheme: dark)">
<!-- Documented, feel free to shoot an email. -->
<link rel="stylesheet" href="/static/david/css/style_2021-01-20.css">
<!-- See https://www.zachleat.com/web/comprehensive-webfonts/ for the trade-off. -->
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin>
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin>
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin>
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin>
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin>
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin>
<script>
  function toggleTheme(themeName) {
    document.documentElement.classList.toggle(
      'forced-dark',
      themeName === 'dark'
    )
    document.documentElement.classList.toggle(
      'forced-light',
      themeName === 'light'
    )
  }
  // Guard against both a missing key (null) and the literal string 'undefined',
  // mirroring the check used by the theme form loader further down.
  const selectedTheme = localStorage.getItem('theme')
  if (selectedTheme && selectedTheme !== 'undefined') {
    toggleTheme(selectedTheme)
  }
</script>
<meta name="robots" content="noindex, nofollow">
<meta content="origin-when-cross-origin" name="referrer">
<!-- Canonical URL for SEO purposes -->
<link rel="canonical" href="http://dataholic.ca/2023/04/05/gpt-assistant-vocal/">
<body class="remarkdown h1-underline h2-underline h3-underline em-underscore hr-center ul-star pre-tick" data-instant-intensity="viewport-all">
<article>
<header>
<h1>GPT experiments: voice assistant</h1>
</header>
<nav>
<p class="center">
<a href="/david/" title="Go to homepage"><svg class="icon icon-home">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use>
</svg> Home</a> •
<a href="http://dataholic.ca/2023/04/05/gpt-assistant-vocal/" title="Link to the original content">Original source</a>
</p>
</nav>
<hr>
<p>My latest exploration with GPT: is it possible to interface a <a href="https://fr.wikipedia.org/wiki/Mod%C3%A8le_de_langage">language model (LLM)</a> with existing software tools, for example to send emails? And why, anyway?</p>
<p>Demonstrating <em>ad nauseam</em> that GPT’s general knowledge is not that good, or that it is easy to make it say anything and its opposite, keeps us from really understanding this kind of tool and therefore its potential impact. That GPT displays a certain “general culture” tinged with a tendency to confabulate is a side benefit.</p>
<p>The primary function of these models is interpreting “natural language”. That interpretation ability is what software tools have lacked for ages; a barrier which, once removed, would free us from the symbolic formalism currently required of users and embodied by constraining interfaces.</p>
<p>Except that, to really get past this barrier, LLMs must be able to bridge both sides: understand human language on one side, and on the other use “machine” language, following a certain formalism, to turn words into (computational) action.</p>
<p>GPT already demonstrates this ability: the Copilot variant, which generates code, is one example. The Bing integration that produces an assisted search engine is another. Still, I wanted to test for myself how this could work. My previous test on the highway safety code (posts <a href="/2023/02/19/apprendre-a-gpt3/">1</a> and <a href="/2023/03/11/addendum-gpt/">2</a>) aimed to probe GPT’s capacity to process and interpret volumes of information larger than its context window; here, I am evaluating the language model’s ability to play the role of a human-machine interpretation interface.</p>
<h2 id="commande-vocale-pour-courriel">Voice command for email</h2>
<p>My challenge: was it possible to issue a voice command instructing Gmail to send an email?</p>
<p>The Lego blocks used for the occasion:</p>
<ul>
<li>An interface letting me send voice messages, retrieve those messages in a script of my own making (via an <a href="https://fr.wikipedia.org/wiki/Interface_de_programmation">API</a>), and send written replies back to the user. I had started with Discord, but it did not work to my liking. When I gave my constraints to ChatGPT, it recommended <a href="https://telegram.org/">Telegram</a>, which indeed turned out to be a very good choice.</li>
<li>A speech-to-text tool, also callable from a script/API, in this case OpenAI’s <a href="https://platform.openai.com/docs/guides/speech-to-text">Whisper API</a> module.</li>
<li>And of course GPT and Gmail, both likewise offering APIs so they can be driven from a script.</li>
</ul>
<p>I had set myself an additional goal: a modular mechanism able to accept other commands flexibly, for example creating calendar events, managing tasks, and so on. So I put a recipe mechanism in place: a configuration file defines the set of steps and functions to call in order to carry out a particular task.</p>
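<p>The post does not reproduce the configuration file; as a hedged illustration only, a recipe entry in such a JSON file could look like the sketch below (every field name here is hypothetical, not the author’s actual schema):</p>

```json
{
  "recipes": [
    {
      "name": "send_mail",
      "description": "Compose and send an email through Gmail",
      "steps": [
        {"type": "gpt", "instruction": "Return the recipient name(s) as JSON: {\"recipients\": [\"...\"]}"},
        {"type": "function", "name": "gmail_lookup_addresses"},
        {"type": "gpt", "instruction": "Generate a subject and body as JSON: {\"subject\": \"...\", \"body\": \"...\"}"},
        {"type": "function", "name": "telegram_confirm_draft"},
        {"type": "function", "name": "gmail_send"}
      ]
    }
  ]
}
```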
<p>Net result: a success, with a few caveats. Below is a screenshot showing the exchange in Telegram’s web interface.</p>
<p>The sequence is triggered by a voice message that goes as follows (this is exactly the character string produced by Whisper): « Est-ce que tu peux écrire un courriel à Stéphane Guidoin pour lui dire que demain je ne rentrerai pas au travail, car il fait trop beau pour travailler. Je rentrerai après demain. Signé Robert. »</p>
<p><img src="/images/2023-04-05_echange_telegram.png" alt="Exchange via Telegram"></p>
<p class="photoattrib">Exchange with the Telegram bot</p>
<p>For the curious, a methodology section at the end goes into more detail (and covers a few limitations).
Everything starts with a configuration file containing the recipes. The file describes what each recipe can do and the steps needed to carry it out. I then created a Telegram <a href="https://core.telegram.org/bots/">bot</a>, controlled by my Python script.</p>
<p>When the user sends a voice message to the bot, my script receives the audio file and passes it to the Whisper API, which produces a text transcription. The transcription is sent to GPT together with a list of recipe names and descriptions and one instruction: return the name of the recipe matching the user’s request. To make the whole thing easy to consume from my Python script (and this is the key to the approach), I ask GPT to answer using the JSON descriptive format. It takes the form <code class="highlighter-rouge"><span class="p">{</span><span class="nt">"nom_recette"</span><span class="p">:</span><span class="w"> </span><span class="s2">"send_mail"</span><span class="p">}</span></code>.</p>
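<p>The script itself is not published; as a minimal, stdlib-only sketch of that recipe-selection step (the actual Whisper and GPT API calls are deliberately left out, and all names below are assumptions, not the author’s code):</p>

```python
import json

# Hypothetical recipe catalogue: name -> description, as in the config file.
RECIPES = {
    "send_mail": "Compose and send an email through Gmail",
}

def build_selection_prompt(transcript: str) -> str:
    """Assemble the instruction sent to GPT: the recipe catalogue,
    the user's transcribed request, and the required JSON answer shape."""
    catalogue = "\n".join(f"- {name}: {desc}" for name, desc in RECIPES.items())
    return (
        "Here are the available recipes:\n"
        f"{catalogue}\n"
        f'User request: "{transcript}"\n'
        'Answer ONLY with JSON of the form {"nom_recette": "<name>"}.'
    )

def parse_selection(gpt_answer: str) -> str:
    """Decode GPT's JSON answer and check that the recipe actually exists."""
    name = json.loads(gpt_answer)["nom_recette"]
    if name not in RECIPES:
        raise ValueError(f"unknown recipe: {name}")
    return name
```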
<p>Once the recipe is selected, a confirmation is sent to the user via Telegram, and the script then sticks to following the recipe’s steps, namely an alternation of requests to GPT and function calls to other services, Gmail in this case. The GPT requests are described entirely in the configuration file; the Gmail functions are named in the configuration file but obviously have to be coded. The email-sending recipe looks like this:</p>
<ol>
<li>The user’s request is sent to GPT with the instruction to return the name of the recipient(s), again returning the results as JSON;</li>
<li>The recipient names are sent to Gmail to retrieve the email addresses;</li>
<li>The user’s request is sent to GPT again, this time with the instruction to generate an email subject and body;</li>
<li>My script produces a draft email, which is sent to the user via Telegram for confirmation;</li>
<li>Upon the user’s approval, via a yes/no button, the email is sent.</li>
</ol>
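<p>The alternation of steps described above can be sketched as a small driver loop; everything here (step types, field names, the stand-in callables) is an assumption rather than the author’s actual code:</p>

```python
def run_recipe(steps, user_request, ask_gpt, functions, context=None):
    """Walk a recipe's steps, alternating GPT requests and service calls.

    ask_gpt(prompt) -> dict   : stand-in for the GPT API call
    functions[name](context)  : stand-in for a coded service function
    The accumulated `context` dict carries results between steps.
    """
    context = dict(context or {})
    context["request"] = user_request
    for step in steps:
        if step["type"] == "gpt":
            # GPT steps are fully described in the configuration file.
            prompt = step["instruction"] + "\n" + user_request
            context.update(ask_gpt(prompt))
        elif step["type"] == "function":
            # Function steps are only *named* in the config; the code lives here.
            context.update(functions[step["name"]](context))
        else:
            raise ValueError(f"unknown step type: {step['type']}")
    return context
```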
<h2 id="est-ce-que-ça-marche">Does it work?</h2>
<p>It works surprisingly well, considering that my code would surely make a real developer scream. Overall, GPT interprets requests reliably. When given an answer template (here, a JSON structure with blanks to fill), it always understands what to do. Over dozens of tries, it always proceeded correctly. As explained in the methodology, I only had to rein in GPT’s verbal excesses.</p>
<p>I must say the Whisper API also impressed me on transcription: almost no errors; it strips out the assorted onomatopoeia and hesitations, and even manages to spell most family names correctly.</p>
<p>My product is far from “production ready”, but the few hours I spent on it confirmed my impression: GPT’s ability to interpret requests makes LLMs a truly serious candidate for a flexible interface. You will tell me that Siri, Alexa and the like already do this. Partly true: Siri and Alexa make more mistakes (to my eyes) and, above all, they are systems that are harder to integrate with. Here, multiple integrations are possible, and to some extent you control those integrations. Many platforms already offer “AI-improved” features, and this will surely explode in the coming months.</p>
<p>Of course, the question of real reliability remains. Only high-volume integrations will show whether reliability is on the order of 99% or 90%, the difference between a gadget perceived as dependable and one that is not.</p>
<p>One last substantive comment: up to a point, once the rules of the game are explained to it, GPT could generate recipes itself. Giving it my recipe as an example, I asked it to do the same to create an Asana task; it gave me a plausible answer. Likewise, here I limit myself to sending an email from scratch, but replying to an email would be possible. More generally, the same approach could be used to summarize a day’s emails, surface those that seem to require urgent action and answer them, and so on.</p>
<p>As mentioned, the main point where GPT lacked the consistency and predictability required to serve as a human-machine bridge is its tendency to be needlessly verbose and to produce an answer of the type</p>
<p><code class="highlighter-rouge">Voici la structure JSON répondant à votre requête:
{"recette": "send_mail"}</code></p>
<p>when all we want is the JSON structure itself. I worked around the problem with a regular expression, but that’s… meh. The Copilot example shows, though, that an LLM trained for this purpose can stick to structured formats.</p>
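<p>The post does not show the regular expression; a stdlib-only version of that workaround could look like this, assuming the verbose reply contains a single flat JSON object:</p>

```python
import json
import re

def extract_json(reply: str) -> dict:
    """Pull the first {...} block out of a verbose GPT reply and decode it.

    Assumes one flat JSON object (no nested braces), which matches the
    simple answer shapes used here, e.g. {"recette": "send_mail"}.
    """
    match = re.search(r"\{[^{}]*\}", reply)
    if match is None:
        raise ValueError("no JSON object found in reply")
    return json.loads(match.group(0))
```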
<p>The other issue in this use case is how family names are spelled. To my surprise, Whisper got most family names right. But when it missed one, I found no reliable way to make GPT understand that a series of letters given after a family name indicated how to spell that name. Moreover, the Gmail API is not very tolerant of spelling mistakes when searching for a name, so retrieving an email address with an error in the name does not work. That is the main limit, unresolved at this stage, of my approach.</p>
<p>The Whisper API only handles messages up to one minute long. There are of course approaches for segmenting an audio file and transcribing it in several chunks, but I did not implement that, so my tests were limited to voice messages under one minute. Be that as it may, in most of my tests GPT followed the instructions, whether I asked for a short or a longer email, formal or informal, “tu” or “vous”, and the other permutations I tried. The generated subject line sometimes left something to be desired, but it is better than many of the subject lines we send each other daily (when there is one…). One slightly disappointing limitation: GPT did not infer that when I said the message was going to my spouse, it could automatically pick an informal wording and “tu”.</p>
<p>I did not build many alternative paths: what if the email address is not found, what if the user wants to adjust the draft, and so on. It could perfectly well be done; it simply required time I no longer had.</p>
<p>All of this is accomplished with about 300 lines of Python and a JSON configuration file of about a hundred lines. I remain impressed by how easy it was to put together. The two tasks that took me the longest: fixing my Homebrew install, which had not appreciated moving to an M1 chip, and handling the Telegram API <em>callbacks</em>. Telegram is driven with the <a href="https://pypi.org/project/pyTelegramBotAPI/">Telebot</a> library, while for Whisper, GPT and Gmail I use the official libraries. The GPT model used is <code class="highlighter-rouge">gpt-3.5-turbo</code>; I do not yet have access to GPT-4 via the API.</p>
</article>
<hr>
<footer>
<p>
<a href="/david/" title="Go to homepage"><svg class="icon icon-home">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use>
</svg> Home</a> •
<a href="/david/log/" title="RSS feed access"><svg class="icon icon-rss2">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-rss2"></use>
</svg> Follow</a> •
<a href="http://larlet.com" title="Go to my English profile" data-instant><svg class="icon icon-user-tie">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-user-tie"></use>
</svg> Pro</a> •
<a href="mailto:david%40larlet.fr" title="Send an email"><svg class="icon icon-mail">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-mail"></use>
</svg> Email</a> •
<abbr class="nowrap" title="Host: Alwaysdata, 62 rue Tiquetonne 75002 Paris, +33184162340"><svg class="icon icon-hammer2">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-hammer2"></use>
</svg> Legal</abbr>
</p>
<template id="theme-selector">
<form>
<fieldset>
<legend><svg class="icon icon-brightness-contrast">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-brightness-contrast"></use>
</svg> Theme</legend>
<label>
<input type="radio" value="auto" name="chosen-color-scheme" checked> Auto
</label>
<label>
<input type="radio" value="dark" name="chosen-color-scheme"> Dark
</label>
<label>
<input type="radio" value="light" name="chosen-color-scheme"> Light
</label>
</fieldset>
</form>
</template>
</footer>
<script src="/static/david/js/instantpage-5.1.0.min.js" type="module"></script>
<script>
  function loadThemeForm(templateName) {
    const themeSelectorTemplate = document.querySelector(templateName)
    const form = themeSelectorTemplate.content.firstElementChild
    themeSelectorTemplate.replaceWith(form)
    form.addEventListener('change', (e) => {
      const chosenColorScheme = e.target.value
      localStorage.setItem('theme', chosenColorScheme)
      toggleTheme(chosenColorScheme)
    })
    const selectedTheme = localStorage.getItem('theme')
    if (selectedTheme && selectedTheme !== 'undefined') {
      form.querySelector(`[value="${selectedTheme}"]`).checked = true
    }
  }
  const prefersColorSchemeDark = '(prefers-color-scheme: dark)'
  window.addEventListener('load', () => {
    let hasDarkRules = false
    for (const styleSheet of Array.from(document.styleSheets)) {
      let mediaRules = []
      for (const cssRule of styleSheet.cssRules) {
        if (cssRule.type !== CSSRule.MEDIA_RULE) {
          continue
        }
        // WARNING: Safari does not support `conditionText`.
        if (cssRule.conditionText) {
          if (cssRule.conditionText !== prefersColorSchemeDark) {
            continue
          }
        } else {
          if (cssRule.cssText.startsWith(prefersColorSchemeDark)) {
            continue
          }
        }
        mediaRules = mediaRules.concat(Array.from(cssRule.cssRules))
      }
      // WARNING: do not try to insert a Rule into a styleSheet you are
      // currently iterating on, otherwise the browser will get stuck
      // in an infinite loop…
      for (const mediaRule of mediaRules) {
        styleSheet.insertRule(mediaRule.cssText)
        hasDarkRules = true
      }
    }
    if (hasDarkRules) {
      loadThemeForm('#theme-selector')
    }
  })
</script>
</body>
</html>
@@ -0,0 +1,79 @@ | |||
title: Expérimentations GPTiennes: assistant vocal | |||
url: http://dataholic.ca/2023/04/05/gpt-assistant-vocal/ | |||
hash_url: 08f83e8893cad4d5a2eb6a560f73dd65 | |||
<p>Dernière exploration avec GPT: est-il possible d’interfacer un <a href="https://fr.wikipedia.org/wiki/Mod%C3%A8le_de_langage">modèle de langage (LLM)</a> avec des outils logiciels existants, par exemple pour envoyer des courriels? Et d’ailleurs pourquoi?</p> | |||
<p>Démontrer <em>ad nauseam</em> que les connaissances générales de GPT ne sont pas si bonnes ou qu’il est facile de lui faire dire n’importe quoi et son contraire, tout cela fait que l’on passe à côté d’une réelle compréhension de ce genre d’outil et donc de son impact possible. Le fait que GPT fasse preuve d’une certaine “culture générale” mâtinée d’une tendance à l’affabulation est un bénéfice secondaire.</p> | |||
<p>La fonction première de ces modèles est celle d’interprétation du “langage naturel”. Cette fonction d’interprétation du langage est ce qui fait défaut aux outils informatiques depuis des lunes; barrière qui, une fois éliminée, permettrait de s’affranchir du symbolisme actuellement nécessaire et représenté par des interfaces d’utilisation contraignantes.</p> | |||
<p>Sauf que pour être en mesure de s’affranchir réellement de cette barrière, il faut que les LLM soient capables de faire le pont: comprendre d’un côté le langage humain et être capable de l’autre côté d’utiliser du langage “machine”, suivant un certain formalisme, pour transformer le verbe en action (informatique).</p> | |||
<p>GPT démontre d’ores et déjà cette capacité: la version Copilot qui permet de générer du code est en exemple. L’intégration avec Bing pour faire un moteur de recherche assisté en est une autre. Toutefois, je voulais tester moi-même comment cela pourrait fonctionner. Mon précédent test sur le code de sécurité routière (billet <a href="/2023/02/19/apprendre-a-gpt3/">1</a> et <a href="/2023/03/11/addendum-gpt/">2</a>) visait à tester la capacité de traitement et d’interprétation de GPT sur des volumes d’information supérieurs à sa fenêtre de contexte, ici, je cherche à évaluer la capacité du modèle de langage à jouer le rôle d’interface d’interprétation humain-machine.</p> | |||
<h2 id="commande-vocale-pour-courriel">Commande vocale pour courriel</h2> | |||
<p>Mon défi: était-il possible de passer une commande vocale instruisant Gmail d’envoyer un courriel?</p> | |||
<p>Les blocs Lego utilisés pour l’occasion:</p> | |||
<ul> | |||
<li>Une interface me permettant d’envoyer des messages vocaux, de récupérer ces messages vocaux dans un script de ma conception (via une <a href="https://fr.wikipedia.org/wiki/Interface_de_programmation">API</a>) et de renvoyer des réponses écrites à l’utilisateur. J’étais parti pour utiliser Discord, mais ça ne marchait pas à mon goût. En donnant mes contraintes à ChatGPT, il m’a conseillé <a href="https://telegram.org/">Telegram</a> qui s’est avéré effectivement un très bon choix.</li> | |||
<li>Un outil parole-vers-texte, là aussi pouvant être appelé par script/API, en l’occurrence le module <a href="https://platform.openai.com/docs/guides/speech-to-text">Whisper API</a> d’OpenAI</li> | |||
<li>Évidemment GPT et Gmail, les deux offrant là aussi des API pour être contrôlés par un script.</li> | |||
</ul> | |||
<p>Je m’étais fixé un objectif supplémentaire: avoir un mécanisme modulaire qui serait capable de recevoir d’autres commandes de manière flexible: par exemple, créer des événements dans un agenda, gérer des tâches, etc. J’ai donc mis en place un mécanisme de recette: un fichier de configuration définit l’ensemble des étapes et des fonctions à appeler pour réaliser une tâche particulière.</p> | |||
<p>Résultat net: un succès, avec quelques bémols. Ci-dessous une capture d’écran montrant l’échange sur l’interface web de Telegram.</p> | |||
<p>Le déclencheur de la séquence est un message vocal qui va comme suit (ceci est exactement la chaîne de caractère produite par Whisper): « Est-ce que tu peux écrire un courriel à Stéphane Guidoin pour lui dire que demain je ne rentrerai pas au travail, car il fait trop beau pour travailler. Je rentrerai après demain. Signé Robert. »</p> | |||
<p><img src="/images/2023-04-05_echange_telegram.png" alt="Échange via Telegram"></p> | |||
<p class="photoattrib">Échange avec le bot Telegram</p> | |||
<p>Pour les curieux, une section méthodologie à la fin rentre plus dans le détail (et présente quelques limites). | |||
Tout commence par un fichier de configuration qui contient les recettes. Le fichier décrit ce que chaque recette est capable de faire ainsi que les étapes pour la réaliser. Ensuite, j’ai créé un <a href="https://core.telegram.org/bots/">bot</a> Telegram, lequel est contrôlé par mon script Python.</p> | |||
<p>Lorsque l’usager envoie un message vocal au bot, le fichier son est reçu par mon script qui l’envoie à Whisper API, ce dernier générant une transcription en texte. La transcription est envoyée à GPT conjointement avec une liste contenant les noms et descriptions des recettes et une instruction: retourner le nom de la recette correspondant à la demande de l’utilisateur. Pour rendre le tout facilement utilisable par mon script Python -et c’est la clé de la démarche, je demande à GPT d’utiliser en guise de réponse le format descriptif JSON. Ça prend le format <code class="highlighter-rouge"><span class="p">{</span><span class="nt">"nom_recette"</span><span class="p">:</span><span class="w"> </span><span class="s2">"send_mail"</span><span class="p">}</span></code></p> | |||
<p>Une fois la recette sélectionnée, une confirmation est envoyée à l’utilisateur via Telegram et le script va ensuite s’en tenir à suivre les étapes de la recette, à savoir une alternance de requêtes à GPT et de fonctions auprès d’autres services, Gmail dans ce cas-ci. Les requêtes GPT sont entièrement décrites dans le fichier de configuration, les fonctions Gmail sont nommées dans le fichier de configuration, mais doivent évidemment être codées. La recette pour l’envoi de courriel ressemble à ceci:</p> | |||
<ol> | |||
<li>La requête de l’utilisateur est envoyée à GPT avec l’instruction de retourner le nom du ou des destinataires, là encore en retournant les résultats au format JSON;</li> | |||
<li>Les noms des destinataires sont envoyés à Gmail pour récupérer les adresses courriel;</li> | |||
<li>La requête de l’utilisateur est de nouveau envoyée à GPT avec l’instruction, cette fois-ci, de générer un titre et un contenu de courriel;</li> | |||
<li>Mon script produit un brouillon de courriel qui est envoyé à l’utilisateur via Telegram pour confirmation;</li> | |||
<li>Sur approbation de l’utilisateur, grâce un bouton oui/non, le courriel est envoyé.</li> | |||
</ol> | |||
<h2 id="est-ce-que-ça-marche">Est-ce que ça marche?</h2> | |||
<p>Ça fonctionne étonnamment bien, considérant que mon code ferait surement hurler un vrai développeur. De manière générale, GPT interprète de manière fiable les requêtes. Quand on lui fournit un canevas de réponse (ici une structure JSON avec des trous à remplir), il comprend toujours comment faire. Sur des dizaines d’essai, il a toujours bien procédé. Tel qu’expliqué dans la méthodologie, il a juste fallu que je gère les excès verbomoteurs de GPT.</p> | |||
<p>Je dois dire que Whisper API m’a aussi impressionné pour la transcription: à peu près pas d’erreur, il ôte les onomatopées diverses et variées et autres hésitations et arrive même à bien épelé la majorité des noms de famille.</p> | |||
<p>Mon produit est loin d’être « production ready », mais les quelques heures que j’ai passé dessus m’ont confirmé ce dont j’avais l’impression: la capacité de GPT à interpréter les demandes fait des LLM un candidat vraiment sérieux pour servir d’interface flexible. Vous me direz que Siri, Alexa et autres font déjà cela. C’est en partie vrai: Siri et Alexa font plus d’erreurs (à mes yeux) et surtout ce sont des systèmes pour lesquels il est plus difficile de s’intégrer. Ici, il est possible de faire des intégrations multiples et jusqu’à un certain point de contrôler ces intégrations. Nombre de plateformes proposent d’ores et déjà des fonctionnalités “AI-improved” et cela va surement exploser dans les prochains mois.</p> | |||
<p>Évidemment, reste la question de la réelle fiabilité de la chose. C’est à travers des intégrations à grand volume qu’il sera possible d’évaluer réellement si la fiabilité est de l’ordre de 99% ou de 90%, la différence entre un bidule perçu comme fiable ou pas fiable.</p> | |||
<p>Dernier commentaire de fond: jusqu’à un certain point, en expliquant les règles du jeu à GPT, il serait capable de générer des recettes. En lui fournissant comme exemple ma recette, je lui ai demandé de faire de même pour créer une tâche Asana; il m’a fourni une réponse qui se tenait. De la même manière, ici je me limite à envoyer un courriel à partir de zéro, mais il serait possible de répondre à un courriel. De manière plus générale, la même approche pourrait être utilisée pour faire une synthèse des courriels d’une journée, faire ressortir les courriels qui semblent nécessiter une action urgente et y répondre, etc.</p> | |||
<p>Tel que mentionné, le principal point où GPT manquait de constance et de prévisibilité pour servir de pont humain-machine est cette tendance à être inutilement verbeux et à fournir une réponse du type</p> | |||
<p><code class="highlighter-rouge">Voici la structure JSON répondant à votre requête: | |||
{"recette": "send_mail"}</code></p> | |||
<p>Alors que l’on voudrait simplement la structure JSON. J’ai contourné le problème avec une expression régulière, mais c’est… bof bof. L’exemple de Copilot montre toutefois que lorsqu’entrainé dans cet objectif, un LLM est capable de s’en tenir à des formats structurés.</p> | |||
<p>L’autre enjeu dans ce cas d’usage est la manière d’épeler les noms de famille. À ma surprise, Whisper avait la majorité des noms de famille correctement. Mais quand il les manquait, je n’ai pas trouvé de manière fiable de faire comprendre à GPT que si je lui donnais une série de lettres après le nom de famille, ça disait comme épeler le nom. Par ailleurs, l’API de Gmail n’est pas très tolérante aux fautes d’orthographe quand on cherche un nom, donc récupérer une adresse courriel avec une erreur dans le nom ne marche pas. C’est la principale limite, insurmontée à ce stade, dans ma démarche.</p> | |||
<p>Whisper API supporte uniquement des messages d’une minute. Il existe évidemment des approches pour segmenter un fichier audio et le transcrire en plusieurs morceaux, toutefois je n’ai pas implémenté cette fonction. Mes tests se sont donc limités sur des messages vocaux de moins d’une minute. Quoiqu’il en soit, dans la majorité de mes tests, GPT a suivi les consignes; que je lui demande un courriel court ou plus long, formel ou informel, tutoiement ou vouvoiement et autres permutations que j’ai tentées. La génération du titre du courriel laissait parfois à désirer, mais c’est mieux que beaucoup de titre de courriel que nous nous envoyons quotidiennement (quand il y a un titre…). Genre de petite limitation un peu dommage: GPT n’interprétait pas que quand je lui disais que le message allait à ma conjointe, il pouvait automatiquement sélectionner une formulation informelle et le tutoiement.</p> | |||
<p>Je n’ai pas mis en place beaucoup de chemins alternatifs: si l’adresse courriel n’est pas trouvée, si l’utilisateur veut ajuster le brouillon, etc. Ça se ferait parfaitement, ça prenait du temps dont je ne disposais plus.</p> | |||
<p>All of this is accomplished with roughly 300 lines of Python script and a JSON configuration file of about a hundred lines. I remain impressed by how easy it was to put together. The two tasks that took me the longest: repairing my Homebrew installation, which had not appreciated moving to an M1 chip, and handling the <em>callbacks</em> of the Telegram API. Telegram is driven with the <a href="https://pypi.org/project/pyTelegramBotAPI/">Telebot</a> library, while for Whisper, GPT and Gmail I use the official libraries. The GPT model used is <code class="highlighter-rouge">gpt-3.5-turbo</code>; I do not yet have access to GPT-4 through the API.</p>
<!doctype html><!-- This is a valid HTML5 document. --> | |||
<!-- Screen readers, SEO, extensions and so on. --> | |||
<html lang="en">
<!-- Has to be within the first 1024 bytes, hence before the `title` element | |||
See: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset --> | |||
<meta charset="utf-8"> | |||
<!-- Why no `X-UA-Compatible` meta: https://stackoverflow.com/a/6771584 --> | |||
<!-- The viewport meta is quite crowded and we are responsible for that. | |||
See: https://codepen.io/tigt/post/meta-viewport-for-2015 --> | |||
<meta name="viewport" content="width=device-width,initial-scale=1"> | |||
<!-- Required to make a valid HTML5 document. --> | |||
<title>Going to see the northern lights by train (archive) — David Larlet</title>
<meta name="description" content="Publication cached to keep a record of it.">
<!-- That good ol' feed, subscribe :). --> | |||
<link rel="alternate" type="application/atom+xml" title="Feed" href="/david/log/"> | |||
<!-- Generated from https://realfavicongenerator.net/ such a mess. --> | |||
<link rel="apple-touch-icon" sizes="180x180" href="/static/david/icons2/apple-touch-icon.png"> | |||
<link rel="icon" type="image/png" sizes="32x32" href="/static/david/icons2/favicon-32x32.png"> | |||
<link rel="icon" type="image/png" sizes="16x16" href="/static/david/icons2/favicon-16x16.png"> | |||
<link rel="manifest" href="/static/david/icons2/site.webmanifest"> | |||
<link rel="mask-icon" href="/static/david/icons2/safari-pinned-tab.svg" color="#07486c"> | |||
<link rel="shortcut icon" href="/static/david/icons2/favicon.ico"> | |||
<meta name="msapplication-TileColor" content="#f7f7f7"> | |||
<meta name="msapplication-config" content="/static/david/icons2/browserconfig.xml"> | |||
<meta name="theme-color" content="#f7f7f7" media="(prefers-color-scheme: light)"> | |||
<meta name="theme-color" content="#272727" media="(prefers-color-scheme: dark)"> | |||
<!-- Documented, feel free to shoot an email. --> | |||
<link rel="stylesheet" href="/static/david/css/style_2021-01-20.css"> | |||
<!-- See https://www.zachleat.com/web/comprehensive-webfonts/ for the trade-off. --> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<script> | |||
function toggleTheme(themeName) { | |||
document.documentElement.classList.toggle( | |||
'forced-dark', | |||
themeName === 'dark' | |||
) | |||
document.documentElement.classList.toggle( | |||
'forced-light', | |||
themeName === 'light' | |||
) | |||
} | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme && selectedTheme !== 'undefined') {
toggleTheme(selectedTheme) | |||
} | |||
</script> | |||
<meta name="robots" content="noindex, nofollow"> | |||
<meta content="origin-when-cross-origin" name="referrer"> | |||
<!-- Canonical URL for SEO purposes --> | |||
<link rel="canonical" href="https://blog.professeurjoachim.com/billet/2023-03-31-aller-voir-les-aurores-boreales-en-train"> | |||
<body class="remarkdown h1-underline h2-underline h3-underline em-underscore hr-center ul-star pre-tick" data-instant-intensity="viewport-all"> | |||
<article> | |||
<header> | |||
<h1>Going to see the northern lights by train</h1>
</header> | |||
<nav> | |||
<p class="center"> | |||
<a href="/david/" title="Go to the home page"><svg class="icon icon-home">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use>
</svg> Home</a> •
<a href="https://blog.professeurjoachim.com/billet/2023-03-31-aller-voir-les-aurores-boreales-en-train" title="Link to the original content">Original source</a>
</p> | |||
</nav> | |||
<hr> | |||
<p>Since the beginning of the year, I had been needing a change of atmosphere. So I took the train to go see the northern lights.</p>
<p>I felt the impulse after a friend asked around: “my daughter would like to go see the northern lights, but my family no longer takes planes, do you think it is possible by train?”. It must be possible, I told myself, but complicated to organize. Then I looked at the maps, the aurora visibility zones, the best times of year to see them, the Scandinavian weather, the solar activity forecasts… it is actually much more accessible than I thought. What if I went?</p>
<h2>What?</h2>
<p>An <a href="https://fr.wikipedia.org/wiki/Aurore_polaire">aurora borealis</a> appears when a solar wind interacts with the Earth’s upper atmosphere, around the planet’s magnetic poles. Outside of violent solar storms, auroras generally occur between the 65th and 75th degrees of latitude, roughly straddling the Arctic Circle.<br>
So to see one, head north as far as you can, then a little further north still. Stop when it is too cold or the sky turns green.</p>
<h2>When?</h2>
<p>Can they be predicted? Yes, somewhat, but without precision. For an aurora, the Earth has to be in the path of a solar wind, which can be roughly predicted a month ahead by monitoring <a href="https://jemma.mobi/noaa27d?e">the Kp index</a> (which measures the interaction between solar activity and the Earth’s magnetic field) and extrapolating it over the coming period (the Sun rotates on itself in 27 days). Forecasts a few hours out are much more accurate.</p>
<p>Since it all depends on the Sun, for which we have no excellent activity-forecasting tools, scientists and observers can still be surprised by solar storms, like <a href="https://www.livescience.com/most-powerful-solar-storm-in-6-years-caused-auroras-all-over-the-us-and-nobody-saw-it-coming">the one of March 24–25</a>, the most powerful in six years, which caught everyone off guard.</p>
<p>Obviously, there is also the weather question; a cloudy sky will keep you from seeing the auroras. But you know how weather forecasting goes. Just know that the weather in the boreal zone changes quickly, with no real assurance the day before about the next day’s cloud cover.</p>
<h2>Where?</h2>
<p>At the end of the 19th century, an iron ore deposit was discovered 145 km north of the Arctic Circle. To exploit it, the ore had to be moved to where it could be processed, hence a railway line heading for the Atlantic, since the Baltic is further away and frozen for a good part of the year. The line starts from the Norwegian port of Narvik and reaches the deposit, where a town was built in turn, Kiruna. The line then continues to the port of Luleå on the Baltic, where it joins the line to Stockholm. It carries passenger trains in addition to the iron ore trains.</p>
<p>In the mountains between Narvik and Kiruna, a railway workers’ camp became a village, Abisko, and a tourist station was created there to accommodate visitors.</p>
<h2>And so?</h2>
<p>Armed with all this information, I could answer my friend’s question in the affirmative.</p>
<ul>
<li>✔︎ it is possible to take trains from Paris to Stockholm (I did it in 2008 by train + ferry from Berlin, and in 2017 by night train from Hamburg)</li>
<li>✔︎ from Stockholm, there is a night train that stops at Luleå, Kiruna and Abisko, and ends in Narvik</li>
<li>✔︎ you can roughly tell which periods will be best for watching auroras</li>
</ul>
<p>What remains to be seen:</p>
<ul>
<li>whether there is a minimum lead time for booking a round trip to the Arctic Circle by train</li>
<li>whether the conditions will be right to see auroras</li>
<li>whether the cost of the trip will be affordable</li>
<li>whether there are activities to fill the days</li>
</ul>
<p>I tend to improvise my trips: once I have decided that I am leaving one of these days, I read up a little on the route and the destination, and spot the options that give me the most flexibility. If needed, I send an email or two to gauge how far ahead a reservation has to be placed.</p>
<h2>How?</h2>
<p>For this trip, I made my decision to leave one week before the departure date. There was <a href="https://jemma.mobi/mittaushistoria.php?p=2023-03-05">a peak of Kp activity forecast for March 5–6</a>, which I had spotted two weeks beforehand. The forecast updates kept confirming the activity, so I started warning my boss that I would most likely be taking several days off very soon.</p>
<p>The train journey from Paris to the Arctic Circle hinges on a night train from Hamburg to Stockholm, then another night train from Stockholm to Narvik. To reach Hamburg, I decided to go through Cologne, which is served by the Thalys and from which you can take an ICE (the German equivalent of the TGV) to Hamburg. That is nearly eight hours of express trains.</p>
<p>To be in Narvik on the evening of March 5, I therefore had to arrive there on the morning of the 5th, which meant taking the night train in Stockholm on the evening of the 4th, and thus the train from Hamburg on the evening of the 3rd, which meant leaving Paris on the morning of Friday the 3rd.</p>
<p>What to do after that? My sister lives in Stockholm, so it is the perfect opportunity to visit her for a few days on the way back. And why not invite myself to friends’ places on the return leg? I know people in Roskilde in Denmark (well, I did not know them before going there, but they are very nice), in Berlin, and in Breda in the Netherlands… might as well spend a few days there and see a bit of Europe! On the way out, the timing is good too: my stop in Cologne lets me have lunch there with a friend from Montreuil who moved to the city. The program is set, I warn my friends and ask how flexible they can be about hosting me, and off we go! I can order my first tickets. It is Sunday, February 26; I leave on Friday, March 3.</p>
<h2>How much?</h2>
<p>My trump card when traveling by train in Europe is the <a href="https://www.interrail.eu/fr">Interrail pass</a>. I discovered it in 2008 to travel from Venice to Stockholm, and for this Arctic trip I found out that it now works through an app, instead of a booklet in which you have to write down each of your journeys. This pass lets me travel at will on 7 days within one month. Apart from express trains (TGV, Thalys, ICE…) and night trains, journeys are free. For the paying trains, the price only covers the reservation. For example, a Thalys seat from Paris to Cologne costs 37 euros instead of 111, and a Stockholm–Abisko couchette costs 240 Swedish kronor instead of 1200 (about 21 euros instead of 105). Since I could afford it, I opted for the first-class Interrail pass. First class applies to every train except the night trains, so I got to enjoy the free coffee on Scandinavian trains, wider seats, and so on. For night trains, it is the classic choice: a seat (with a bit of luck it is in a couchette compartment, so you can still sleep lying down), a couchette (with pillow, sheets and duvet) in a six-berth compartment, or a private single or double compartment.</p>
<p>In the end, I paid 160 euros in reservations on top of the 440 euros for the Interrail pass, which makes 606 euros of train travel. Without the pass, for the same seats, I would have paid 984 euros. A 40% saving really changes the game.</p>
<p>For comparison, I just looked at flights from Paris to Narvik: a round trip in two weeks costs 555 euros. And that only covers Paris–Narvik. It does not include the rest of the journey: a lunch in Cologne, an afternoon in Stockholm between two night trains, a stop in Abisko on the way back from Narvik, two days in Stockholm, then Roskilde, Berlin, Breda… I cannot be bothered to check what all that would have cost me by plane. Speed of travel is not everything, and it does not make up for the carbon emitted into the atmosphere. You know me, <a href="https://blog.professeurjoachim.com/billet/2019-04-10-je-ne-prends-plus-l-avion">I no longer travel by plane</a>, and if you are not asking yourself the question, we will not have much (polite) to say to each other on the subject.</p>
<h2>And how was it?</h2>
<p>A few days before arriving in Narvik, I booked a tour (with <a href="https://www.dayoutnarvik.com/the-northern-lights-package">Day Out Narvik</a>). The program: hop into a minibus and go chasing auroras in the fjord, in the archipelago (the Lofoten islands) or in the mountains towards the border. The destination depends on the weather. On site, you admire the auroras and take photos with the guide’s advice, drink something hot and eat a waffle cooked over a wood fire. The tour gave no assurance that we would see auroras. As luck would have it, that evening had the ideal conditions. Clear sky from 7 p.m., strong northern lights activity from 7:30 to 9 p.m., cold but not too cold (-6 to -10 ºC). We only had to drive 16 km from Narvik to find a beach at the head of the fjord, with the full moon lighting the mountain and the sea reflecting the lights.</p>
<p>And the auroras. At first they are faint, easy to mistake for clouds, then you see their color, pale green, then the shape, which moves. Then come faster movements; the colors intensify, you can see purple, luminous white, stretching from one end of the sky to the other. With luck, you see the “dancing auroras”, with very intense colors and movements. How long auroras remain visible depends mostly on the solar winds, so it varies. Sometimes a few minutes, sometimes several hours. At some point they do not reappear after fading, or the clouds roll in and hide the sky.</p>
<p>Photos taken with my digital camera, a Fuji X100T, with its 23mm f/2, at 3200 ISO. Larger photo on click.</p>
<p>Photos taken with my film camera, a Contax G1, with its Contax G 35mm f/2, on Kodak Portra 800. Exposure time: 16 seconds. Larger photo on click.</p>
<p>Almost everything in this business is a matter of luck, once you are in the right place. Of the three evenings I spent inside the Arctic Circle (two in Narvik, one in Abisko), I only got the ideal conditions once, on the first night. The second evening was cloudy, and the third showed auroras that were neither vivid nor long-lasting. It was still a nice surprise to glimpse some from the train back to Stockholm; the Czechs with whom I shared the compartment, who had been in the region for a week, were about to go home without having seen any… but we spent several minutes pressed against the frozen window, admiring the natural show unfolding above our heads as we crossed the snowy forests and frozen rivers of the far Scandinavian north at 80 kilometers per hour.</p>
</article> | |||
<hr> | |||
<footer> | |||
<p> | |||
<a href="/david/" title="Go to the home page"><svg class="icon icon-home">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use>
</svg> Home</a> •
<a href="/david/log/" title="RSS feed"><svg class="icon icon-rss2">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-rss2"></use>
</svg> Follow</a> •
<a href="http://larlet.com" title="Go to my English profile" data-instant><svg class="icon icon-user-tie">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-user-tie"></use>
</svg> Pro</a> •
<a href="mailto:david%40larlet.fr" title="Send an email"><svg class="icon icon-mail">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-mail"></use>
</svg> Email</a> •
<abbr class="nowrap" title="Host: Alwaysdata, 62 rue Tiquetonne 75002 Paris, +33184162340"><svg class="icon icon-hammer2">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-hammer2"></use>
</svg> Legal</abbr>
</p> | |||
<template id="theme-selector"> | |||
<form> | |||
<fieldset> | |||
<legend><svg class="icon icon-brightness-contrast"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-brightness-contrast"></use> | |||
</svg> Theme</legend>
<label> | |||
<input type="radio" value="auto" name="chosen-color-scheme" checked> Auto | |||
</label> | |||
<label> | |||
<input type="radio" value="dark" name="chosen-color-scheme"> Dark
</label> | |||
<label> | |||
<input type="radio" value="light" name="chosen-color-scheme"> Light
</label> | |||
</fieldset> | |||
</form> | |||
</template> | |||
</footer> | |||
<script src="/static/david/js/instantpage-5.1.0.min.js" type="module"></script> | |||
<script> | |||
function loadThemeForm(templateName) { | |||
const themeSelectorTemplate = document.querySelector(templateName) | |||
const form = themeSelectorTemplate.content.firstElementChild | |||
themeSelectorTemplate.replaceWith(form) | |||
form.addEventListener('change', (e) => { | |||
const chosenColorScheme = e.target.value | |||
localStorage.setItem('theme', chosenColorScheme) | |||
toggleTheme(chosenColorScheme) | |||
}) | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme && selectedTheme !== 'undefined') { | |||
form.querySelector(`[value="${selectedTheme}"]`).checked = true | |||
} | |||
} | |||
const prefersColorSchemeDark = '(prefers-color-scheme: dark)' | |||
window.addEventListener('load', () => { | |||
let hasDarkRules = false | |||
for (const styleSheet of Array.from(document.styleSheets)) { | |||
let mediaRules = [] | |||
for (const cssRule of styleSheet.cssRules) { | |||
if (cssRule.type !== CSSRule.MEDIA_RULE) { | |||
continue | |||
} | |||
// WARNING: Safari does not support `conditionText`.
if (cssRule.conditionText) { | |||
if (cssRule.conditionText !== prefersColorSchemeDark) { | |||
continue | |||
} | |||
} else { | |||
// Fallback on `cssText`, which includes the `@media` prefix.
if (!cssRule.cssText.includes(prefersColorSchemeDark)) {
continue | |||
} | |||
} | |||
mediaRules = mediaRules.concat(Array.from(cssRule.cssRules)) | |||
} | |||
// WARNING: do not try to insert a Rule into a styleSheet you are
// currently iterating on, otherwise the browser will be stuck
// in an infinite loop…
for (const mediaRule of mediaRules) { | |||
styleSheet.insertRule(mediaRule.cssText) | |||
hasDarkRules = true | |||
} | |||
} | |||
if (hasDarkRules) { | |||
loadThemeForm('#theme-selector') | |||
} | |||
}) | |||
</script> | |||
</body> | |||
</html> |
<!doctype html><!-- This is a valid HTML5 document. --> | |||
<!-- Screen readers, SEO, extensions and so on. --> | |||
<html lang="fr"> | |||
<!-- Has to be within the first 1024 bytes, hence before the `title` element | |||
See: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset --> | |||
<meta charset="utf-8"> | |||
<!-- Why no `X-UA-Compatible` meta: https://stackoverflow.com/a/6771584 --> | |||
<!-- The viewport meta is quite crowded and we are responsible for that. | |||
See: https://codepen.io/tigt/post/meta-viewport-for-2015 --> | |||
<meta name="viewport" content="width=device-width,initial-scale=1"> | |||
<!-- Required to make a valid HTML5 document. --> | |||
<title>The mounting human and environmental costs of generative AI (archive) — David Larlet</title> | |||
<meta name="description" content="Publication mise en cache pour en conserver une trace."> | |||
<!-- That good ol' feed, subscribe :). --> | |||
<link rel="alternate" type="application/atom+xml" title="Feed" href="/david/log/"> | |||
<!-- Generated from https://realfavicongenerator.net/ such a mess. --> | |||
<link rel="apple-touch-icon" sizes="180x180" href="/static/david/icons2/apple-touch-icon.png"> | |||
<link rel="icon" type="image/png" sizes="32x32" href="/static/david/icons2/favicon-32x32.png"> | |||
<link rel="icon" type="image/png" sizes="16x16" href="/static/david/icons2/favicon-16x16.png"> | |||
<link rel="manifest" href="/static/david/icons2/site.webmanifest"> | |||
<link rel="mask-icon" href="/static/david/icons2/safari-pinned-tab.svg" color="#07486c"> | |||
<link rel="shortcut icon" href="/static/david/icons2/favicon.ico"> | |||
<meta name="msapplication-TileColor" content="#f7f7f7"> | |||
<meta name="msapplication-config" content="/static/david/icons2/browserconfig.xml"> | |||
<meta name="theme-color" content="#f7f7f7" media="(prefers-color-scheme: light)"> | |||
<meta name="theme-color" content="#272727" media="(prefers-color-scheme: dark)"> | |||
<!-- Documented, feel free to shoot an email. --> | |||
<link rel="stylesheet" href="/static/david/css/style_2021-01-20.css"> | |||
<!-- See https://www.zachleat.com/web/comprehensive-webfonts/ for the trade-off. --> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<script> | |||
function toggleTheme(themeName) { | |||
document.documentElement.classList.toggle( | |||
'forced-dark', | |||
themeName === 'dark' | |||
) | |||
document.documentElement.classList.toggle( | |||
'forced-light', | |||
themeName === 'light' | |||
) | |||
} | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme && selectedTheme !== 'undefined') {
toggleTheme(selectedTheme) | |||
} | |||
</script> | |||
<meta name="robots" content="noindex, nofollow"> | |||
<meta content="origin-when-cross-origin" name="referrer"> | |||
<!-- Canonical URL for SEO purposes --> | |||
<link rel="canonical" href="https://arstechnica.com/gadgets/2023/04/generative-ai-is-cool-but-lets-not-forget-its-human-and-environmental-costs/"> | |||
<body class="remarkdown h1-underline h2-underline h3-underline em-underscore hr-center ul-star pre-tick" data-instant-intensity="viewport-all"> | |||
<article> | |||
<header> | |||
<h1>The mounting human and environmental costs of generative AI</h1> | |||
</header> | |||
<nav> | |||
<p class="center"> | |||
<a href="/david/" title="Aller à l’accueil"><svg class="icon icon-home"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use> | |||
</svg> Accueil</a> • | |||
<a href="https://arstechnica.com/gadgets/2023/04/generative-ai-is-cool-but-lets-not-forget-its-human-and-environmental-costs/" title="Lien vers le contenu original">Source originale</a> | |||
</p> | |||
</nav> | |||
<hr> | |||
<p>Over the past few months, the field of artificial intelligence has seen rapid growth, with wave after wave of new models like Dall-E and GPT-4 emerging one after another. Every week brings the promise of new and exciting models, products, and tools. It’s easy to get swept up in the waves of hype, but these shiny capabilities come at a real cost to society and the planet.</p> | |||
<p>Downsides include the environmental toll of mining rare minerals, the human costs of the labor-intensive process of data annotation, and the escalating financial investment required to train AI models as they incorporate more parameters.</p> | |||
<p>Let’s look at the innovations that have fueled recent generations of these models—and raised their associated costs.</p> | |||
<h2>Bigger models</h2> | |||
<p>In recent years, AI models have been getting bigger, with researchers now measuring their size in the hundreds of billions of parameters. “Parameters” are the internal connections used within the models to learn patterns based on the training data.</p> | |||
<p>For large language models (LLMs) like ChatGPT, we’ve gone from around 100 million parameters in 2018 to 500 billion in 2023 with Google’s PaLM model. The theory behind this growth is that models with more parameters should have better performance, even on tasks they were not initially trained on, although this hypothesis remains unproven.</p>
<p><em>Figure: Model size growth over the years.</em></p>
<p>Bigger models typically take longer to train, which means they also need more GPUs, which cost more money, so only a select few organizations are able to train them. Estimates put the training cost of GPT-3, which has 175 billion parameters, at $4.6 million—out of reach for the majority of companies and organizations. (It's worth noting that the cost of training models is dropping in some cases, such as in the case of LLaMA, the recent model trained by Meta.)</p> | |||
<p>This creates a digital divide in the AI community between those who can train the most cutting-edge LLMs (mostly Big Tech companies and rich institutions in the Global North) and those who can’t (nonprofit organizations, startups, and anyone without access to a supercomputer or millions in cloud credits). Building and deploying these behemoths requires a lot of planetary resources: rare metals for manufacturing GPUs, water to cool huge data centers, energy to keep those data centers running 24/7 on a planetary scale… all of these are often overlooked in favor of focusing on the future potential of the resulting models.</p> | |||
<h2>Planetary impacts</h2> | |||
<p>A study from Carnegie Mellon University professor Emma Strubell about the carbon footprint of training LLMs estimated that training a 2019 model called BERT, which has only 213 million parameters, emitted 280 metric tons of carbon emissions, roughly equivalent to the emissions from five cars over their lifetimes. Since then, models have grown and hardware has become more efficient, so where are we now?</p>
<p>In a recent academic article I wrote to study the carbon emissions incurred by training BLOOM, a 176-billion parameter language model, we compared the power consumption and ensuing carbon emissions of several LLMs, all of which came out in the last few years. The goal of the comparison was to get an idea of the scale of emissions of different sizes of LLMs and what impacts them.</p>
<p><em>Figure: Sasha Luccioni, et al.</em></p>
<p>Depending on the energy source used for training and its carbon intensity, training a 2022-era LLM emits at least 25 metric tons of carbon equivalents if you use renewable energy, as we did for the BLOOM model. If you use carbon-intensive energy sources like coal and natural gas, which was the case for GPT-3, this number quickly goes up to 500 metric tons of carbon emissions, roughly equivalent to over a million miles driven by an average gasoline-powered car.</p> | |||
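<p>The car-mileage comparison above can be sanity-checked with a quick calculation. The per-mile figure is an assumption on my part (roughly 400 g of CO2 per mile for an average gasoline car, in line with common EPA estimates), not a number from the article:</p>

```javascript
// Rough sanity check of the "over a million miles" comparison above.
// Assumption: an average gasoline car emits about 400 g of CO2 per mile.
const gramsPerMile = 400
const trainingEmissionsTons = 500 // metric tons, the GPT-3-era estimate above
const equivalentMiles = (trainingEmissionsTons * 1e6) / gramsPerMile
console.log(equivalentMiles) // 1250000, i.e. over a million miles
```

<p>With these assumptions, 500 metric tons works out to about 1.25 million miles, consistent with the article’s claim.</p>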
<p>And this calculation doesn’t consider the manufacturing of the hardware used for training the models, nor the emissions incurred when LLMs are deployed in the real world. For instance, with ChatGPT, which was queried by tens of millions of users at its peak a month ago, thousands of copies of the model are running in parallel, responding to user queries in real time, all while using megawatt hours of electricity and generating metric tons of carbon emissions. It’s hard to estimate the exact quantity of emissions this results in, given the secrecy and lack of transparency around these big LLMs.</p> | |||
<h2>Closed, proprietary models</h2> | |||
<p>Let’s go back to the LLM size plot above. You may notice that neither ChatGPT nor GPT-4 are on it. Why? Because we have no idea how big they are. Although there are several reports published about them, we know almost nothing about their size and how they work. Access is provided via APIs, which means they are essentially black boxes that can be queried by users.</p> | |||
<p>These boxes may contain either a single model (with a trillion parameters?) or multiple models, or, as I told Bloomberg, “It could be three raccoons in a trench coat.” We really don’t know.</p> | |||
<p>The plot below presents a timeline of recent releases of LLMs and the type of access that each model creator provided. As you can see, the biggest models (Megatron, PaLM, Gopher, etc.) are all closed source. And if you buy into the theory that the bigger the model, the more powerful it is (I don’t), this means the most powerful AI tech is only accessible to a select few organizations, who monopolize access to it.</p>
<p><em>Figure: A timeline of recent releases of LLMs and the type of access each model creator provided. (Credit: Irene Solaiman)</em></p>
<p>Why is this problematic? It means it’s difficult to carry out external evaluations and audits of these models since you can’t even be sure that the underlying model is the same every time you query it. It also means that you can’t do scientific research on them, given that studies must be reproducible.</p> | |||
<p>The only people who can keep improving these models are the organizations that trained them in the first place, which is something they keep doing to improve their models and provide new features over time.</p> | |||
<h2>Human costs</h2> | |||
<p>How many humans does it take to train an AI model? You may think the answer is zero, but the amount of human labor needed to make recent generations of LLMs is steadily rising.</p> | |||
<p>When Transformer models came out a few years ago, researchers heralded them as a new era in AI because they could be trained on “raw data.” In this case, raw data means “unlabeled data”—books, encyclopedia articles, and websites that have been scraped and collected in massive quantities.</p> | |||
<p>That was the case for models like BERT and GPT-2, which required relatively little human intervention in terms of data gathering and filtering. While this was convenient for the model creators, it also meant that all sorts of undesirable content, like hate speech and pornography, were sucked up during the model training process, then often parroted back by the models themselves.</p> | |||
<p>This data collection approach changed with the advent of RLHF (reinforcement learning with human feedback), the technique used by newer generations of LLMs like ChatGPT. As its name indicates, RLHF adds additional steps to the LLM training process, and these steps require much more human intervention.</p> | |||
<p>Essentially, once a model has been trained on large quantities of unlabeled data (from the web, books, etc.), humans are then asked to interact with the model, coming up with prompts (e.g., “Write me a recipe for chocolate cake”) and provide their own answers or evaluate answers provided by the model. This data is used to continue training the model, which is then again tested by humans, ad nauseam, until the model is deemed good enough to be released into the world.</p> | |||
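<p>The cycle described in this paragraph can be sketched in a few lines. This is a toy outline with hypothetical stand-in functions (a <code>model</code> object, a <code>humanRate</code> callback), not a real training pipeline:</p>

```javascript
// Toy sketch of the RLHF feedback loop described above. All functions
// here are hypothetical stand-ins, not a real training pipeline.
function rlhfRound (model, prompts, humanRate) {
  // Humans rate each model answer; the ratings become new training data.
  const feedback = prompts.map((prompt) => {
    const answer = model.generate(prompt)
    return { prompt, answer, score: humanRate(prompt, answer) }
  })
  return model.update(feedback)
}

// Repeat the rate-and-retrain cycle until the model is deemed releasable.
function train (model, prompts, humanRate, isGoodEnough) {
  while (!isGoodEnough(model)) {
    model = rlhfRound(model, prompts, humanRate)
  }
  return model
}
```

<p>The key point the sketch makes concrete is that <code>humanRate</code> sits inside the loop: every iteration consumes fresh human labor.</p>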
<p>This kind of RLHF training is what made ChatGPT feasible for wide release since it could decline to answer many classes of potentially harmful questions.</p>
<p><em>Figure: An illustration of RLHF training.</em></p>
<p>But that success has a dirty secret behind it: To keep the costs of AI low, the people providing this “human feedback” are underpaid, overexploited workers. In January, Time wrote a report about Kenyan laborers paid less than $2 an hour to examine thousands of messages for OpenAI. This kind of work can have long-lasting psychological impacts, as we've seen in content-moderation workers.</p> | |||
<p>To make it worse, the efforts of these nameless workers aren’t recognized in the reports accompanying AI models. Their labor remains invisible.</p> | |||
<h2>What should we do about it?</h2> | |||
<p>For the creators of these models, instead of focusing on scale and size and optimizing solely for performance, it’s possible to train smaller, more efficient models and make models accessible so that they can be reused and fine-tuned (read: adapted) by members of the AI community, who won’t need to train models from scratch. Dedicating more efforts toward improving the safety and security of these models—developing features like watermarks for machine-generated content, more reliable safety filters, and the ability to cite sources when generating answers to questions—can also contribute toward making LLMs more accessible and robust.</p> | |||
<p>As users of these models (sometimes despite ourselves), it's within our power to demand transparency and push back against the deployment of AI models in high-risk scenarios, such as services that provide mental help therapy or generate forensic sketches. These models are still too new, poorly documented, and unpredictable to be deployed in circumstances that can have such major repercussions.</p> | |||
<p>And the next time someone tells you that the latest AI model will benefit humanity at large or that it displays evidence of artificial general intelligence, I hope you'll think about its hidden costs to people and the planet, some of which I’ve addressed in the sections above. And these are only a fraction of the broader societal impacts and costs of these systems (some of which you can see on the image below, crowdsourced via Twitter)—things like job impacts, the spread of disinformation and propaganda, and copyright infringement concerns.</p>
<p><em>Figure: There are many hidden costs of generative AI.</em></p>
<p>The current trend is toward creating bigger and more closed and opaque models. But there’s still time to push back, demand transparency, and get a better understanding of the costs and impacts of LLMs while limiting how they are deployed in society at large. Legislation like the Algorithmic Accountability Act in the US and legal frameworks on AI governance in the European Union and Canada are defining our AI future and putting safeguards in place to ensure safety and accountability in future generations of AI systems deployed in society. As members of that society and users of these systems, we should have our voices heard by their creators.</p> | |||
</article> | |||
<hr> | |||
<footer> | |||
<p> | |||
<a href="/david/" title="Aller à l’accueil"><svg class="icon icon-home"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use> | |||
</svg> Accueil</a> • | |||
<a href="/david/log/" title="Accès au flux RSS"><svg class="icon icon-rss2"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-rss2"></use> | |||
</svg> Suivre</a> • | |||
<a href="http://larlet.com" title="Go to my English profile" data-instant><svg class="icon icon-user-tie"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-user-tie"></use> | |||
</svg> Pro</a> • | |||
<a href="mailto:david%40larlet.fr" title="Envoyer un courriel"><svg class="icon icon-mail"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-mail"></use> | |||
</svg> Email</a> • | |||
<abbr class="nowrap" title="Hébergeur : Alwaysdata, 62 rue Tiquetonne 75002 Paris, +33184162340"><svg class="icon icon-hammer2"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-hammer2"></use> | |||
</svg> Légal</abbr> | |||
</p> | |||
<template id="theme-selector"> | |||
<form> | |||
<fieldset> | |||
<legend><svg class="icon icon-brightness-contrast"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-brightness-contrast"></use> | |||
</svg> Thème</legend> | |||
<label> | |||
<input type="radio" value="auto" name="chosen-color-scheme" checked> Auto | |||
</label> | |||
<label> | |||
<input type="radio" value="dark" name="chosen-color-scheme"> Foncé | |||
</label> | |||
<label> | |||
<input type="radio" value="light" name="chosen-color-scheme"> Clair | |||
</label> | |||
</fieldset> | |||
</form> | |||
</template> | |||
</footer> | |||
<script src="/static/david/js/instantpage-5.1.0.min.js" type="module"></script> | |||
<script> | |||
function loadThemeForm(templateName) { | |||
const themeSelectorTemplate = document.querySelector(templateName) | |||
const form = themeSelectorTemplate.content.firstElementChild | |||
themeSelectorTemplate.replaceWith(form) | |||
form.addEventListener('change', (e) => { | |||
const chosenColorScheme = e.target.value | |||
localStorage.setItem('theme', chosenColorScheme) | |||
toggleTheme(chosenColorScheme) | |||
}) | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme && selectedTheme !== 'undefined') { | |||
form.querySelector(`[value="${selectedTheme}"]`).checked = true | |||
} | |||
} | |||
const prefersColorSchemeDark = '(prefers-color-scheme: dark)' | |||
window.addEventListener('load', () => { | |||
let hasDarkRules = false | |||
for (const styleSheet of Array.from(document.styleSheets)) { | |||
let mediaRules = [] | |||
for (const cssRule of styleSheet.cssRules) { | |||
if (cssRule.type !== CSSRule.MEDIA_RULE) { | |||
continue | |||
} | |||
// WARNING: Safari does not support `conditionText`.
if (cssRule.conditionText) { | |||
if (cssRule.conditionText !== prefersColorSchemeDark) { | |||
continue | |||
} | |||
} else { | |||
if (cssRule.cssText.startsWith(prefersColorSchemeDark)) { | |||
continue | |||
} | |||
} | |||
mediaRules = mediaRules.concat(Array.from(cssRule.cssRules)) | |||
} | |||
// WARNING: do not try to insert a Rule into a styleSheet you are
// currently iterating on, otherwise the browser will be stuck
// in an infinite loop…
for (const mediaRule of mediaRules) { | |||
styleSheet.insertRule(mediaRule.cssText) | |||
hasDarkRules = true | |||
} | |||
} | |||
if (hasDarkRules) { | |||
loadThemeForm('#theme-selector') | |||
} | |||
}) | |||
</script> | |||
</body> | |||
</html> |
<!doctype html><!-- This is a valid HTML5 document. --> | |||
<!-- Screen readers, SEO, extensions and so on. --> | |||
<html lang="fr"> | |||
<!-- Has to be within the first 1024 bytes, hence before the `title` element | |||
See: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset --> | |||
<meta charset="utf-8"> | |||
<!-- Why no `X-UA-Compatible` meta: https://stackoverflow.com/a/6771584 --> | |||
<!-- The viewport meta is quite crowded and we are responsible for that. | |||
See: https://codepen.io/tigt/post/meta-viewport-for-2015 --> | |||
<meta name="viewport" content="width=device-width,initial-scale=1"> | |||
<!-- Required to make a valid HTML5 document. --> | |||
<title>We need to tell people ChatGPT will lie to them, not debate linguistics (archive) — David Larlet</title> | |||
<meta name="description" content="Publication mise en cache pour en conserver une trace."> | |||
<!-- That good ol' feed, subscribe :). --> | |||
<link rel="alternate" type="application/atom+xml" title="Feed" href="/david/log/"> | |||
<!-- Generated from https://realfavicongenerator.net/ such a mess. --> | |||
<link rel="apple-touch-icon" sizes="180x180" href="/static/david/icons2/apple-touch-icon.png"> | |||
<link rel="icon" type="image/png" sizes="32x32" href="/static/david/icons2/favicon-32x32.png"> | |||
<link rel="icon" type="image/png" sizes="16x16" href="/static/david/icons2/favicon-16x16.png"> | |||
<link rel="manifest" href="/static/david/icons2/site.webmanifest"> | |||
<link rel="mask-icon" href="/static/david/icons2/safari-pinned-tab.svg" color="#07486c"> | |||
<link rel="shortcut icon" href="/static/david/icons2/favicon.ico"> | |||
<meta name="msapplication-TileColor" content="#f7f7f7"> | |||
<meta name="msapplication-config" content="/static/david/icons2/browserconfig.xml"> | |||
<meta name="theme-color" content="#f7f7f7" media="(prefers-color-scheme: light)"> | |||
<meta name="theme-color" content="#272727" media="(prefers-color-scheme: dark)"> | |||
<!-- Documented, feel free to shoot an email. --> | |||
<link rel="stylesheet" href="/static/david/css/style_2021-01-20.css"> | |||
<!-- See https://www.zachleat.com/web/comprehensive-webfonts/ for the trade-off. --> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<script> | |||
function toggleTheme(themeName) { | |||
document.documentElement.classList.toggle( | |||
'forced-dark', | |||
themeName === 'dark' | |||
) | |||
document.documentElement.classList.toggle( | |||
'forced-light', | |||
themeName === 'light' | |||
) | |||
} | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme && selectedTheme !== 'undefined') {
toggleTheme(selectedTheme) | |||
} | |||
</script> | |||
<meta name="robots" content="noindex, nofollow"> | |||
<meta content="origin-when-cross-origin" name="referrer"> | |||
<!-- Canonical URL for SEO purposes --> | |||
<link rel="canonical" href="https://simonwillison.net/2023/Apr/7/chatgpt-lies/"> | |||
<body class="remarkdown h1-underline h2-underline h3-underline em-underscore hr-center ul-star pre-tick" data-instant-intensity="viewport-all"> | |||
<article> | |||
<header> | |||
<h1>We need to tell people ChatGPT will lie to them, not debate linguistics</h1> | |||
</header> | |||
<nav> | |||
<p class="center"> | |||
<a href="/david/" title="Aller à l’accueil"><svg class="icon icon-home"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use> | |||
</svg> Accueil</a> • | |||
<a href="https://simonwillison.net/2023/Apr/7/chatgpt-lies/" title="Lien vers le contenu original">Source originale</a> | |||
</p> | |||
</nav> | |||
<hr> | |||
<p><strong>ChatGPT lies to people</strong>. This is a serious bug that has so far resisted all attempts at a fix. We need to prioritize helping people understand this, not debating the most precise terminology to use to describe it.</p> | |||
<h4>We accidentally invented computers that can lie to us</h4> | |||
<p>I <a href="https://twitter.com/simonw/status/1643469011127259136">tweeted</a> (and <a href="https://fedi.simonwillison.net/@simon/110144293948444462">tooted</a>) this:</p> | |||
<p>Mainly I was trying to be pithy and amusing, but this thought was inspired by reading Sam Bowman’s excellent review of the field, <a href="https://cims.nyu.edu/~sbowman/eightthings.pdf">Eight Things to Know about Large Language Models</a>. In particular this:</p> | |||
<blockquote> | |||
<p>More capable models can better recognize the specific circumstances under which they are trained. Because of this, they are more likely to learn to act as expected in precisely those circumstances while behaving competently but unexpectedly in others. This can surface in the form of problems that Perez et al. (2022) call sycophancy, where a model answers subjective questions in a way that flatters their user’s stated beliefs, and sandbagging, where models are more likely to endorse common misconceptions when their user appears to be less educated.</p> | |||
</blockquote> | |||
<p>Sycophancy and sandbagging are my two favourite new pieces of AI terminology!</p> | |||
<p>What I find fascinating about this is that these extremely problematic behaviours are not the system working as intended: they are bugs! And we haven’t yet found a reliable way to fix them.</p> | |||
<p>(Here’s the paper that snippet references: <a href="https://arxiv.org/abs/2212.09251">Discovering Language Model Behaviors with Model-Written Evaluations</a> from December 2022.)</p> | |||
<h4>“But a machine can’t deliberately tell a lie”</h4> | |||
<p>I got quite a few replies complaining that it’s inappropriate to refer to LLMs as “lying”, because to do so anthropomorphizes them and implies a level of intent which isn’t possible.</p> | |||
<p>I completely agree that anthropomorphism is bad: these models are fancy matrix arithmetic, not entities with intent and opinions.</p> | |||
<p>But in this case, I think the visceral clarity of being able to say “ChatGPT will lie to you” is a worthwhile trade.</p> | |||
<p>Science fiction has been presenting us with a model of “artificial intelligence” for decades. It’s firmly baked into our culture that an “AI” is an all-knowing computer, incapable of lying and able to answer any question with pin-point accuracy.</p> | |||
<p>Large language models like ChatGPT, on first encounter, seem to fit that bill. They appear astonishingly capable, and their command of human language can make them seem like a genuine intelligence, at least at first glance.</p> | |||
<p>But the more time you spend with them, the more that illusion starts to fall apart.</p> | |||
<p>They fail spectacularly when prompted with logic puzzles, or basic arithmetic, or when asked to produce citations or link to sources for the information they present.</p> | |||
<p>Most concerningly, they hallucinate or confabulate: they make things up! My favourite example of this remains <a href="https://simonwillison.net/2023/Mar/10/chatgpt-internet-access/#i-dont-believe-it">their ability to entirely imagine the content of a URL</a>. I still see this catching people out every day. It’s remarkably convincing.</p> | |||
<p><a href="https://arstechnica.com/information-technology/2023/04/why-ai-chatbots-are-the-ultimate-bs-machines-and-how-people-hope-to-fix-them/">Why ChatGPT and Bing Chat are so good at making things up</a> is an excellent in-depth exploration of this issue from Benj Edwards at Ars Technica.</p> | |||
<h4>We need to explain this in straightforward terms</h4>
<p>We’re trying to solve two problems here:</p> | |||
<ol> | |||
<li>ChatGPT cannot be trusted to provide factual information. It has a very real risk of making things up, and if people don’t understand this, they are guaranteed to be misled.</li>
<li>Systems like ChatGPT are not sentient, or even intelligent systems. They do not have opinions, or feelings, or a sense of self. We must resist the temptation to anthropomorphize them.</li> | |||
</ol> | |||
<p>I believe that <strong>the most direct form of harm caused by LLMs today is the way they mislead their users</strong>. The first problem needs to take precedence.</p> | |||
<p>It is vitally important that new users understand that these tools cannot be trusted to provide factual answers. We need to help people get there as quickly as possible.</p> | |||
<p>Which of these two messages do you think is more effective?</p> | |||
<p><strong>ChatGPT will lie to you</strong></p> | |||
<p>Or</p> | |||
<p><strong>ChatGPT doesn’t lie, lying is too human and implies intent. It hallucinates. Actually no, hallucination still implies human-like thought. It confabulates. That’s a term used in psychiatry to describe when someone replaces a gap in one’s memory by a falsification that one believes to be true—though of course these things don’t have human minds so even confabulation is unnecessarily anthropomorphic. I hope you’ve enjoyed this linguistic detour!</strong></p> | |||
<p>Let’s go with the first one. We should be shouting this message from the rooftops: <strong>ChatGPT will lie to you</strong>.</p> | |||
<p>That doesn’t mean it’s not useful—it can be astonishingly useful, for all kinds of purposes... but seeking truthful, factual answers is very much not one of them. And everyone needs to understand that.</p> | |||
<p>Convincing people that these aren’t a sentient AI out of a science fiction story can come later. Once people understand their flaws this should be an easier argument to make!</p> | |||
<h4 id="warn-off-or-help-on">Should we warn people off or help them on?</h4> | |||
<p>This situation raises an ethical conundrum: if these tools can’t be trusted, and people are demonstrably falling for their traps, should we encourage people not to use them at all, or even campaign to have them banned?</p> | |||
<p>Every day I personally find new problems that I can solve more effectively with the help of large language models. Some recent examples from just the last few weeks:</p> | |||
<p>Each of these represents a problem I could have solved without ChatGPT... but at a time cost that would have been prohibitively expensive, to the point that I wouldn’t have bothered.</p> | |||
<p>I wrote more about this in <a href="https://simonwillison.net/2023/Mar/27/ai-enhanced-development/">AI-enhanced development makes me more ambitious with my projects</a>.</p> | |||
<p>Honestly, at this point using ChatGPT in the way that I do feels like a massively unfair competitive advantage. I’m not worried about AI taking people’s jobs: I’m worried about the impact of AI-enhanced developers like myself.</p> | |||
<p>It genuinely feels unethical for me <em>not</em> to help other people learn to use these tools as effectively as possible. I want everyone to be able to do what I can do with them, as safely and responsibly as possible.</p> | |||
<p>I think the message we should be emphasizing is this:</p> | |||
<p><strong>These are incredibly powerful tools. They are far harder to use effectively than they first appear. Invest the effort, but approach with caution: we accidentally invented computers that can lie to us and we can’t figure out how to make them stop.</strong></p> | |||
<p>There’s a time for linguistics, and there’s a time for grabbing the general public by the shoulders and shouting “It lies! The computer lies to you! Don’t trust anything it says!”</p> | |||
</article> | |||
<hr> | |||
<footer> | |||
<p> | |||
<a href="/david/" title="Aller à l’accueil"><svg class="icon icon-home"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use> | |||
</svg> Accueil</a> • | |||
<a href="/david/log/" title="Accès au flux RSS"><svg class="icon icon-rss2"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-rss2"></use> | |||
</svg> Suivre</a> • | |||
<a href="http://larlet.com" title="Go to my English profile" data-instant><svg class="icon icon-user-tie"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-user-tie"></use> | |||
</svg> Pro</a> • | |||
<a href="mailto:david%40larlet.fr" title="Envoyer un courriel"><svg class="icon icon-mail"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-mail"></use> | |||
</svg> Email</a> • | |||
<abbr class="nowrap" title="Hébergeur : Alwaysdata, 62 rue Tiquetonne 75002 Paris, +33184162340"><svg class="icon icon-hammer2"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-hammer2"></use> | |||
</svg> Légal</abbr> | |||
</p> | |||
<template id="theme-selector"> | |||
<form> | |||
<fieldset> | |||
<legend><svg class="icon icon-brightness-contrast"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-brightness-contrast"></use> | |||
</svg> Thème</legend> | |||
<label> | |||
<input type="radio" value="auto" name="chosen-color-scheme" checked> Auto | |||
</label> | |||
<label> | |||
<input type="radio" value="dark" name="chosen-color-scheme"> Foncé | |||
</label> | |||
<label> | |||
<input type="radio" value="light" name="chosen-color-scheme"> Clair | |||
</label> | |||
</fieldset> | |||
</form> | |||
</template> | |||
</footer> | |||
<script src="/static/david/js/instantpage-5.1.0.min.js" type="module"></script> | |||
<script> | |||
function loadThemeForm(templateName) { | |||
const themeSelectorTemplate = document.querySelector(templateName) | |||
const form = themeSelectorTemplate.content.firstElementChild | |||
themeSelectorTemplate.replaceWith(form) | |||
form.addEventListener('change', (e) => { | |||
const chosenColorScheme = e.target.value | |||
localStorage.setItem('theme', chosenColorScheme) | |||
toggleTheme(chosenColorScheme) | |||
}) | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme && selectedTheme !== 'undefined') { | |||
form.querySelector(`[value="${selectedTheme}"]`).checked = true | |||
} | |||
} | |||
const prefersColorSchemeDark = '(prefers-color-scheme: dark)' | |||
window.addEventListener('load', () => { | |||
let hasDarkRules = false | |||
for (const styleSheet of Array.from(document.styleSheets)) { | |||
let mediaRules = [] | |||
for (const cssRule of styleSheet.cssRules) { | |||
if (cssRule.type !== CSSRule.MEDIA_RULE) { | |||
continue | |||
} | |||
// WARNING: Safari does not support `conditionText`.
if (cssRule.conditionText) { | |||
if (cssRule.conditionText !== prefersColorSchemeDark) { | |||
continue | |||
} | |||
} else { | |||
if (cssRule.cssText.startsWith(prefersColorSchemeDark)) { | |||
continue | |||
} | |||
} | |||
mediaRules = mediaRules.concat(Array.from(cssRule.cssRules)) | |||
} | |||
// WARNING: do not try to insert a Rule to a styleSheet you are | |||
// currently iterating on, otherwise the browser will be stuck | |||
// in an infinite loop…
for (const mediaRule of mediaRules) { | |||
styleSheet.insertRule(mediaRule.cssText) | |||
hasDarkRules = true | |||
} | |||
} | |||
if (hasDarkRules) { | |||
loadThemeForm('#theme-selector') | |||
} | |||
}) | |||
</script> | |||
</body> | |||
</html> |
title: We need to tell people ChatGPT will lie to them, not debate linguistics | |||
url: https://simonwillison.net/2023/Apr/7/chatgpt-lies/ | |||
hash_url: 452be27c5cc8a4b9824d1d7e005546c6 | |||
<!doctype html><!-- This is a valid HTML5 document. --> | |||
<!-- Screen readers, SEO, extensions and so on. --> | |||
<html lang="fr"> | |||
<!-- Has to be within the first 1024 bytes, hence before the `title` element | |||
See: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset --> | |||
<meta charset="utf-8"> | |||
<!-- Why no `X-UA-Compatible` meta: https://stackoverflow.com/a/6771584 --> | |||
<!-- The viewport meta is quite crowded and we are responsible for that. | |||
See: https://codepen.io/tigt/post/meta-viewport-for-2015 --> | |||
<meta name="viewport" content="width=device-width,initial-scale=1"> | |||
<!-- Required to make a valid HTML5 document. --> | |||
<title>Poking around OpenAI. (archive) — David Larlet</title> | |||
<meta name="description" content="Publication mise en cache pour en conserver une trace."> | |||
<!-- That good ol' feed, subscribe :). --> | |||
<link rel="alternate" type="application/atom+xml" title="Feed" href="/david/log/"> | |||
<!-- Generated from https://realfavicongenerator.net/ such a mess. --> | |||
<link rel="apple-touch-icon" sizes="180x180" href="/static/david/icons2/apple-touch-icon.png"> | |||
<link rel="icon" type="image/png" sizes="32x32" href="/static/david/icons2/favicon-32x32.png"> | |||
<link rel="icon" type="image/png" sizes="16x16" href="/static/david/icons2/favicon-16x16.png"> | |||
<link rel="manifest" href="/static/david/icons2/site.webmanifest"> | |||
<link rel="mask-icon" href="/static/david/icons2/safari-pinned-tab.svg" color="#07486c"> | |||
<link rel="shortcut icon" href="/static/david/icons2/favicon.ico"> | |||
<meta name="msapplication-TileColor" content="#f7f7f7"> | |||
<meta name="msapplication-config" content="/static/david/icons2/browserconfig.xml"> | |||
<meta name="theme-color" content="#f7f7f7" media="(prefers-color-scheme: light)"> | |||
<meta name="theme-color" content="#272727" media="(prefers-color-scheme: dark)"> | |||
<!-- Documented, feel free to shoot an email. --> | |||
<link rel="stylesheet" href="/static/david/css/style_2021-01-20.css"> | |||
<!-- See https://www.zachleat.com/web/comprehensive-webfonts/ for the trade-off. --> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<script> | |||
function toggleTheme(themeName) { | |||
document.documentElement.classList.toggle( | |||
'forced-dark', | |||
themeName === 'dark' | |||
) | |||
document.documentElement.classList.toggle( | |||
'forced-light', | |||
themeName === 'light' | |||
) | |||
} | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme && selectedTheme !== 'undefined') {
toggleTheme(selectedTheme) | |||
} | |||
</script> | |||
<meta name="robots" content="noindex, nofollow"> | |||
<meta content="origin-when-cross-origin" name="referrer"> | |||
<!-- Canonical URL for SEO purposes --> | |||
<link rel="canonical" href="https://lethain.com/openai-exploration/"> | |||
<body class="remarkdown h1-underline h2-underline h3-underline em-underscore hr-center ul-star pre-tick" data-instant-intensity="viewport-all"> | |||
<article> | |||
<header> | |||
<h1>Poking around OpenAI.</h1> | |||
</header> | |||
<nav> | |||
<p class="center"> | |||
<a href="/david/" title="Aller à l’accueil"><svg class="icon icon-home"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use> | |||
</svg> Accueil</a> • | |||
<a href="https://lethain.com/openai-exploration/" title="Lien vers le contenu original">Source originale</a> | |||
</p> | |||
</nav> | |||
<hr> | |||
<p>I haven’t spent much time playing around with the latest LLMs, | |||
and decided to spend some time doing so. I was particularly curious | |||
about the use case of using embeddings to supplement user prompts
with additional, relevant data (e.g. supply the current status of their | |||
recent tickets into the prompt where they might inquire about progress on | |||
said tickets). This use case is interesting because it’s very attainable
for existing companies and products to take advantage of, and I imagine it’s | |||
roughly how e.g. Stripe’s GPT4 integration with their documentation works.</p> | |||
<p>To play around with that, I created a script that converts all of my writing | |||
into embeddings, tokenizes the user-supplied prompt to identify relevant sections | |||
of my content to inject into an expanded prompt, and sent that expanded prompt | |||
to OpenAI’s API.</p>
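<p>The embed-retrieve-inject flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repo’s actual code: the <code>embed</code> function here is a deterministic hashed bag-of-words stand-in, where a real implementation would call OpenAI’s embeddings endpoint, and the expanded prompt would then be sent to the completions API.</p>

```python
import re
import zlib

DIMS = 64  # size of the stand-in embedding space

def embed(text):
    # Stand-in embedding: hashed bag-of-words vector. A real version of
    # this pipeline would call OpenAI's embeddings endpoint here instead.
    vec = [0.0] * DIMS
    for word in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIMS] += 1.0
    return vec

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 for empty inputs.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def expand_prompt(question, sections, top_k=1):
    # Rank corpus sections by similarity to the question and inject the
    # best matches as context ahead of the user's actual question.
    q = embed(question)
    ranked = sorted(sections, key=lambda s: cosine(q, embed(s)), reverse=True)
    context = "\n---\n".join(ranked[:top_k])
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

# Hypothetical corpus sections standing in for blog post chunks.
sections = [
    "Staff engineers operate beyond the career ladder.",
    "Infrastructure migrations are how organizations manage tech debt.",
    "Writing an engineering strategy starts with an honest diagnosis.",
]
prompt = expand_prompt("How do I write an engineering strategy?", sections)
print(prompt)
```

<p>The ranking step is the whole trick: no model is trained or fine-tuned, the corpus is just searched and the winners are pasted into the prompt.</p>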
<p>You can <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py">see the code on Github</a>, | |||
and read my notes on this project below.</p> | |||
<h2 id="references">References</h2> | |||
<p>This exploration is inspired by the recent work | |||
by <a href="https://eugeneyan.com/writing/llm-experiments/#llm-tools-to-summarize-query-and-advise">Eugene Yan</a> | |||
and <a href="https://simonwillison.net/2023/Apr/4/llm/">Simon Willison</a>. | |||
I owe particular thanks to <a href="https://twitter.com/eugeneyan/status/1646336530695467010">Eugene Yan</a> | |||
for his suggestions to improve the quality of the responses.</p> | |||
<p>The code I’m sharing below is scraped together from a number of sources:</p>
<p>I found none of the examples quite worked as documented, but ultimately I was able to get them working | |||
with some poking around, relearning Pandas, and so on.</p> | |||
<h2 id="project">Project</h2> | |||
<p>My project was to make the OpenAI API answer questions with awareness of all of my personal writing from this blog, | |||
<a href="https://staffeng.com">StaffEng</a> and <a href="https://infraeng.dev/">Infrastructure Engineering</a>. | |||
Specifically this means creating embeddings from Hugo blog posts in Markdown to use with OpenAI.</p> | |||
<p>You can <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py">read the code on Github</a>. | |||
I’ve done absolutely nothing to make it easy to read, but it is a complete example, and you could use | |||
it with your own writing by changing <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py#L112">Line 112</a> | |||
to point at your blog’s content directories. (Oh, and changing the prompts on <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py#L260">Line 260</a>.)</p>
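<p>Pointing the script at your own content directories mostly amounts to reading each Markdown file and separating the front matter from the body before embedding. Here is a minimal sketch of that loading step; the <code>---</code> front-matter delimiter and flat directory layout are assumptions about a typical Hugo site, not the repo’s exact parsing.</p>

```python
import os
import tempfile

def load_hugo_posts(content_dir):
    # Collect (filename, body) pairs from a Hugo content tree,
    # stripping the `---`-delimited front matter block if present.
    posts = []
    for root, _dirs, files in os.walk(content_dir):
        for name in sorted(files):
            if not name.endswith(".md"):
                continue
            with open(os.path.join(root, name), encoding="utf-8") as f:
                text = f.read()
            if text.startswith("---"):
                # Front matter sits between the first two `---` markers.
                parts = text.split("---", 2)
                if len(parts) == 3:
                    text = parts[2]
            posts.append((name, text.strip()))
    return posts

# Demo on a throwaway directory standing in for a blog's content tree.
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "post.md"), "w", encoding="utf-8") as f:
    f.write("---\ntitle: Example\n---\nThe actual body to embed.")
posts = load_hugo_posts(demo)
print(posts)  # [('post.md', 'The actual body to embed.')]
```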
<p>You can see a screenshot of what this looks like below.</p> | |||
<p><img src="/static/blog/2023/openai-experiment.png" alt="Screenshot of terminal program running Github lethain/openai-experiment"></p> | |||
<p>This project is pretty neat, in the sense that it works. It did take me a bit longer than expected, probably about three hours | |||
to get it working given some interruptions, mostly because the documentation’s examples were all subtly broken or didn’t actually connect | |||
together into working code. After it was working, I inevitably spent a few more hours fiddling around as well. | |||
My repo is terrible code, but it is a full working example if anyone
else has had similar issues getting question answering with embeddings working!</p>
<p>The other comment on this project is that I don’t really view this as a particularly effective solution to the problem I wanted to solve, | |||
as it’s performing a fairly basic nearest-neighbor similarity search to match embedded versions of my blog posts against the query,
and then injecting the best matches into the GPT query as context. Going into this, I expected, I dunno, something more | |||
sophisticated than this. It’s a very reasonable solution, and a cost-efficient one, because it avoids any model (re)training,
but feels a bit more basic than I imagined.</p> | |||
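<p>As a rough illustration, that retrieval-and-inject step amounts to something like the sketch below. This is a simplified, hypothetical version with made-up names (<code>top_k_context</code>, <code>expand_prompt</code>) using plain numpy cosine similarity, not the actual code from the repo:</p>

```python
import numpy as np

def top_k_context(query_vec, post_vecs, posts, k=3):
    """Rank posts by cosine similarity to the query embedding and
    return the k best-matching posts to inject as prompt context."""
    q = query_vec / np.linalg.norm(query_vec)
    m = post_vecs / np.linalg.norm(post_vecs, axis=1, keepdims=True)
    sims = m @ q                        # cosine similarity per post
    best = np.argsort(sims)[::-1][:k]   # indices of the top-k matches
    return [posts[i] for i in best]

def expand_prompt(question, context_chunks):
    """Splice the retrieved chunks into the prompt sent to the model."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

<p>The expanded prompt is then what gets sent to the completion API; the only “smarts” are in the embedding model that produced the vectors.</p>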
<p>Also worth noting, the total cost of developing this app and running it a few dozen times: $0.50.</p>
<h2 id="thoughts">Thoughts</h2> | |||
<p>This was a fun project, in part because it was a detour away from what I’ve spent most of my time on the last few months, | |||
which is writing my next book. Writing and editing a book is very valuable work, but it lacks the freeform joy of | |||
hacking around a small project with zero users. Without overthinking or overstructuring things too much, | |||
here are some bullet-point thoughts about this project and the expansion of AI in the industry at large:</p>
<ul><li>As someone who’s been working in the industry for a while now, it’s easy to get jaded about new things. | |||
My first reaction to the recent AI hype is very similar to my first reaction to the crypto hype: | |||
we’ve seen hype before, and initial hype is rarely correlated with long-term impact on the industry | |||
or on society. In other words, I wasn’t convinced.</li><li>Conversely, I think part of long-term engineering leadership is remaining open to new things. | |||
The industry has radically changed from twenty years ago, with mobile development as the most obvious proof point. | |||
Most things won’t change the industry much, but some things will completely transform it, | |||
and we owe cautious interest to these potentially transformational projects.</li><li>My personal bet is that the new AI wave is moderately transformative but not massively so. | |||
Expanding on my thinking a bit, LLMs are showing significant promise at mediocre solutions to very general problems. | |||
A very common, often unstated, Silicon Valley model is to hire engineers, pretend the engineers are | |||
solving a problem, hire a huge number of non-engineers to actually solve the problem “until the technology automates it”, | |||
grow the business rapidly, and hope automation solves the margins in some later year. | |||
LLM adoption should be a valuable tool in improving margins in this kind of business, | |||
which in theory should enable new businesses to be created by improving the potential margin. | |||
However, we’ve been in a decade of <a href="https://www.readmargins.com/p/zirp-explains-the-world">zero-interest-rate policy</a> | |||
which has meant that current-year margins haven’t mattered much to folks, | |||
which implies that most of these ideas that should be enabled by improved margins should | |||
have already been attempted in the preceding margin-agnostic decade.
This means that LLMs will make those businesses better, but the businesses themselves should
have already been tried, and many of them ultimately failed because market size prevented the
required returns, more so than because of the cost of running large internal teams to mask the missing margin-enhancing technology.</li><li>If you ignore the margin-enhancement opportunities represented by LLMs,
which I’ve argued shouldn’t generate new business ideas but improve existing business ideas already | |||
tried over the last decade, then it’s interesting to ponder what the sweet spot is for these tools. | |||
My take is that they’re very good at supporting domain experts, where the potential damage caused by
inaccuracies is constrained, e.g. Github Copilot is a very plausible way to empower a proficient programmer, | |||
and a very risky way to train a novice in a setting where the code has access to sensitive resources or data. | |||
However, to the extent that we’re pushing experts from authors to editors, I’m not sure that’s an actual speed | |||
improvement for our current generation of experts, who already have mastery in authorship and (often) a lesser | |||
skill in editing. Maybe there is a new generation of experts who are exceptional editors first, and authors second, | |||
which these tools will foster. If that’s true, then likely the current generation of leaders is unable to | |||
assess these tools appropriately, but&mldr; I think that most folks make this argument about most new technologies, | |||
and it’s only true sometimes. (Again, crypto is a clear example of something that has not overtaken existing | |||
technologies in the real world with significant regulatory overhead.)</li></ul> | |||
<p>Anyway, it was a fun project, and I have a much better intuitive sense of what’s possible | |||
in this space after spending some time here, which was my goal. I’ll remain very curious to | |||
see what comes together here as the timeline progresses.</p> | |||
<p class="mt6 instapaper_ignoref"></p> | |||
</article> | |||
<hr> | |||
<footer> | |||
<p> | |||
<a href="/david/" title="Aller à l’accueil"><svg class="icon icon-home"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use> | |||
</svg> Accueil</a> • | |||
<a href="/david/log/" title="Accès au flux RSS"><svg class="icon icon-rss2"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-rss2"></use> | |||
</svg> Suivre</a> • | |||
<a href="http://larlet.com" title="Go to my English profile" data-instant><svg class="icon icon-user-tie"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-user-tie"></use> | |||
</svg> Pro</a> • | |||
<a href="mailto:david%40larlet.fr" title="Envoyer un courriel"><svg class="icon icon-mail"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-mail"></use> | |||
</svg> Email</a> • | |||
<abbr class="nowrap" title="Hébergeur : Alwaysdata, 62 rue Tiquetonne 75002 Paris, +33184162340"><svg class="icon icon-hammer2"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-hammer2"></use> | |||
</svg> Légal</abbr> | |||
</p> | |||
<template id="theme-selector"> | |||
<form> | |||
<fieldset> | |||
<legend><svg class="icon icon-brightness-contrast"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-brightness-contrast"></use> | |||
</svg> Thème</legend> | |||
<label> | |||
<input type="radio" value="auto" name="chosen-color-scheme" checked> Auto | |||
</label> | |||
<label> | |||
<input type="radio" value="dark" name="chosen-color-scheme"> Foncé | |||
</label> | |||
<label> | |||
<input type="radio" value="light" name="chosen-color-scheme"> Clair | |||
</label> | |||
</fieldset> | |||
</form> | |||
</template> | |||
</footer> | |||
<script src="/static/david/js/instantpage-5.1.0.min.js" type="module"></script> | |||
<script> | |||
function loadThemeForm(templateName) { | |||
const themeSelectorTemplate = document.querySelector(templateName) | |||
const form = themeSelectorTemplate.content.firstElementChild | |||
themeSelectorTemplate.replaceWith(form) | |||
form.addEventListener('change', (e) => { | |||
const chosenColorScheme = e.target.value | |||
localStorage.setItem('theme', chosenColorScheme) | |||
toggleTheme(chosenColorScheme) | |||
}) | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme && selectedTheme !== 'undefined') { | |||
form.querySelector(`[value="${selectedTheme}"]`).checked = true | |||
} | |||
} | |||
const prefersColorSchemeDark = '(prefers-color-scheme: dark)' | |||
window.addEventListener('load', () => { | |||
let hasDarkRules = false | |||
for (const styleSheet of Array.from(document.styleSheets)) { | |||
let mediaRules = [] | |||
for (const cssRule of styleSheet.cssRules) { | |||
if (cssRule.type !== CSSRule.MEDIA_RULE) { | |||
continue | |||
} | |||
// WARNING: Safari does not support `conditionText`.
if (cssRule.conditionText) { | |||
if (cssRule.conditionText !== prefersColorSchemeDark) { | |||
continue | |||
} | |||
} else { | |||
if (cssRule.cssText.startsWith(prefersColorSchemeDark)) { | |||
continue | |||
} | |||
} | |||
mediaRules = mediaRules.concat(Array.from(cssRule.cssRules)) | |||
} | |||
// WARNING: do not try to insert a Rule to a styleSheet you are | |||
// currently iterating on, otherwise the browser will be stuck | |||
// in an infinite loop…
for (const mediaRule of mediaRules) { | |||
styleSheet.insertRule(mediaRule.cssText) | |||
hasDarkRules = true | |||
} | |||
} | |||
if (hasDarkRules) { | |||
loadThemeForm('#theme-selector') | |||
} | |||
}) | |||
</script> | |||
</body> | |||
</html> |
@@ -0,0 +1,70 @@ | |||
title: Poking around OpenAI. | |||
url: https://lethain.com/openai-exploration/ | |||
hash_url: 4a485034e94dc6123a624e8a589e8dac | |||
<p>I haven’t spent much time playing around with the latest LLMs, | |||
and decided to spend some time doing so. I was particularly curious | |||
about the use case of using embeddings to supplement user prompts
with additional, relevant data (e.g. supply the current status of their | |||
recent tickets into the prompt where they might inquire about progress on | |||
said tickets). This use case is interesting because it’s very attainable
for existing companies and products to take advantage of, and I imagine it’s | |||
roughly how e.g. Stripe’s GPT4 integration with their documentation works.</p><p>To play around with that, I created a script that converts all of my writing | |||
into embeddings, tokenizes the user-supplied prompt to identify relevant sections | |||
of my content to inject into an expanded prompt, and sent that expanded prompt | |||
to OpenAI’s API.</p><p>You can <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py">see the code on Github</a>,
and read my notes on this project below.</p><h2 id="references">References</h2><p>This exploration is inspired by the recent work | |||
by <a href="https://eugeneyan.com/writing/llm-experiments/#llm-tools-to-summarize-query-and-advise">Eugene Yan</a> | |||
and <a href="https://simonwillison.net/2023/Apr/4/llm/">Simon Willison</a>. | |||
I owe particular thanks to <a href="https://twitter.com/eugeneyan/status/1646336530695467010">Eugene Yan</a> | |||
for his suggestions to improve the quality of the responses.</p><p>The code I’m sharing below is scraped together from a number of sources:</p><p>I found none of the examples quite worked as documented, but ultimately I was able to get them working
with some poking around, relearning Pandas, and so on.</p><h2 id="project">Project</h2><p>My project was to make the OpenAI API answer questions with awareness of all of my personal writing from this blog, | |||
<a href="https://staffeng.com">StaffEng</a> and <a href="https://infraeng.dev/">Infrastructure Engineering</a>. | |||
Specifically this means creating embeddings from Hugo blog posts in Markdown to use with OpenAI.</p><p>You can <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py">read the code on Github</a>. | |||
I’ve done absolutely nothing to make it easy to read, but it is a complete example, and you could use | |||
it with your own writing by changing <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py#L112">Line 112</a> | |||
to point at your blog’s content directories. (Oh, and changing the prompts on <a href="https://github.com/lethain/openai-experiments/blob/main/corpus.py#L260">Line 260</a>.)</p><p>You can see a screenshot of what this looks like below.</p><p><img src="/static/blog/2023/openai-experiment.png" alt="Screenshot of terminal program running Github lethain/openai-experiment"></p><p>This project is pretty neat, in the sense that it works. It did take me a bit longer than expected, probably about three hours
to get it working given some interruptions, mostly because the documentation’s examples were all subtly broken or didn’t actually connect | |||
together into working code. After it was working, I inevitably spent a few more hours fiddling around as well. | |||
My repo is terrible code, but it is a fully working example for anyone
else who had similar issues getting question answering over embeddings to work!</p><p>The other comment on this project is that I don’t really view this as a particularly effective solution to the problem I wanted to solve,
as it’s performing a fairly basic k-means algorithm to match tokenized versions of my blog posts against the query, | |||
and then injecting the best matches into the GPT query as context. Going into this, I expected, I dunno, something more
sophisticated. It’s a very reasonable, cost-efficient solution because it avoids any model (re)training,
but it feels a bit more basic than I imagined.</p><p>Also worth noting, the total cost of developing this app and running it a few dozen times: $0.50.</p><h2 id="thoughts">Thoughts</h2><p>This was a fun project, in part because it was a detour away from what I’ve spent most of my time on the last few months,
which is writing my next book. Writing and editing a book is very valuable work, but it lacks the freeform joy of | |||
hacking around a small project with zero users. Without overthinking or overstructuring things too much, | |||
here are some bullet-point thoughts about this project and the expansion of AI in the industry at large:</p><ul><li>As someone who’s been working in the industry for a while now, it’s easy to get jaded about new things.
My first reaction to the recent AI hype is very similar to my first reaction to the crypto hype: | |||
we’ve seen hype before, and initial hype is rarely correlated with long-term impact on the industry | |||
or on society. In other words, I wasn’t convinced.</li><li>Conversely, I think part of long-term engineering leadership is remaining open to new things. | |||
The industry has radically changed from twenty years ago, with mobile development as the most obvious proof point. | |||
Most things won’t change the industry much, but some things will completely transform it, | |||
and we owe cautious interest to these potentially transformational projects.</li><li>My personal bet is that the new AI wave is moderately transformative but not massively so. | |||
Expanding on my thinking a bit, LLMs are showing significant promise at mediocre solutions to very general problems. | |||
A very common, often unstated, Silicon Valley model is to hire engineers, pretend the engineers are | |||
solving a problem, hire a huge number of non-engineers to actually solve the problem “until the technology automates it”, | |||
grow the business rapidly, and hope automation solves the margins in some later year. | |||
LLM adoption should be a valuable tool in improving margins in this kind of business, | |||
which in theory should enable new businesses to be created by improving the potential margin. | |||
However, we’ve been in a decade of <a href="https://www.readmargins.com/p/zirp-explains-the-world">zero-interest-rate policy</a> | |||
which has meant that current-year margins haven’t mattered much to folks, | |||
which implies that most of these ideas that should be enabled by improved margins should | |||
have already been attempted in the preceding margin-agnostic decade.
This means that LLMs will make those businesses better, but the businesses themselves should
have already been tried, and many of them ultimately failed because market size prevented the
required returns, more so than because of the cost of running large internal teams to mask the missing margin-enhancing technology.</li><li>If you ignore the margin-enhancement opportunities represented by LLMs,
which I’ve argued shouldn’t generate new business ideas but improve existing business ideas already | |||
tried over the last decade, then it’s interesting to ponder what the sweet spot is for these tools. | |||
My take is that they’re very good at supporting domain experts, where the potential damage caused by
inaccuracies is constrained, e.g. Github Copilot is a very plausible way to empower a proficient programmer, | |||
and a very risky way to train a novice in a setting where the code has access to sensitive resources or data. | |||
However, to the extent that we’re pushing experts from authors to editors, I’m not sure that’s an actual speed | |||
improvement for our current generation of experts, who already have mastery in authorship and (often) a lesser | |||
skill in editing. Maybe there is a new generation of experts who are exceptional editors first, and authors second, | |||
which these tools will foster. If that’s true, then likely the current generation of leaders is unable to | |||
assess these tools appropriately, but&mldr; I think that most folks make this argument about most new technologies, | |||
and it’s only true sometimes. (Again, crypto is a clear example of something that has not overtaken existing | |||
technologies in the real world with significant regulatory overhead.)</li></ul><p>Anyway, it was a fun project, and I have a much better intuitive sense of what’s possible | |||
in this space after spending some time here, which was my goal. I’ll remain very curious to | |||
see what comes together here as the timeline progresses.</p><p class="mt6 instapaper_ignoref"></p> |
@@ -0,0 +1,173 @@ | |||
<!doctype html><!-- This is a valid HTML5 document. --> | |||
<!-- Screen readers, SEO, extensions and so on. --> | |||
<html lang="fr"> | |||
<!-- Has to be within the first 1024 bytes, hence before the `title` element | |||
See: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset --> | |||
<meta charset="utf-8"> | |||
<!-- Why no `X-UA-Compatible` meta: https://stackoverflow.com/a/6771584 --> | |||
<!-- The viewport meta is quite crowded and we are responsible for that. | |||
See: https://codepen.io/tigt/post/meta-viewport-for-2015 --> | |||
<meta name="viewport" content="width=device-width,initial-scale=1"> | |||
<!-- Required to make a valid HTML5 document. --> | |||
<title>GitHub Copilot AI pair programmer: Asset or Liability? (archive) — David Larlet</title> | |||
<meta name="description" content="Publication mise en cache pour en conserver une trace."> | |||
<!-- That good ol' feed, subscribe :). --> | |||
<link rel="alternate" type="application/atom+xml" title="Feed" href="/david/log/"> | |||
<!-- Generated from https://realfavicongenerator.net/ such a mess. --> | |||
<link rel="apple-touch-icon" sizes="180x180" href="/static/david/icons2/apple-touch-icon.png"> | |||
<link rel="icon" type="image/png" sizes="32x32" href="/static/david/icons2/favicon-32x32.png"> | |||
<link rel="icon" type="image/png" sizes="16x16" href="/static/david/icons2/favicon-16x16.png"> | |||
<link rel="manifest" href="/static/david/icons2/site.webmanifest"> | |||
<link rel="mask-icon" href="/static/david/icons2/safari-pinned-tab.svg" color="#07486c"> | |||
<link rel="shortcut icon" href="/static/david/icons2/favicon.ico"> | |||
<meta name="msapplication-TileColor" content="#f7f7f7"> | |||
<meta name="msapplication-config" content="/static/david/icons2/browserconfig.xml"> | |||
<meta name="theme-color" content="#f7f7f7" media="(prefers-color-scheme: light)"> | |||
<meta name="theme-color" content="#272727" media="(prefers-color-scheme: dark)"> | |||
<!-- Documented, feel free to shoot an email. --> | |||
<link rel="stylesheet" href="/static/david/css/style_2021-01-20.css"> | |||
<!-- See https://www.zachleat.com/web/comprehensive-webfonts/ for the trade-off. --> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<script> | |||
function toggleTheme(themeName) { | |||
document.documentElement.classList.toggle( | |||
'forced-dark', | |||
themeName === 'dark' | |||
) | |||
document.documentElement.classList.toggle( | |||
'forced-light', | |||
themeName === 'light' | |||
) | |||
} | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme !== 'undefined') { | |||
toggleTheme(selectedTheme) | |||
} | |||
</script> | |||
<meta name="robots" content="noindex, nofollow"> | |||
<meta content="origin-when-cross-origin" name="referrer"> | |||
<!-- Canonical URL for SEO purposes --> | |||
<link rel="canonical" href="https://www.sciencedirect.com/science/article/abs/pii/S0164121223001292"> | |||
<body class="remarkdown h1-underline h2-underline h3-underline em-underscore hr-center ul-star pre-tick" data-instant-intensity="viewport-all"> | |||
<article> | |||
<header> | |||
<h1>GitHub Copilot AI pair programmer: Asset or Liability?</h1> | |||
</header> | |||
<nav> | |||
<p class="center"> | |||
<a href="/david/" title="Aller à l’accueil"><svg class="icon icon-home"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use> | |||
</svg> Accueil</a> • | |||
<a href="https://www.sciencedirect.com/science/article/abs/pii/S0164121223001292" title="Lien vers le contenu original">Source originale</a> | |||
</p> | |||
</nav> | |||
<hr> | |||
<h2>Abstract</h2> | |||
<p>Automatic program synthesis is a long-lasting dream in software engineering. Recently, a promising Deep Learning (DL) based solution, called Copilot, has been proposed by OpenAI and Microsoft as an industrial product. Although some studies evaluate the correctness of Copilot solutions and report its issues, more empirical evaluations are necessary to understand how developers can benefit from it effectively. In this paper, we study the capabilities of Copilot in two different programming tasks: (i) generating (and reproducing) correct and efficient solutions for fundamental algorithmic problems, and (ii) comparing Copilot’s proposed solutions with those of human programmers on a set of programming tasks. For the former, we assess the performance and functionality of Copilot in solving selected fundamental problems in computer science, like sorting and implementing data structures. In the latter, a dataset of programming problems with human-provided solutions is used. The results show that Copilot is capable of providing solutions for almost all fundamental algorithmic problems; however, some solutions are buggy and non-reproducible. Moreover, Copilot has some difficulties in combining multiple methods to generate a solution. Comparing Copilot to humans, our results show that the correct ratio of humans’ solutions is greater than Copilot’s suggestions, while the buggy solutions generated by Copilot require less effort to be repaired. Based on our findings, if Copilot is used by expert developers in software projects, it can become an asset since its suggestions could be comparable to humans’ contributions in terms of quality. However, Copilot can become a liability if it is used by novice developers who may fail to filter its buggy or non-optimal solutions due to a lack of expertise.</p>
</article> | |||
<hr> | |||
<footer> | |||
<p> | |||
<a href="/david/" title="Aller à l’accueil"><svg class="icon icon-home"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use> | |||
</svg> Accueil</a> • | |||
<a href="/david/log/" title="Accès au flux RSS"><svg class="icon icon-rss2"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-rss2"></use> | |||
</svg> Suivre</a> • | |||
<a href="http://larlet.com" title="Go to my English profile" data-instant><svg class="icon icon-user-tie"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-user-tie"></use> | |||
</svg> Pro</a> • | |||
<a href="mailto:david%40larlet.fr" title="Envoyer un courriel"><svg class="icon icon-mail"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-mail"></use> | |||
</svg> Email</a> • | |||
<abbr class="nowrap" title="Hébergeur : Alwaysdata, 62 rue Tiquetonne 75002 Paris, +33184162340"><svg class="icon icon-hammer2"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-hammer2"></use> | |||
</svg> Légal</abbr> | |||
</p> | |||
<template id="theme-selector"> | |||
<form> | |||
<fieldset> | |||
<legend><svg class="icon icon-brightness-contrast"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-brightness-contrast"></use> | |||
</svg> Thème</legend> | |||
<label> | |||
<input type="radio" value="auto" name="chosen-color-scheme" checked> Auto | |||
</label> | |||
<label> | |||
<input type="radio" value="dark" name="chosen-color-scheme"> Foncé | |||
</label> | |||
<label> | |||
<input type="radio" value="light" name="chosen-color-scheme"> Clair | |||
</label> | |||
</fieldset> | |||
</form> | |||
</template> | |||
</footer> | |||
<script src="/static/david/js/instantpage-5.1.0.min.js" type="module"></script> | |||
<script> | |||
function loadThemeForm(templateName) { | |||
const themeSelectorTemplate = document.querySelector(templateName) | |||
const form = themeSelectorTemplate.content.firstElementChild | |||
themeSelectorTemplate.replaceWith(form) | |||
form.addEventListener('change', (e) => { | |||
const chosenColorScheme = e.target.value | |||
localStorage.setItem('theme', chosenColorScheme) | |||
toggleTheme(chosenColorScheme) | |||
}) | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme && selectedTheme !== 'undefined') { | |||
form.querySelector(`[value="${selectedTheme}"]`).checked = true | |||
} | |||
} | |||
const prefersColorSchemeDark = '(prefers-color-scheme: dark)' | |||
window.addEventListener('load', () => { | |||
let hasDarkRules = false | |||
for (const styleSheet of Array.from(document.styleSheets)) { | |||
let mediaRules = [] | |||
for (const cssRule of styleSheet.cssRules) { | |||
if (cssRule.type !== CSSRule.MEDIA_RULE) { | |||
continue | |||
} | |||
// WARNING: Safari does not support `conditionText`.
if (cssRule.conditionText) { | |||
if (cssRule.conditionText !== prefersColorSchemeDark) { | |||
continue | |||
} | |||
} else { | |||
if (cssRule.cssText.startsWith(prefersColorSchemeDark)) { | |||
continue | |||
} | |||
} | |||
mediaRules = mediaRules.concat(Array.from(cssRule.cssRules)) | |||
} | |||
// WARNING: do not try to insert a Rule to a styleSheet you are | |||
// currently iterating on, otherwise the browser will be stuck | |||
// in an infinite loop…
for (const mediaRule of mediaRules) { | |||
styleSheet.insertRule(mediaRule.cssText) | |||
hasDarkRules = true | |||
} | |||
} | |||
if (hasDarkRules) { | |||
loadThemeForm('#theme-selector') | |||
} | |||
}) | |||
</script> | |||
</body> | |||
</html> |
@@ -0,0 +1,7 @@ | |||
title: GitHub Copilot AI pair programmer: Asset or Liability? | |||
url: https://www.sciencedirect.com/science/article/abs/pii/S0164121223001292 | |||
hash_url: 6eef954bc8dd84322cf19ab38caf2ee3 | |||
## Abstract | |||
Automatic program synthesis is a long-lasting dream in software engineering. Recently, a promising Deep Learning (DL) based solution, called Copilot, has been proposed by OpenAI and Microsoft as an industrial product. Although some studies evaluate the correctness of Copilot solutions and report its issues, more empirical evaluations are necessary to understand how developers can benefit from it effectively. In this paper, we study the capabilities of Copilot in two different programming tasks: (i) generating (and reproducing) correct and efficient solutions for fundamental algorithmic problems, and (ii) comparing Copilot’s proposed solutions with those of human programmers on a set of programming tasks. For the former, we assess the performance and functionality of Copilot in solving selected fundamental problems in computer science, like sorting and implementing data structures. In the latter, a dataset of programming problems with human-provided solutions is used. The results show that Copilot is capable of providing solutions for almost all fundamental algorithmic problems; however, some solutions are buggy and non-reproducible. Moreover, Copilot has some difficulties in combining multiple methods to generate a solution. Comparing Copilot to humans, our results show that the correct ratio of humans’ solutions is greater than Copilot’s suggestions, while the buggy solutions generated by Copilot require less effort to be repaired. Based on our findings, if Copilot is used by expert developers in software projects, it can become an asset since its suggestions could be comparable to humans’ contributions in terms of quality. However, Copilot can become a liability if it is used by novice developers who may fail to filter its buggy or non-optimal solutions due to a lack of expertise.
@@ -0,0 +1,191 @@ | |||
<!doctype html><!-- This is a valid HTML5 document. --> | |||
<!-- Screen readers, SEO, extensions and so on. --> | |||
<html lang="fr"> | |||
<!-- Has to be within the first 1024 bytes, hence before the `title` element | |||
See: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset --> | |||
<meta charset="utf-8"> | |||
<!-- Why no `X-UA-Compatible` meta: https://stackoverflow.com/a/6771584 --> | |||
<!-- The viewport meta is quite crowded and we are responsible for that. | |||
See: https://codepen.io/tigt/post/meta-viewport-for-2015 --> | |||
<meta name="viewport" content="width=device-width,initial-scale=1"> | |||
<!-- Required to make a valid HTML5 document. --> | |||
<title>AIs can write for us but will we actually want them to? (archive) — David Larlet</title> | |||
<meta name="description" content="Publication mise en cache pour en conserver une trace."> | |||
<!-- That good ol' feed, subscribe :). --> | |||
<link rel="alternate" type="application/atom+xml" title="Feed" href="/david/log/"> | |||
<!-- Generated from https://realfavicongenerator.net/ such a mess. --> | |||
<link rel="apple-touch-icon" sizes="180x180" href="/static/david/icons2/apple-touch-icon.png"> | |||
<link rel="icon" type="image/png" sizes="32x32" href="/static/david/icons2/favicon-32x32.png"> | |||
<link rel="icon" type="image/png" sizes="16x16" href="/static/david/icons2/favicon-16x16.png"> | |||
<link rel="manifest" href="/static/david/icons2/site.webmanifest"> | |||
<link rel="mask-icon" href="/static/david/icons2/safari-pinned-tab.svg" color="#07486c"> | |||
<link rel="shortcut icon" href="/static/david/icons2/favicon.ico"> | |||
<meta name="msapplication-TileColor" content="#f7f7f7"> | |||
<meta name="msapplication-config" content="/static/david/icons2/browserconfig.xml"> | |||
<meta name="theme-color" content="#f7f7f7" media="(prefers-color-scheme: light)"> | |||
<meta name="theme-color" content="#272727" media="(prefers-color-scheme: dark)"> | |||
<!-- Documented, feel free to shoot an email. --> | |||
<link rel="stylesheet" href="/static/david/css/style_2021-01-20.css"> | |||
<!-- See https://www.zachleat.com/web/comprehensive-webfonts/ for the trade-off. --> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: light), (prefers-color-scheme: no-preference)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_regular.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_bold.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<link rel="preload" href="/static/david/css/fonts/triplicate_t3_italic.woff2" as="font" type="font/woff2" media="(prefers-color-scheme: dark)" crossorigin> | |||
<script> | |||
function toggleTheme(themeName) { | |||
document.documentElement.classList.toggle( | |||
'forced-dark', | |||
themeName === 'dark' | |||
) | |||
document.documentElement.classList.toggle( | |||
'forced-light', | |||
themeName === 'light' | |||
) | |||
} | |||
const selectedTheme = localStorage.getItem('theme') | |||
  if (selectedTheme && selectedTheme !== 'undefined') {
toggleTheme(selectedTheme) | |||
} | |||
</script> | |||
<meta name="robots" content="noindex, nofollow"> | |||
<meta content="origin-when-cross-origin" name="referrer"> | |||
<!-- Canonical URL for SEO purposes --> | |||
<link rel="canonical" href="https://www.bryanbraun.com/2023/04/14/ais-can-write-for-us-but-will-we-want-them-to/"> | |||
<body class="remarkdown h1-underline h2-underline h3-underline em-underscore hr-center ul-star pre-tick" data-instant-intensity="viewport-all"> | |||
<article> | |||
<header> | |||
<h1>AIs can write for us but will we actually want them to?</h1> | |||
</header> | |||
<nav> | |||
<p class="center"> | |||
        <a href="/david/" title="Go to the home page"><svg class="icon icon-home">
            <use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use>
        </svg> Home</a> •
        <a href="https://www.bryanbraun.com/2023/04/14/ais-can-write-for-us-but-will-we-want-them-to/" title="Link to the original content">Original source</a>
</p> | |||
</nav> | |||
<hr> | |||
<p>From <a href="https://openai.com/blog/chatgpt">ChatGPT</a> to <a href="https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/">Microsoft 365 Copilot</a>, we’re seeing a wave of AIs that can write and write well.</p> | |||
<p>In a recent post, Jim Nielsen described how having AIs write for us is a trade-off:</p> | |||
<blockquote> | |||
<p>“Writing is a moment for self-reflection, for providing the space and time necessary for the conception of thoughts or feelings that can change your heart or mind. Offloading that task to AI is not necessarily a net-gain, it is a trade-off. One to make consciously.”</p> | |||
<p>Jim Nielsen - <a href="https://blog.jim-nielsen.com/2023/more-everything-with-ai">More Everything With AI</a></p> | |||
</blockquote> | |||
<p>That made me think about my own writing. If I had to break down my current writing activity (not counting code), it would look something like this:</p> | |||
<ul> | |||
<li>10% - Journaling</li> | |||
<li>10% - <a href="https://www.bryanbraun.com/blog/">Blog posts</a></li> | |||
<li>20% - Texting and Personal Emails</li> | |||
<li>10% - Meeting notes / todos</li> | |||
<li>35% - Programming notes (usually to help me work through tricky coding issues)</li> | |||
<li>15% - <a href="https://www.bryanbraun.com/books/">Book notes</a></li> | |||
</ul> | |||
<p>Could I hand any of these over to AI?</p> | |||
<p>Definitely no on the journaling and blog posts, since those are basically pure self-reflection. It’s me figuring out what I believe. I could augment that a bit with spelling and grammar check tools, but it’s hard to imagine offloading more without compromising <a href="http://www.paulgraham.com/words.html">the process</a>.</p> | |||
<p>For texting and emails I already use autocomplete and <a href="https://support.google.com/mail/answer/9116836?hl=en&co=GENIE.Platform%3DDesktop">Smart Compose</a>. I also use Gmail templates for frequent responses, so I can’t see how I could automate this much further.</p> | |||
<p>Personal notes (for meetings, books, and coding) seem the most promising, but I don’t think AI can do this for me either. When I take notes, I’m only interested in writing out the stuff that matters to me. Every book I read has a hundred summaries on the internet, each more detailed and comprehensive than mine, but I still take <a href="https://www.bryanbraun.com/books/">book notes</a> because I want to remember <a href="https://sive.rs/bfaq">what impacted me</a>. Even if an AI knew what those things were, delegating that work would defeat the purpose.</p>
<p>So maybe I don’t want AIs to take over my writing, but that doesn’t mean AI is useless. Autocomplete, grammar check, and Smart Compose… these tools are already AI-powered. As AI tech progresses, I expect these tools to improve and become more pervasive, impacting my writing in little ways, mostly from the margins.</p>
</article> | |||
<hr> | |||
<footer> | |||
<p> | |||
        <a href="/david/" title="Go to the home page"><svg class="icon icon-home">
            <use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-home"></use>
        </svg> Home</a> •
        <a href="/david/log/" title="Access the RSS feed"><svg class="icon icon-rss2">
            <use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-rss2"></use>
        </svg> Follow</a> •
<a href="http://larlet.com" title="Go to my English profile" data-instant><svg class="icon icon-user-tie"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-user-tie"></use> | |||
</svg> Pro</a> • | |||
        <a href="mailto:david%40larlet.fr" title="Send an email"><svg class="icon icon-mail">
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-mail"></use> | |||
</svg> Email</a> • | |||
        <abbr class="nowrap" title="Host: Alwaysdata, 62 rue Tiquetonne 75002 Paris, +33184162340"><svg class="icon icon-hammer2">
            <use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-hammer2"></use>
        </svg> Legal</abbr>
</p> | |||
<template id="theme-selector"> | |||
<form> | |||
<fieldset> | |||
<legend><svg class="icon icon-brightness-contrast"> | |||
<use xlink:href="/static/david/icons2/symbol-defs-2021-12.svg#icon-brightness-contrast"></use> | |||
            </svg> Theme</legend>
<label> | |||
<input type="radio" value="auto" name="chosen-color-scheme" checked> Auto | |||
</label> | |||
<label> | |||
                <input type="radio" value="dark" name="chosen-color-scheme"> Dark
</label> | |||
<label> | |||
                <input type="radio" value="light" name="chosen-color-scheme"> Light
</label> | |||
</fieldset> | |||
</form> | |||
</template> | |||
</footer> | |||
<script src="/static/david/js/instantpage-5.1.0.min.js" type="module"></script> | |||
<script> | |||
function loadThemeForm(templateName) { | |||
const themeSelectorTemplate = document.querySelector(templateName) | |||
const form = themeSelectorTemplate.content.firstElementChild | |||
themeSelectorTemplate.replaceWith(form) | |||
form.addEventListener('change', (e) => { | |||
const chosenColorScheme = e.target.value | |||
localStorage.setItem('theme', chosenColorScheme) | |||
toggleTheme(chosenColorScheme) | |||
}) | |||
const selectedTheme = localStorage.getItem('theme') | |||
if (selectedTheme && selectedTheme !== 'undefined') { | |||
form.querySelector(`[value="${selectedTheme}"]`).checked = true | |||
} | |||
} | |||
const prefersColorSchemeDark = '(prefers-color-scheme: dark)' | |||
window.addEventListener('load', () => { | |||
let hasDarkRules = false | |||
for (const styleSheet of Array.from(document.styleSheets)) { | |||
let mediaRules = [] | |||
for (const cssRule of styleSheet.cssRules) { | |||
if (cssRule.type !== CSSRule.MEDIA_RULE) { | |||
continue | |||
} | |||
        // WARNING: Safari does not support `conditionText`.
if (cssRule.conditionText) { | |||
if (cssRule.conditionText !== prefersColorSchemeDark) { | |||
continue | |||
} | |||
        } else {
          // Fallback for Safari: match on the serialized rule text,
          // which starts with `@media (prefers-color-scheme: dark)`.
          if (!cssRule.cssText.includes(prefersColorSchemeDark)) {
            continue
          }
        }
mediaRules = mediaRules.concat(Array.from(cssRule.cssRules)) | |||
} | |||
// WARNING: do not try to insert a Rule to a styleSheet you are | |||
// currently iterating on, otherwise the browser will be stuck | |||
      // in an infinite loop…
for (const mediaRule of mediaRules) { | |||
styleSheet.insertRule(mediaRule.cssText) | |||
hasDarkRules = true | |||
} | |||
} | |||
if (hasDarkRules) { | |||
loadThemeForm('#theme-selector') | |||
} | |||
}) | |||
</script> | |||
</body> | |||
</html> |