4 years ago · 2dc953dd44
--- a/cache/2020/67c8c54b07137bcfc0069fccd8261b53/index.html
+++ b/cache/2020/67c8c54b07137bcfc0069fccd8261b53/index.html
@@ -0,0 +1,703 @@
 <!doctype html><!-- This is a valid HTML5 document. -->
 <!-- Screen readers, SEO, extensions and so on. -->
 <html lang="fr">
 <!-- Has to be within the first 1024 bytes, hence before the <title>
  See: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset -->
 <meta charset="utf-8">
 <!-- Why no `X-UA-Compatible` meta: https://stackoverflow.com/a/6771584 -->
 <!-- The viewport meta is quite crowded and we are responsible for that.
  See: https://codepen.io/tigt/post/meta-viewport-for-2015 -->
 <meta name="viewport" content="width=device-width,minimum-scale=1,initial-scale=1,shrink-to-fit=no">
 <!-- Required to make a valid HTML5 document. -->
 <title>Mercurial's Journey to and Reflections on Python 3 (archive) — David Larlet</title>
 <!-- Lightest blank gif, avoids an extra query to the server. -->
 <link rel="icon" href="data:;base64,iVBORw0KGgo=">
 <!-- Thank you Florens! -->
 <link rel="stylesheet" href="/static/david/css/style_2020-01-09.css">
 <!-- See https://www.zachleat.com/web/comprehensive-webfonts/ for the trade-off. -->
 <link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_regular.woff2" as="font" type="font/woff2" crossorigin>
 <link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_bold.woff2" as="font" type="font/woff2" crossorigin>
 <link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_italic.woff2" as="font" type="font/woff2" crossorigin>

  <meta name="robots" content="noindex, nofollow">
  <meta content="origin-when-cross-origin" name="referrer">
  <!-- Canonical URL for SEO purposes -->
  <link rel="canonical" href="https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/">

 <body class="remarkdown h1-underline h2-underline hr-center ul-star pre-tick">

 <article>
 <h1>Mercurial's Journey to and Reflections on Python 3</h1>
 <h2><a href="https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/">Original source of the content</a></h2>
 <p>Mercurial 5.2 was released on November 5, 2019. It is the first version
 of Mercurial that supports Python 3. This milestone comes nearly 11 years
 after Python 3.0 was first released on December 3, 2008.</p>

 <p>Speaking as a maintainer of Mercurial and an avid user of Python, I
 feel like the experience of making Mercurial work with Python 3 is
 worth sharing because there are a number of lessons to be learned.</p>

 <p>This post is logically divided into two sections: a mostly factual account
 of Mercurial's Python 3 porting effort and a more opinionated commentary
 on the transition to Python 3 and the Python language ecosystem as a whole.
 Those who don't care about the mechanics of porting a large Python project
 to Python 3 may want to skip the next section or two.</p>

 <h2>Porting Mercurial to Python 3</h2>

 <p>Let's start with a brief history lesson of Mercurial's support for
 Python 3 as told by its own commit history.</p>

 <p>The Mercurial version control tool was first released in April 2005
 (the same month that Git was initially released). Version 1.0 came out
 in March 2008. The first reference to Python 3 I found in the code base
 was in <a href="https://www.mercurial-scm.org/repo/hg/rev/8fee8ff13d37">September 2008</a>.
 Then not much happened for a while until
 <a href="https://www.mercurial-scm.org/repo/hg/rev/4494fb02d549">June 2010</a>, when
 someone authored a bunch of changes to make the Python C extensions
 start to recognize Python 3. Then things were again quiet for a while
 until <a href="https://www.mercurial-scm.org/repo/hg/rev/56ef99fbd6f2">January 2013</a>,
 when a handful of changes landed to remove two-argument <code>raise</code>. There were
 a handful of commits in 2014 but nothing worth calling out.</p>

 <p>Mercurial's meaningful journey to Python 3 started in 2015. In code,
 the work started in
 <a href="https://www.mercurial-scm.org/repo/hg/rev/af6e6a0781d7">April 2015</a>, with
 effort to make Mercurial's test harness run with Python 3. Part of
 this was a <a href="https://www.mercurial-scm.org/repo/hg/rev/fefc72523491">decision</a>
 that Python 3.5 (to be released several months later in September 2015)
 would be the minimum Python 3 version that Mercurial would support.</p>

 <p>Once the Mercurial Project decided it wanted to port to Python 3 (as opposed
 to another language), one of the earliest decisions was how to perform that
 port. <strong>Mercurial's code base was too large to attempt a flag day conversion</strong>
 where there would be a Python 2 version and a Python 3 version and one day
 everyone would switch from Python 2 to 3. <strong>Mercurial needed a way to run the
 same code (or as much of the same code) on both Python 2 and 3.</strong> We would
 maintain a single code base and users would gradually switch from running with
 Python 2 to Python 3.</p>

 <p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/e1fb276d4619">May 2015</a>,
 Mercurial dropped support for Python 2.4 and 2.5. Dropping support for
 these older Python versions was critical, as it was effectively impossible to
 write Python code that ran on this wide gamut of versions because of
 incompatibilities in syntax and language features. For example, you needed
 Python 2.6 to get <code>print()</code> via <code>from __future__ import print_function</code>.
 The project's late start at a Python 3 port can be significantly attributed
 to Python 2.4 and 2.5 compatibility holding us back.</p>

 <p>The main goal with Mercurial's early porting work was just getting the code base
 to a point where <code>import mercurial</code> would work. There were a myriad of places
 where Mercurial used syntax that was invalid on Python 3 and Python 3
 couldn't even parse the source code, let alone compile it to bytecode and
 execute it.</p>

 <p>This effort began in earnest in
 <a href="https://www.mercurial-scm.org/repo/hg/rev/e93036747902">June 2015</a>
 with global source code rewrites like using modern octal syntax,
 modern exception catching syntax (<code>except Exception as e</code> instead of
 <code>except Exception, e</code>), <code>print()</code> instead of <code>print</code>, and a
 <a href="https://www.mercurial-scm.org/repo/hg/rev/1a6a117d0b95">modern import convention</a>
 along with the use of <code>from __future__ import absolute_import</code>.</p>
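 <p>As a rough illustration of those rewrites, here is a minimal sketch of code
 written in the spellings that both Python 2.6+ and Python 3 can parse. The
 function <code>describe_mode</code> is hypothetical and is not taken from Mercurial.
 On Python 2 the module would also start with
 <code>from __future__ import absolute_import, print_function</code>; that line is
 left as a comment here so the snippet can be pasted anywhere.</p>

```python
# On Python 2.6+ this module would begin with:
#   from __future__ import absolute_import, print_function
import sys


def describe_mode(mode):
    """Format a permission mode given as an octal string (hypothetical helper)."""
    # 0o644 parses on Python 2.6+ and Python 3. The legacy spelling 0644 is a
    # SyntaxError on Python 3.
    default = 0o644
    try:
        return "%o (default %o)" % (int(mode, 8), default)
    except ValueError as exc:
        # "except ValueError, exc" would not parse on Python 3.
        # print() is called as a function, which Python 2 only accepts with
        # the __future__ import noted above.
        print("bad mode: %s" % exc, file=sys.stderr)
        return None
```

 <p>Each of these spellings is a purely mechanical change, which is why they
 could be applied across the whole source tree with global rewrites.</p>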

 <p>In the early days of the port, our first goal was to get all source code
 parsing as valid Python 3. The next step was to get all the modules <code>import</code>ing
 cleanly. This entailed fixing code that ran at <code>import</code> time to work on
 Python 3. Our thinking was that we would need the code base to be <code>import</code>
 clean on Python 3 before seriously thinking about run-time behavior. In reality,
 we quickly ported a lot of modules to <code>import</code> cleanly and then moved on
 to higher-level porting, leaving a long tail of modules with <code>import</code> failures.</p>

 <p>This initial porting effort played out over months. There weren't many
 people working on it in the early days: a few people would basically hack on
 Python 3 as a form of itch scratching and most of the project's energy was
 focused on improving the existing Python 2 based product. You can get a rough
 idea of the timeline and participation in the early porting effort through the
 <a href="https://www.mercurial-scm.org/repo/hg/log/081a77df7bc6/tests/test-check-py3-compat.t?revcount=960">history of test-check-py3-compat.t</a>.
 We see the test being added in <a href="https://www.mercurial-scm.org/repo/hg/rev/40eb385f798f">December 2015</a>.
 By June 2016, most of the code base was ported to our modern import convention
 and we were ready to move on to more meaningful porting.</p>

 <p>One of the biggest early hurdles in our porting effort was how to overcome
 the string literals type mismatch between Python 2 and 3. In Python 2, a
 <code>''</code> string literal is a sequence of bytes. In Python 3, a <code>''</code> string literal
 is a sequence of Unicode code points. These are fundamentally different types.
 And in Mercurial's code base, <strong>most of our <em>string</em> types are binary by design:
 use of a Unicode based <code>str</code> for representing data is flat out wrong for our use
 case</strong>. We knew that Mercurial would need to eventually switch many string
 literals from <code>''</code> to <code>b''</code> to preserve type compatibility. But doing so would
 be problematic.</p>
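 <p>To make the mismatch concrete, here is a small sketch of how the two literal
 forms behave. The comments describe both versions, but the values shown
 assume the snippet runs on Python 3.</p>

```python
native = 'abc'  # bytes on Python 2, str (Unicode code points) on Python 3
raw = b'abc'    # bytes on both Python 2.6+ and Python 3

# True on Python 3: the two literals no longer share a type.
types_differ = type(native) is not type(raw)

# Indexing bytes yields an int on Python 3 (97); on Python 2 it yields 'a'.
first_item = raw[0]

# Crossing between the two types requires an explicit encode or decode.
round_trip = native.encode('ascii') == raw
```

 <p>Code that stores file contents, paths, and wire protocol data wants the
 behavior of <code>raw</code>, which is why un-prefixed literals became a problem.</p>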

 <p>In the early days of Mercurial's Python 3 port in 2015, Mercurial's project
 maintainer (Matt Mackall) set a ground rule that the Python 3 port shouldn't overly
 disrupt others: he wanted the Python 3 port to more or less happen in the background
 and not require every developer to be aware of Python 3's low-level behavior in order
 to get work done on the existing Python 2 code base. This may seem like a questionable
 decision (and I probably disagreed with him to some extent at the time because I was
 doing Python 3 porting work and the decision constrained this work). But it was the
 correct decision. Matt knew that it would be years before the Python 3 port was either
 necessary or resulted in a meaningful return on investment (the value proposition of
 Python 3 has always been weak to Mercurial because Python 3 doesn't demonstrate a
 compelling advantage over Python 2 for our use case). What Matt was trying to do was
 minimize the externalized costs that a Python 3 port would inflict on the project.
 He correctly recognized that maintaining the existing product and supporting
 existing users was more important than a long-term bet in its infancy.</p>

 <p>This ground rule meant that a mass insertion of <code>b''</code> prefixes everywhere
 was not desirable, as that would require developers to think about whether
 a type was a <code>bytes</code> or <code>str</code>, a distinction they didn't have to worry about
 on Python 2 because we practically never used the Unicode-based string type in
 Mercurial.</p>

 <p>In addition, there were some other practical issues with doing a bulk <code>b''</code>
 prefix insertion. One was that the added <code>b</code> characters would cause a lot of lines
 to grow beyond our length limits and we'd have to reformat code. That would
 require manual intervention and would significantly slow down porting. And
 a sub-issue of adding all the <code>b</code> prefixes and reformatting code is that it would
 <em>break</em> annotate/blame more than was tolerable. The latter issue was addressed
 by teaching Mercurial's annotate/blame feature to <em>skip</em> revisions. The project
 now has a convention of annotating commit messages with <code># skip-blame &lt;reason&gt;</code>
 so structural only changes can easily be ignored when performing an
 annotate/blame.</p>

 <p>A stop-gap solution to the <code>b''</code> everywhere issue came in
 <a href="https://www.mercurial-scm.org/repo/hg/rev/1c22400db72d">July 2016</a>, when I
 introduced a custom Python module importer that rewrote source code as part
 of <code>import</code> when running on Python 3. (I have
 <a href="/blog/2017/03/13/from-__past__-import-bytes_literals/">previously blogged</a>
 about this hack.) The importer transparently added <code>b''</code> prefixes to all
 un-prefixed string literals and modified how a few common functions were
 called, so that the code could run natively on Python 3 without changes to the
 source files. The source transformer allowed us to have the benefits of progressing
 in our Python 3 port without having to rewrite tens of thousands of lines of
 source code. The solution was hacky. But it enabled us to make significant
 progress on the Python 3 port without externalizing a lot of cost onto others.</p>
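 <p>The core idea of such a transformer can be sketched with the standard
 library's <code>tokenize</code> module. This is a simplified illustration under
 my own assumptions, not Mercurial's actual importer: the function name
 <code>byteify_literals</code> is hypothetical, and the sketch ignores details
 such as docstrings and the function-call rewrites mentioned above.</p>

```python
import io
import tokenize


def byteify_literals(source):
    """Return source text with a b prefix added to un-prefixed string literals."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # A literal that starts directly with a quote has no prefix. Literals
        # such as u'', r'' and b'' start with a letter and are left alone.
        if tok.type == tokenize.STRING and tok.string[0] in ('"', "'"):
            tok = tok._replace(string='b' + tok.string)
        tokens.append(tok)
    return tokenize.untokenize(tokens)


original = "x = 'abc'\ny = u'def'\nz = b'ghi'\n"
rewritten = byteify_literals(original)
```

 <p>A real import hook would apply a rewrite like this to each module's source
 before compiling it, so the files on disk stay unchanged.</p>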

 <p>I thought the source transformer would be relatively short-lived and would be
 removed shortly after the project inevitably decided to go all in on Python 3.
 To my surprise, others built additional transforms over the years and the source
 transformer persisted all the way until
 <a href="https://www.mercurial-scm.org/repo/hg/rev/d783f945a701">October 2019</a>, when
 I removed it just before the first non-alpha Python 3 compatible version
 of Mercurial was released.</p>

 <p>A common problem Mercurial faced with making the code base dual Python 2/3 native
 was dealing with standard library differences. Most of the problems stemmed
 from changes between Python 2.7 and 3.5+. But there are changes within the
 versions of Python 3 that we had to wallpaper over as well. In
 <a href="https://www.mercurial-scm.org/repo/hg/rev/6041fb8f2da8">April 2016</a>, the
 <code>mercurial.pycompat</code> module was introduced to export aliases or wrappers around
 standard library functionality to abstract the differences between Python
 versions. This file <a href="https://www.mercurial-scm.org/repo/hg/log/66af68d4c751/mercurial/pycompat.py?revcount=240">grew over time</a>
 and <a href="https://www.mercurial-scm.org/repo/hg/file/66af68d4c751/mercurial/pycompat.py">eventually became</a>
 Mercurial's version of <a href="https://six.readthedocs.io/">six</a>. To be honest, I'm
 not sure if we should have used <code>six</code> from the beginning. <code>six</code> probably would
 have saved some work. But we had to eventually write a lot of shims for
 converting between <code>str</code> and <code>bytes</code> and would have needed to invent a
 <code>pycompat</code> layer in some form anyway. So I'm not sure <code>six</code> would have saved
 enough effort to justify the baggage of integrating a 3rd party package into
 Mercurial. (When Mercurial accepts a 3rd party package, downstream packagers
 like Debian get all hot and bothered and end up making questionable patches
 to our source code. So we prefer to minimize the surface area for
 problems by minimizing dependencies on 3rd party packages.)</p>
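 <p>For readers unfamiliar with this kind of module, here is a minimal sketch
 of a compatibility shim. The names <code>IS_PY3</code>, <code>text_type</code>,
 and <code>to_bytes</code> are hypothetical and are not copied from
 <code>mercurial.pycompat</code>; the point is only the shape of the technique.</p>

```python
import sys

IS_PY3 = sys.version_info[0] >= 3

if IS_PY3:
    import queue  # this module is named Queue on Python 2
    text_type = str
else:
    import Queue as queue
    text_type = unicode  # only evaluated on Python 2


def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding it first if it is a text string."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    return value
```

 <p>Callers import <code>queue</code> and <code>to_bytes</code> from the shim and
 never check the Python version themselves, which keeps the version-specific
 branches in one place.</p>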

 <p>Once we had a source transforming module importer and the <code>pycompat</code>
 compatibility shim, we started to focus in earnest on making core
 functionality actually work on Python 3. We established a convention of
 annotating changesets needed for Python 3 with <code>py3</code>, so a
 <a href="https://www.mercurial-scm.org/repo/hg/log?rev=desc(py3)&amp;revcount=4000">commit message search</a>
 yields a lot of the history. (But it isn't a full history since not every Python 3
 oriented change used this convention). We see from that history that after
 the source importer landed, a lot of porting effort was spent on things
 very early in the <code>hg</code> process lifetime. This included handling environment
 variables, loading config files, and argument parsing. We introduced a
 <a href="https://www.mercurial-scm.org/repo/hg/log/@/tests/test-check-py3-commands.t">test-check-py3-commands.t</a>
 test to track the progress of <code>hg</code> commands working in Python 3. The very early
 history of that file shows the various error messages changing, as underlying
 early process functionality was slowly ported to work on Python 3. By
 <a href="https://www.mercurial-scm.org/repo/hg/rev/2d555d753f0e">December 2016</a>, we
 had <code>hg version</code> working on Python 3!</p>

 <p>With basic <code>hg</code> command dispatch ported to Python 3 at the end of 2016,
 2017 represented an inflection point in the Python 3 porting effort. With the
 early process functionality working, different people could pick up different
 commands and code paths and start making code work with Python 3. By
 <a href="https://www.mercurial-scm.org/repo/hg/rev/52ee1b5ac277">March 2017</a>, basic
 repository opening and <code>hg files</code> worked. Shortly thereafter,
 <a href="https://www.mercurial-scm.org/repo/hg/rev/ed23f929af38">hg init started working as well</a>.
 And <a href="https://www.mercurial-scm.org/repo/hg/rev/935a1b1117c7">hg status</a> and
 <a href="https://www.mercurial-scm.org/repo/hg/rev/aea8ec3f7dd1">hg commit</a> did as well.</p>

 <p>Within a few months, enough of Mercurial's functionality was working with Python
 3 that we started to <a href="https://www.mercurial-scm.org/repo/hg/rev/7a877e569ed6">track which tests passed on Python 3</a>.
 The <a href="https://www.mercurial-scm.org/repo/hg/log/@/contrib/python3-whitelist?revcount=480">evolution of this file</a>
 shows a reasonable history of the porting velocity.</p>

 <p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/feb910d2f59b">May 2017</a>, we dropped
 support for Python 2.6. This significantly reduced the complexity of supporting
 Python 3, as there was tons of functionality in Python 2.7 that made it easier
 to target both Python 2 and 3 and now our hands were untied to utilize it.</p>

 <p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/bd8875b6473c">November 2017</a>, I
 landed a test harness feature to report exceptions seen during test runs. I
 later <a href="https://www.mercurial-scm.org/repo/hg/rev/8de90e006c78">refined the output</a>
 so the most frequent failures were reported more prominently. This feature
 greatly improved our ability to target the most common exceptions, allowing
 us to write patches to fix the most prevalent issues on Python 3 and uncover
 previously unknown failures.</p>
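 <p>The ranking idea can be sketched in a few lines. This is not the test
 harness code: the record format (exception type name plus the location that
 raised it) and the file names are assumptions made for illustration.</p>

```python
import collections


def rank_exceptions(records):
    """Return (record, count) pairs ordered from most to least frequent."""
    return collections.Counter(records).most_common()


# Hypothetical records collected during a test run.
observed = [
    ("TypeError", "module_a.py:10"),
    ("AttributeError", "module_b.py:20"),
    ("TypeError", "module_a.py:10"),
    ("TypeError", "module_a.py:10"),
    ("AttributeError", "module_b.py:20"),
    ("KeyError", "module_c.py:30"),
]
ranking = rank_exceptions(observed)
```

 <p>Fixing the entry at the top of such a list tends to unblock the largest
 number of tests at once.</p>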

 <p>By the end of 2017, we had most of the structural pieces in place to complete
 the port. Essentially all that was required at that point was time and labor.
 We didn't have a formal mechanism in place to target porting efforts. Instead,
 people would pick up a component or test that they wanted to hack on and then
 make incremental changes towards making that work. All the while, we didn't
 have a strict policy on not regressing Python 3 and regressions in Python 3
 porting progress were semi-frequent, although we did tend to correct
 them quickly. Over time, developers saw a flurry of Python 3
 patches and slowly grew aware of how to accommodate Python 3, and
 Python 3 regressions became less frequent.</p>

 <p>As useful as the source-transforming module importer was, it incurred some
 additional burden for the porting effort. The source transformer effectively
 converted all un-prefixed string literals (<code>''</code>) to bytes literals (<code>b''</code>)
 to preserve string type behavior with Python 2. But various aspects of Python
 3 didn't like the existence of <code>bytes</code>. Various standard library functionality
 now wanted unicode <code>str</code> and didn't accept <code>bytes</code>, even though the Python
 2 implementation used the equivalent of <code>bytes</code>. So our <code>pycompat</code> layer
 grew pretty large to accommodate calling into various standard library
 functionality. Another side-effect which we didn't initially anticipate
 was the <code>**kwargs</code> calling convention. Python allows you to use <code>**</code>
 with a dict with string keys to turn those keys into named arguments
 in a function call. But Python 3 requires these <code>dict</code> keys to be
 <code>str</code> and outright rejects <code>bytes</code> keys, even if the <code>bytes</code> instance
 is ASCII safe and has the same underlying byte representation of the
 string data as the <code>str</code> instance would. So we had to invent support
 functions that would convert <code>dict</code> keys from <code>bytes</code> to <code>str</code> for
 use with <code>**kwargs</code> and another to convert a <code>**kwargs</code> dict from
 <code>str</code> keys to <code>bytes</code> keys so we could use <code>''</code> syntax to access keys
 in our source code! Also on the string type front, we had to sprinkle
 the codebase with raw string literals (<code>r''</code>) to force the use of
 <code>str</code> regardless of which Python version you were running on (our
 source transformer only changed unprefixed string literals, so existing
 <code>r''</code> strings would be preserved as <code>str</code>).</p>
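 <p>The <code>**kwargs</code> problem and the pair of conversion helpers can be
 sketched as follows, assuming Python 3. The helper names
 <code>str_kwargs</code> and <code>bytes_kwargs</code> and the
 <code>commit</code> function are hypothetical stand-ins, and the choice of
 <code>latin-1</code> as the codec is an assumption made so that every byte
 value round-trips.</p>

```python
def str_kwargs(opts):
    """Convert bytes keys to str so the dict can be expanded with **."""
    return {key.decode('latin-1'): value for key, value in opts.items()}


def bytes_kwargs(opts):
    """Convert str keys back to bytes for code that indexes with bytes keys."""
    return {key.encode('latin-1'): value for key, value in opts.items()}


def commit(message=None, user=None):
    # Stand-in for a function that receives named arguments.
    return (message, user)


opts = {b'message': b'fix bug', b'user': b'alice'}

error = None
try:
    commit(**opts)  # Python 3 rejects bytes keys in a ** expansion
except TypeError as exc:
    error = str(exc)

result = commit(**str_kwargs(opts))
```

 <p>Only the keys are converted. The values stay <code>bytes</code>, which matches
 the goal of keeping data binary while satisfying the calling convention.</p>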

 <p>Blind transformation of all string literals to <code>bytes</code> was less than ideal
 and it did impose some unwanted side-effects. But, again, most <em>strings</em>
 in Mercurial are bytes by design, so we thought it would be easier to
 <em>byteify</em> all strings and then selectively undo that where native strings
 were actually warranted (like keys in most <code>dict</code>s) than to take the
 up-front cost to examine every string and make an intelligent determination
 as to what type it should be. I go back and forth as to whether this was the
 correct call. But when you factor in that the source transforming
 module importer unblocked Python 3 porting at a time in the project's
 history when there was so much focus on improving the core product and it
 did so without externalizing many costs onto the people doing the critical
 core product work, I think it was the right call.</p>

 <p>By mid 2019, the number of test failures in Python 3 had been whittled
 down to a reasonable, less daunting number. It felt like victory was
 within our grasp and inevitable. But a few significant issues lingered.</p>

 <p>One remaining question was around addressing differences between Python
 3 versions. At the time, Python 3.5, 3.6, and 3.7 had been released and 3.8
 was scheduled for release by the end of the year. We had a surprising
 number of issues with differences in Python 3 versions. Many of us
 were running Python 3.7, so it had the fewest failures. We had to spend
 extra effort to get Python 3.5 and 3.6 working as well as 3.7. Same for
 3.8.</p>

 <p>Another task we deferred until the second half of 2019 was standing up
 robust CI for Python 3. We had some coverage, but it was minimal. Wanting
 a distraction from PyOxidizer for a bit and wanting to overhaul Mercurial's
 CI system (which is officially built on Buildbot), I cobbled together a
 <em>serverless</em> CI system built on top of AWS DynamoDB and S3 for storage,
 Lambda functions and CloudWatch events for all business logic, and EC2 spot
 instances for job execution. This CI system executed Python 3.5, 3.6, 3.7,
 and 3.8 variants of our test harness on Linux and Python 3.7 on Windows.
 This gave developers insight into version-specific failures. More
 importantly, it also gave insight into failures on Windows, which had
 previously not been well tested. We discovered that Python 3 on Windows was
 lagging significantly behind POSIX.</p>

 <p>By the time of the Mercurial developer meetup in October 2019, nearly
 all tests were passing on POSIX platforms and we were confident that
 we could declare Python 3 support as at least beta quality for the
 Mercurial 5.2 release, planned for early November.</p>

 <p>One of our blockers for ripping off the alpha label on Python 3 support
 was removing our source-transforming module importer. It had performance
 implications and it wasn't something we wanted to ship because it felt
 too hacky. Removing it was in turn blocked on automatically formatting
 our source tree with <a href="https://black.readthedocs.io/en/stable/">black</a>:
 without the source transformer, we'd have to rewrite
 a lot of source code to apply the changes the transformer was performing,
 which would necessitate wrapping a lot of lines, which would involve a lot
 of manual effort. We wanted to <em>blacken</em> our code base first so that
 mass rewriting source code wouldn't involve a lot of tedious reformatting
 since <code>black</code> would handle that for us automatically. And rewriting the
 source tree with <code>black</code> was blocked on a specific feature landing in
 <code>black</code>! (We did not agree with <code>black</code>'s behavior of
 unwrapping comma-delimited lists of items if they could fit on a single
 line. So one of our core contributors wrote a patch to <code>black</code> that
 changed its behavior so a trailing <code>,</code> in a list of items will force
 items to be formatted on multiple lines. I personally find the multiple line
 formatting much easier to read. And the behavior is arguably better for
 code review and <em>annotation</em>, which is line based.) Once this feature
 landed in <code>black</code>, we reformatted our source tree and started ripping
 out the source transformations, starting by inserting <code>b''</code> literals
 everywhere. By late October, the source transformer was no more and
 we were ready to release beta quality support for Python 3 (at least
 on UNIX-like platforms).</p>

 <p>Having described a mostly factual overview of Mercurial's port to Python
 3, it is now time to shift gears to the speculative and opinionated
 parts of this post. <strong>I want to underscore that the opinions reflected
 here are my own and do not reflect the overall Mercurial Project or even
 a consensus within it.</strong></p>

 <h2>The Future of Python 3 and Mercurial</h2>

 <p>Mercurial's port to Python 3 is still ongoing. While we've shipped
 Python 3 support and the test harness is clean on Python 3, I view shipping
 as only a milestone - arguably <em>the</em> most important one - in a longer
 journey. There's still a lot of work to do.</p>

 <p>It is now 2020 and Python 2 support is now officially dead from the
 perspective of the Python language maintainers. Linux distributions are
 starting to rip out Python 2. Packages are dropping Python 2 support in
 new versions. The world is moving to Python 3 only. But <strong>Mercurial still
 officially supports Python 2</strong>. And it is still yet to be determined how
 long we will retain support for Python 2 in the code base. We've only had
 one release supporting Python 3. Our users still need to port their
 extensions (implemented in Python). Our users still need to start widely
 using Mercurial with Python 3. Even our own developers need to switch to
 Python 3 (old habits are hard to break).</p>

 <p>I anticipate a long tail of random bugs in Mercurial on Python 3. While
 the tests may pass, our code coverage is not 100%. And even if it were,
 Python is a dynamic language and there are tons of invariants that aren't
 caught at compile time and can only be discovered at run time. <strong>These
 invariants cannot all be detected by tests, no matter how good your test
 coverage is.</strong> This is a <em>feature</em>/<em>limitation</em> of dynamic languages. Our
 users will likely be finding a long tail of miscellaneous bugs on Python
 3 for <em>years</em>.</p>
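 <p>One example of the kind of bug I have in mind, sketched under the
 assumption of Python 3 (the config key and values are made up): a
 <code>str</code> that sneaks into code expecting <code>bytes</code> often raises no
 error at all. It simply produces a different answer.</p>

```python
config = {b'ui.username': b'alice'}

# On Python 3, bytes and str never compare equal, and no exception is raised.
literal_match = (b'ui.username' == 'ui.username')

# A str key silently misses the bytes key and falls back to the default.
username = config.get('ui.username', b'unknown')
```

 <p>Unless a test happens to exercise this exact lookup and checks the
 resulting value, the mistake only surfaces when a user hits it.</p>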

 <p>At present, our code base is littered with tons of random hacks to bridge
 the gap between Python 2 and 3. Once Python 2 support is dropped, we'll
 need to remove these hacks and make the source tree Python 3 native, with
 minimal shims to wallpaper over differences in Python 3 versions. <strong>Removing
 this Python version bridge code will likely require hundreds of commits and
 will be a non-trivial effort.</strong> It's likely to be deemed a low priority (it
 is glorified busy work after all), and code for the express purpose of
 supporting Python 2 will likely linger for years.</p>

 <p>We are also still shoring up our packaging and distribution story on
 Python 3. This is easier on some platforms than others. I created
 <a href="https://github.com/indygreg/PyOxidizer">PyOxidizer</a> partially because
 of the poor experience I had with Python application packaging and
 distribution through the Mercurial Project. The Mercurial Project has
 already signed off on using PyOxidizer for distributing Mercurial in
 the future. So look for an <em>oxidized</em> Mercurial distribution in the
 near future! (You could argue PyOxidizer is an epic yak shave to better
 support Mercurial. But that's for another post.)</p>

 <p>Then there's Windows support. A Python 3 powered Mercurial on Windows
 still has a handful of known issues. It may require a few more releases
 before we consider Python 3 on Windows to be stable.</p>

 <p>Because we're still on a code base that must support Python 2, our
 adoption of Python 3 features is very limited. The only Python 3
 feature that Mercurial developers seem to almost universally get excited
 about is type annotations. We already have some people playing around
 with <code>pytype</code> using comment-based annotations and <code>pytype</code> has already
 caught a few bugs. We're eager to go all in on type annotations and
 uncover lots of dynamic typing bugs and poorly implemented APIs.
 Beyond type annotations, I can't name any feature that people are screaming
 to adopt and which makes a lot of sense for Mercurial. There's a long
 tail of minor features I'm sure will get utilized. But none of the
 marquee features that define major language releases seem that interesting
 to us. Time will tell.</p>
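 <p>For context, comment-based annotations (the type comment syntax described
 in PEP 484) look like the sketch below. The function <code>shortnode</code> is a
 hypothetical example rather than code from Mercurial. Because the annotation
 is an ordinary comment, the file still parses on Python 2.</p>

```python
def shortnode(node, length=12):
    # type: (bytes, int) -> bytes
    """Return an abbreviated form of a hex node identifier."""
    return node[:length]


abbreviated = shortnode(b'0123456789abcdef')
```

 <p>A type checker that understands type comments can then flag a call such as
 <code>shortnode('0123')</code> on Python 3, where the argument is
 <code>str</code> rather than <code>bytes</code>.</p>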

 <h2>Commentary on Python 3</h2>

 <p>Having described Mercurial's ongoing journey to Python 3, I now want to
 focus more on Python itself. Again, the opinions here are my own and
 don't reflect those of the Mercurial Project.</p>

 <p><strong>Succinctly, my experience porting Mercurial and other projects to
 Python 3 has significantly soured my perceptions of Python. As much as
 I have historically loved Python - from the language to the welcoming
 community - I am still struggling to understand how Python could manage
 to inflict so much hardship on the community by choosing the transition
 plan that they did.</strong> I believe Python's choices represent a terrific
 example of what not to do when managing a large project or ecosystem.
 Maintainers of other widely deployed systems would benefit from taking
 the time to understand and reflect on Python's missteps.</p>

 <p>Python 3.0 was released on December 3, 2008. And it took the better part of
 a decade for the community to embrace it. <strong>This should be universally
 recognized as a failure.</strong> While hindsight is 20/20, many of the issues
 with Python 3 were obvious at the time and could have been mitigated had
 the language maintainers been more accommodating - and dare I say
 empathetic - to its users.</p>

 <p>Initially, Python 3 had a rather cavalier attitude towards backwards and
 forwards compatibility. In the early years of Python 3, the attitude of
 Python's maintainers was <em>Python 3 is a new, better language: you should
 target it explicitly</em>. There were some tools and methods to ease the
 transition. But nothing super polished, especially in the early years.
 Adoption of Python 3 in the overall community was slow. Python developers
 in the wild justifiably complained that the value proposition of Python 3
 was too weak to justify porting effort. Not helping was that the early
 advice for targeting Python 3 was to rewrite the source code to become
 Python 3 native. This is in contrast with using the same source to run on both
 Python 2 and 3. For library and application maintainers, this potentially
 meant maintaining separate versions of your code or forcing end-users to
 make a giant leap, which would realistically orphan users on an old version,
 fragmenting your user base. Neither of those was a great alternative, so
 you can understand why many projects didn't bite.</p>

 <p>For many projects of non-trivial size, flag day transitions from Python 2 to
 3 were simply not viable: the pathway to Python 3 was to make code dual
 Python 2/3 compatible and gradually switch over the runtime to Python 3.
 But initial versions of Python 3 made this effectively impossible! Let me
 give a few specific examples.</p>

 <p>In Python 2, a string literal <code>''</code> is effectively an array of bytes. In
 Python 3, it is a series of Unicode code points - a fundamentally different
 type! In Python 2, you could write <code>b''</code> to be explicit that a string literal
 was bytes or you could write <code>u''</code> to indicate a Unicode literal, mimicking
 Python 3's behavior. In Python 3, you could write <code>b''</code> to create a <code>bytes</code>
 instance. But for whatever reason, Python 3 initially removed the <code>u''</code> syntax,
 meaning there wasn't an easy way to explicitly denote the type of each
 string literal so that it was consistent between Python 2 and 3! Python 3.3
 (released September 2012) restored <code>u''</code> support, making it more viable to
 write Python source code that worked on both Python 2 and 3. <strong>For nearly 4
 years, Python 3 took away the consistent syntax for denoting bytes/Unicode
 string literals.</strong></p>
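 <p>To make the type difference concrete, here is a minimal sketch, assuming
 Python 3.3 or newer (where <code>u''</code> is accepted again). The variable names are
 illustrative only.</p>

```python
# Run on Python 3.3+, where both literal prefixes are accepted.
raw = b"abc"    # bytes literal: a sequence of integers in range(256)
text = u"abc"   # Unicode literal: same type as an un-prefixed literal on Python 3
plain = "abc"   # un-prefixed literal: str (Unicode) on Python 3

literal_types = (type(raw).__name__, type(text).__name__, type(plain).__name__)

# Indexing exposes the difference: bytes yield integers, str yields characters.
first_items = (raw[0], text[0])

# The two types never compare equal on Python 3, even for ASCII-only content.
same = (raw == text)
```

 <p>On Python 2.7, the same three literals are a <code>str</code>, a <code>unicode</code>, and a
 <code>str</code> respectively, which is why un-prefixed literals change meaning when
 the same source runs on both versions.</p>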

 <p>Another feature was <code>%</code> formatting of strings. Python 2 allowed use of the
 <code>%</code> formatting operator on both its string types. But Python 3 initially
 removed the implementation of <code>%</code> from <code>bytes</code>. Why, I have no clue. It
 is perfectly reasonable to splice byte sequences into a buffer via use of
 a formatting string. But the Python language maintainers insisted otherwise.
 And it wasn't until the community complained about its absence loudly enough
 that this feature was
 <a href="https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-461">restored in Python 3.5</a>,
 which was released in September 2015. Fun fact: the lack of this feature was
 once considered a blocker for Mercurial moving to Python 3 because
 Mercurial uses <code>bytes</code> almost universally, which meant that nearly every use
 of <code>%</code> would have to be changed to something else. And to this day, Python
 3's <code>bytes</code> still doesn't have a <code>format()</code> method, so the alternative was
 effectively string concatenation, which is a massive step backwards from the
 expressiveness of <code>%</code> formatting.</p>
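 <p>As a rough illustration of what was restored, the sketch below assumes
 Python 3.5 or newer. The values are made up for the example and are not taken
 from Mercurial's code.</p>

```python
# Requires Python 3.5+ (PEP 461). On Python 3.0 through 3.4, applying % to
# a bytes object raised TypeError.
node = b"abc123"
line = b"%s %d\n" % (node, 42)

# bytes has no format() method, so without % the fallback is concatenation.
has_format = hasattr(bytes, "format")
concatenated = node + b" " + str(42).encode("ascii") + b"\n"
```

 <p>Both approaches produce the same bytes, but the concatenated form gets
 harder to read with every additional field.</p>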

 <p><strong>The initial approach of Python 3 mirrors a folly that many developers
 and projects make: attempting a rewrite instead of performing incremental
 evolution.</strong> For established projects, large scale rewrites often go poorly.
 And Python 3 is no exception. Yes, at the code level, CPython (and likely
 other Python implementations) evolved incrementally from Python 2 using
 the same code base. But at the language and standard library level, the
 differences in Python 3 were significant enough that I - and even Python's
 core maintainers - considered it a new language, and therefore a rewrite.
 When your random project attempts a rewrite and fails, the blast radius of that is
 often contained to that project. Maybe you don't publish a new release
 as soon as you otherwise would. <strong>But when you are powering an ecosystem,
 the ripple effects from a failed rewrite percolate throughout that ecosystem
 and last for years and have many second order effects. We see this with
 Python 3, where poor choices made in the late 2000s are inflicting significant
 hardship still in 2020.</strong></p>

 <p>From the initial restrained adoption of Python 3, it is obvious that the
 Python ecosystem overwhelmingly rejected the initial boil-the-oceans approach
 of Python 3. Python's maintainers eventually got the message and started
 restoring features like <code>u''</code> and <code>bytes</code> <code>%</code> formatting to the
 language to placate the community. All the while Python 3 had been accumulating
 new features and the cumulative sum of those features was compelling enough
 to win over users.</p>

 <p>For many projects (including Mercurial), Python 3.4/3.5 was the first viable
 porting target for Python 3. Python 3.5 was released in September 2015, almost
 7 years after Python 3.0 was released in December 2008. <strong>Seven. Years.</strong>
 An ecosystem that falters for that long is generally not healthy. What may have
 saved Python from total collapse here is that Python 2 was still going strong and
 people were generally happy with it. I really do think Python dodged a bullet
 here, because there was a massive window where the language could have
 hemorrhaged a critical amount of its user base and been relegated to an
 afterthought. One could draw an analogy to Perl, which lost out to PHP,
 Python, and Ruby, and whose fall from grace aligned with a lengthy
 transition from Perl 5 to 6.</p>

 <p>If you look back at the early history of Python 3, <strong>I think you are forced
 to conclude that Python effectively kneecapped itself for 5-7 years
 through questionable implementation choices that prevented users from
 making incremental transitions between the major language versions. 2008
 to 2013-2015 should be known as the <em>lost years of Python</em> because so much
 opportunity and energy was squandered.</strong> Yes, Python is still healthy today
 and Python 3 is (finally) being adopted at scale. But had earlier versions
 of Python 3 been more <em>empathetic</em> towards Python 2 users porting to it,
 Python and Python 3 in 2020 would be even stronger than they are. The community
 was artificially hindered for years. And we won't know until 2023-2025 what
 things could have looked like in 2020 had the Python core language team
 spent more time paving a smoother road between the major language versions.</p>

 <p>To be clear, I do think Python 3 is generally a better language than Python 2.
 It has fewer warts, more compelling features, and better performance (except
 for startup time, which is still slower than Python 2). I am ecstatic the
 community is finally rallying around Python 3! For my Python coding, it has
 reached the point where I curse under my breath when I need to support
 Python 2 or even older versions of Python 3, like 3.5 or 3.6: I just wish
 the world would move on and adopt the future already!</p>

 <p>But I would be remiss if I failed to mention some of my gripes with Python
 3 beyond the transition shenanigans.</p>

 <p>Perhaps my least favorite <em>feature</em> of Python 3 is its insistence that the
 world is Unicode. In Python 2, the default string type was backed by
 bytes. In Python 3, the default string type is backed by Unicode code
 points. As part of that transition, large parts of the standard library
 now operate in the Unicode space instead of the domain of bytes. I understand
 why Python does this: they want <em>strings</em> to be Unicode and don't want
 users to have to spend that much energy thinking about when to use
 <code>str</code> versus <code>bytes</code>. This approach is admirable and somewhat defensible
 because it takes a stand on a solution that is arguably <em>good enough</em> for
 most users. However, <strong>the approach of assuming the world is Unicode is
 flat out wrong and has significant implications for systems level
 applications</strong> (like version control tools).</p>

 <p>There are a myriad of places in Python's standard library where Python
 insists on using the Unicode-backed <code>str</code> type and rejects <code>bytes</code>. For
 example, various networking modules refuse to accept <code>bytes</code> for hostnames
 or URLs. HTTP libraries won't accept <code>bytes</code> for HTTP header names or values.
 Functions that are proxies to POSIX-defined functions won't accept <code>bytes</code>
 even though the POSIX functions they call into use <code>char *</code> and aren't
 Unicode aware. Then there's filename handling, where Python assumes the
 existence of a global encoding for filenames and uses this encoding to convert
 between <code>str</code> and <code>bytes</code>. And it does this despite POSIX filesystem paths
 being a bag of bytes where the only rules are that <code>\0</code> terminates the
 filename and <code>/</code> is special.</p>
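 <p>The filename problem can be sketched with nothing but codec calls. The
 example assumes a UTF-8 filesystem encoding and uses a made-up filename. On
 POSIX, Python's filename APIs rely on the same <code>surrogateescape</code> error
 handler (PEP 383) shown here.</p>

```python
# A filename that is legal on POSIX but is not valid UTF-8 (Latin-1 "café").
raw_name = b"caf\xe9"

# Strict decoding fails, so a Unicode-only API cannot represent this name as-is.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The surrogateescape handler smuggles the undecodable byte through as a
# lone surrogate code point, so the original bytes can be recovered later.
as_text = raw_name.decode("utf-8", "surrogateescape")
round_tripped = as_text.encode("utf-8", "surrogateescape")

# The resulting str is not well-formed Unicode text: strict encoding rejects it.
try:
    as_text.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

 <p>The round trip preserves the bytes, but the intermediate <code>str</code> cannot be
 strictly encoded, so writing it to a log or sending it over the wire needs
 special handling.</p>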

 <p>In cases like Python refusing to accept <code>bytes</code> for things like HTTP
 header names (which will just be spit out over the wire as bytes), Python's
 pendulum has swung too far towards Unicode only. In my opinion, Python needs
 to be more accommodating and allow <code>bytes</code> when it makes sense. I hope the
 pendulum knocks some sense into people when it swings back towards a more
 reasonable solution that better acknowledges the realities of the world we
 live in.</p>

 <p>For areas like filename handling, the world is more complicated. Python
 is effectively an abstraction layer over the operating system APIs exposing
 this functionality. And there is often an impedance mismatch between operating
 systems. For example, POSIX (Linux) tends to use <code>char *</code> for everything
 and doesn't care about encoding, while Windows tends to use 16-bit character
 types where the encoding is... a can of worms.</p>

 <p><strong>The reality here is that it is impossible to abstract over differences
 between operating system behavior without compromises that can result in data
 loss, outright wrong behavior, or loss of functionality. But Python 3 attempts
 to do it anyway, making Python 3 unsuitable (or at least highly undesirable) for
 certain systems level applications that rely on it</strong> (like a version control
 tool).</p>

 <p>In fairness to Python, it isn't the only programming language that gets
 this wrong. The only language I've seen <em>properly</em> implement higher-order
 abstractions on top of operating system facilities is Rust, whose approach can
 be generalized as <em>use Python 3's solution of normalizing to Unicode/UTF-8 by
 default</em>, but expose <em>escape hatches</em> which allow access to the raw underlying
 types and APIs used by the operating system for the advanced consumers who
 require it. For example, Rust's <code>Path</code> type which represents a filesystem path
 <a href="https://doc.rust-lang.org/std/path/struct.Path.html#method.as_os_str">allows access</a>
 to the raw <a href="https://doc.rust-lang.org/std/ffi/struct.OsStr.html">OsStr</a> value
 used by the operating system, not a normalization of it to bytes or Unicode,
 which may be lossy. This allows consumers to e.g. create and retrieve
 OS-native filesystem paths without data loss. This functionality is critical
 in some domains. Python 3's awareness/insistence that the world is
 Unicode (which it isn't universally) reduces Python's applicability in these
 domains.</p>

 <p>Speaking of Rust, at the Mercurial developer meetup in October 2019, we were
 discussing the use of Rust in Mercurial and one of the core maintainers blurted
 out something along the lines of <em>if Rust were at its current state 5 years ago,
 Mercurial would have likely ported from Python 2 to Rust instead of Python 3</em>.
 As crazy as it initially sounded, I think I agree with that assessment. With the
 benefit of hindsight, having been a key player in the Python 3 porting effort,
 seeing all the complications and headaches Python 3 is introducing, and
 having learned Rust and witnessed its benefits for performance, control,
 and correctness firsthand, porting to Rust would likely have been the correct
 move for the project at that point in time. 2020 is not 2014, however, and I'm
 not sure if I would opt for a rewrite in Rust today. (Most rewrites are follies
 after all.) But I know one thing: I certainly wouldn't implement a new version
 control tool in Python 3 and I would probably choose Rust as an implementation
 language for most new projects in the systems level space or with an expected
 shelf life of 10+ years. (I really should blog about how awesome Rust is.)</p>

 <p>Back to the topic of Python itself, <strong>I'm really soured on Python at this
 point in time. The effort required to port to Python 3 was staggering. For
 Mercurial, Python 3 introduces a ton of problems and doesn't really solve
 many. We effectively slogged through mud for several years only to wind
 up in a state that feels strictly worse than where we started. I'm sure it will
 be strictly better in a few years. But at that point, we're talking about a
 5+ year transition. To call the Python 3 transition disruptive and
 distracting for the project would be an understatement. As a project maintainer,
 it's natural to ask what we could have accomplished if we weren't forced
 to carry out this sideshow.</strong></p>

 <p>I can't shake the feeling that a lot of the pain inflicted by the Python 3
 transition could have been avoided had Python's language leadership made
 a different set of decisions and more highly prioritized the transition
 experience. (Like not initially removing features like <code>u''</code> and <code>bytes %</code>
 and not introducing gratuitous backwards compatibility breaks, like with
 <code>items()/iteritems()</code>. I would have also liked to see a feature like
 <code>from __future__</code> - maybe <code>from __past__</code> - that would make it easier for
 Python 3 code to target semantics in earlier versions in order to provide
 a more turnkey on-ramp onto new versions.) I simultaneously see Python 3
 losing its position as a justifiable tool in some domains (like systems
 level tooling) due to ongoing design decisions and poor implementation (like
 startup overhead problems). (In contrast, I see Rust excelling where Python
 is faltering and find Rust code surprisingly expressive to write and maintain
 given how low-level it is and therefore feel that Rust is a compelling
 alternative to Python in a surprisingly large number of domains.)</p>

 <p>Look, I know it is easy for me to armchair quarterback and critique with the
 benefit of hindsight/ignorance. I'm sure there is a lot of nuance here. I'm
 sure there was disagreement within the Python community over a lot of these
 issues. Maintaining a large and successful programming language and community
 like Python's is hard and you aren't going to please all the people all the
 time. And speaking as a maintainer, I have mad respect for the people leading
 such a large community. But niceties aside, everyone knows the Python 3
 transition was rough and could have gone better. It should not have taken 11
 years to get to where we are today.</p>

 <p><strong>I'd like to encourage the Python Project to conduct a thorough postmortem on
 the transition to Python 3.</strong> Identify what went well, what could have gone
 better, and what should be done next time such a large language change is wanted.
 Speaking as a Python user, a maintainer of a Python project, and as someone in
 industry who is now skeptical about use of Python at work due to risks of
 potentially company crippling high-effort migrations in the future, a postmortem
 would help restore my confidence that Python's maintainers learned from the
 various missteps on the road to Python 3 and these potentially ecosystem
 crippling mistakes won't be made again.</p>

 <p>Python had a wildly successful past few decades. And it can continue to
 thrive for several more. But the Python 3 migration was painful for all
 involved. And as much as we need to move on and leave Python 2 behind us,
 there are some important lessons to be learned. I hope the Python community
 takes the opportunity to reflect and am confident it will grow stronger by
 taking the time to do so.</p>
 </article>


 <hr>

 <footer>
  <p>
    <a href="/david/" title="Aller à l’accueil">🏠</a> •
    <a href="/david/log/" title="Accès au flux RSS">🤖</a> •
    <a href="http://larlet.com" title="Go to my English profile" data-instant>🇨🇦</a> •
    <a href="mailto:david%40larlet.fr" title="Envoyer un courriel">📮</a> •
    <abbr title="Hébergeur : Alwaysdata, 62 rue Tiquetonne 75002 Paris, +33184162340">🧚</abbr>
  </p>
 </footer>
 <script src="/static/david/js/instantpage-3.0.0.min.js" type="module" defer></script>
 </body>
 </html>
--- a/cache/2020/67c8c54b07137bcfc0069fccd8261b53/index.md
+++ b/cache/2020/67c8c54b07137bcfc0069fccd8261b53/index.md
@@ -0,0 +1,592 @@
 title: Mercurial's Journey to and Reflections on Python 3
 url: https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/
 hash_url: 67c8c54b07137bcfc0069fccd8261b53

 <p>Mercurial 5.2 was released on November 5, 2019. It is the first version
 of Mercurial that supports Python 3. This milestone comes nearly 11 years
 after Python 3.0 was first released on December 3, 2008.</p>
 <p>Speaking as a maintainer of Mercurial and an avid user of Python, I
 feel like the experience of making Mercurial work with Python 3 is
 worth sharing because there are a number of lessons to be learned.</p>
 <p>This post is logically divided into two sections: a mostly factual account
 of Mercurial's Python 3 porting effort and a more opinionated commentary
 on the transition to Python 3 and the Python language ecosystem as a whole.
 Those who don't care about the mechanics of porting a large Python project
 to Python 3 may want to skip the next section or two.</p>
 <h2>Porting Mercurial to Python 3</h2>
 <p>Let's start with a brief history lesson of Mercurial's support for
 Python 3 as told by its own commit history.</p>
 <p>The Mercurial version control tool was first released in April 2005
 (the same month that Git was initially released). Version 1.0 came out
 in March 2008. The first reference to Python 3 I found in the code base
 was in <a href="https://www.mercurial-scm.org/repo/hg/rev/8fee8ff13d37">September 2008</a>.
 Then not much happened for a while until
 <a href="https://www.mercurial-scm.org/repo/hg/rev/4494fb02d549">June 2010</a>, when
 someone authored a bunch of changes to make the Python C extensions
 start to recognize Python 3. Then things were again quiet for a while
 until <a href="https://www.mercurial-scm.org/repo/hg/rev/56ef99fbd6f2">January 2013</a>,
 when a handful of changes landed to remove 2 argument <code>raise</code>. There were
 a handful of commits in 2014 but nothing worth calling out.</p>
 <p>Mercurial's meaningful journey to Python 3 started in 2015. In code,
 the work started in
 <a href="https://www.mercurial-scm.org/repo/hg/rev/af6e6a0781d7">April 2015</a>, with
 effort to make Mercurial's test harness run with Python 3. Part of
 this was a <a href="https://www.mercurial-scm.org/repo/hg/rev/fefc72523491">decision</a>
 that Python 3.5 (to be released several months later in September 2015)
 would be the minimum Python 3 version that Mercurial would support.</p>
 <p>Once the Mercurial Project decided it wanted to port to Python 3 (as opposed
 to another language), one of the earliest decisions was how to perform that
 port. <strong>Mercurial's code base was too large to attempt a flag day conversion</strong>
 where there would be a Python 2 version and a Python 3 version and one day
 everyone would switch from Python 2 to 3. <strong>Mercurial needed a way to run the
 same code (or as much of the same code) on both Python 2 and 3.</strong> We would
 maintain a single code base and users would gradually switch from running with
 Python 2 to Python 3.</p>
 <p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/e1fb276d4619">May 2015</a>,
 Mercurial dropped support for Python 2.4 and 2.5. Dropping support for
 these older Python versions was critical, as it was effectively impossible to
 write Python code that ran on this wide gamut of versions because of
 incompatibilities in syntax and language features. For example, you needed
 Python 2.6 to get <code>print()</code> via <code>from __future__ import print_function</code>.
 The project's late start at a Python 3 port can be significantly attributed
 to Python 2.4 and 2.5 compatibility holding us back.</p>
 <p>The main goal with Mercurial's early porting work was just getting the code base
 to a point where <code>import mercurial</code> would work. There were a myriad of places
 where Mercurial used syntax that was invalid on Python 3 and Python 3
 couldn't even parse the source code, let alone compile it to bytecode and
 execute it.</p>
 <p>This effort began in earnest in
 <a href="https://www.mercurial-scm.org/repo/hg/rev/e93036747902">June 2015</a>
 with global source code rewrites like using modern octal syntax,
 modern exception catching syntax (<code>except Exception as e</code> instead of
 <code>except Exception, e</code>), <code>print()</code> instead of <code>print</code>, and a
 <a href="https://www.mercurial-scm.org/repo/hg/rev/1a6a117d0b95">modern import convention</a>
 along with the use of <code>from __future__ import absolute_import</code>.</p>
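 <p>The rewritten forms look roughly like the sketch below, shown as run on
 Python 3. The variable names are illustrative, and the <code>from __future__</code>
 line appears only as a comment because it matters on Python 2 alone.</p>

```python
# On Python 2.6+, the module would begin with:
#     from __future__ import absolute_import, print_function
import io

# Modern octal syntax: 0o644 instead of the Python 2-only 0644.
mode = 0o644

# Modern exception syntax: "as" instead of a comma.
try:
    int("not a number")
except ValueError as e:
    message = str(e)

# print() as a function, here writing to an in-memory text buffer.
buf = io.StringIO()
print("mode is", oct(mode), file=buf)
output = buf.getvalue()
```

 <p>Each of the older spellings is a syntax error on Python 3, which is why
 these rewrites had to happen before any module could even be parsed.</p>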
 <p>In the early days of the port, our first goal was to get all source code
 parsing as valid Python 3. The next step was to get all the modules <code>import</code>ing
 cleanly. This entailed fixing code that ran at <code>import</code> time to work on
 Python 3. Our thinking was that we would need the code base to be <code>import</code>
 clean on Python 3 before seriously thinking about run-time behavior. In reality,
 we quickly ported a lot of modules to <code>import</code> cleanly and then moved on
 to higher-level porting, leaving a long-tail of modules with <code>import</code> failures.</p>
 <p>This initial porting effort played out over months. There weren't many
 people working on it in the early days: a few people would basically hack on
 Python 3 as a form of itch scratching and most of the project's energy was
 focused on improving the existing Python 2 based product. You can get a rough
 idea of the timeline and participation in the early porting effort through the
 <a href="https://www.mercurial-scm.org/repo/hg/log/081a77df7bc6/tests/test-check-py3-compat.t?revcount=960">history of test-check-py3-compat.t</a>.
 We see the test being added in <a href="https://www.mercurial-scm.org/repo/hg/rev/40eb385f798f">December 2015</a>.
 By June 2016, most of the code base was ported to our modern import convention
 and we were ready to move on to more meaningful porting.</p>
 <p>One of the biggest early hurdles in our porting effort was how to overcome
 the string literals type mismatch between Python 2 and 3. In Python 2, a
 <code>''</code> string literal is a sequence of bytes. In Python 3, a <code>''</code> string literal
 is a sequence of Unicode code points. These are fundamentally different types.
 And in Mercurial's code base, <strong>most of our <em>string</em> types are binary by design:
 use of a Unicode based <code>str</code> for representing data is flat out wrong for our use
 case</strong>. We knew that Mercurial would need to eventually switch many string
 literals from <code>''</code> to <code>b''</code> to preserve type compatibility. But doing so would
 be problematic.</p>
 <p>In the early days of Mercurial's Python 3 port in 2015, Mercurial's project
 maintainer (Matt Mackall) set a ground rule that the Python 3 port shouldn't overly
 disrupt others: he wanted the Python 3 port to more or less happen in the background
 and not require every developer to be aware of Python 3's low-level behavior in order
 to get work done on the existing Python 2 code base. This may seem like a questionable
 decision (and I probably disagreed with him to some extent at the time because I was
 doing Python 3 porting work and the decision constrained this work). But it was the
 correct decision. Matt knew that it would be years before the Python 3 port was either
 necessary or resulted in a meaningful return on investment (the value proposition of
 Python 3 has always been weak for Mercurial because Python 3 doesn't demonstrate a
 compelling advantage over Python 2 for our use case). What Matt was trying to do was
 minimize the externalized costs that a Python 3 port would inflict on the project.
 He correctly recognized that maintaining the existing product and supporting
 existing users was more important than a long-term bet in its infancy.</p>
 <p>This ground rule meant that a mass insertion of <code>b''</code> prefixes everywhere
 was not desirable, as that would require developers to think about whether
 a type was a <code>bytes</code> or <code>str</code>, a distinction they didn't have to worry about
 on Python 2 because we practically never used the Unicode-based string type in
 Mercurial.</p>
 <p>In addition, there were some other practical issues with doing a bulk <code>b''</code>
 prefix insertion. One was that the added <code>b</code> characters would cause a lot of lines
 to grow beyond our length limits and we'd have to reformat code. That would
 require manual intervention and would significantly slow down porting. And
 a sub-issue of adding all the <code>b</code> prefixes and reformatting code is that it would
 <em>break</em> annotate/blame more than was tolerable. The latter issue was addressed
 by teaching Mercurial's annotate/blame feature to <em>skip</em> revisions. The project
 now has a convention of annotating commit messages with <code># skip-blame &lt;reason&gt;</code>
 so structural only changes can easily be ignored when performing an
 annotate/blame.</p>
 <p>A stop-gap solution to the <code>b''</code> everywhere issue came in
 <a href="https://www.mercurial-scm.org/repo/hg/rev/1c22400db72d">July 2016</a>, when I
 introduced a custom Python module importer that rewrote source code as part
 of <code>import</code> when running on Python 3. (I have
 <a href="/blog/2017/03/13/from-__past__-import-bytes_literals/">previously blogged</a>
 about this hack.) What this did was transparently add <code>b''</code> prefixes to all
 un-prefixed string literals as well as modify how a few common functions were
 called, so that things would run natively on Python 3 without us having to modify
 the source code. The source transformer allowed us to have the benefits of progressing
 in our Python 3 port without having to rewrite tens of thousands of lines of
 source code. The solution was hacky. But it enabled us to make significant
 progress on the Python 3 port without externalizing a lot of cost onto others.</p>
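 <p>The core idea of such a transformer can be sketched with the standard
 <code>tokenize</code> module. This is a simplified illustration, not Mercurial's actual
 importer: the function name <code>byteify_literals</code> is hypothetical, the import hook
 wiring is omitted, and docstrings get converted along with everything else.</p>

```python
import io
import tokenize


def byteify_literals(source):
    """Add a b prefix to every un-prefixed string literal in `source`."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Un-prefixed literals start directly with a quote character;
        # r'', u'' and b'' literals start with a letter and are left alone.
        if tok.type == tokenize.STRING and tok.string[0] in "'\"":
            tok = tok._replace(string="b" + tok.string)
        tokens.append(tok)
    return tokenize.untokenize(tokens)


original = "name = 'hg'\nraw = r'native'\n"
rewritten = byteify_literals(original)

# Compile and execute the rewritten source, as an import hook would
# do in place of compiling the file's original text.
namespace = {}
exec(compile(rewritten, "<rewritten>", "exec"), namespace)
```

 <p>After the rewrite, <code>name</code> is a <code>bytes</code> value while the <code>r''</code> literal
 stays a native <code>str</code>, even though the source text on disk never changed.</p>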
 <p>I thought the source transformer would be relatively short-lived and would be
 removed shortly after the project inevitably decided to go all in on Python 3.
 To my surprise, others built additional transforms over the years and the source
 transformer persisted all the way until
 <a href="https://www.mercurial-scm.org/repo/hg/rev/d783f945a701">October 2019</a>, when
 I removed it just before the first non-alpha Python 3 compatible version
 of Mercurial was released.</p>
 <p>A common problem Mercurial faced with making the code base dual Python 2/3 native
 was dealing with standard library differences. Most of the problems stemmed
 from changes between Python 2.7 and 3.5+. But there are changes within the
 versions of Python 3 that we had to wallpaper over as well. In
 <a href="https://www.mercurial-scm.org/repo/hg/rev/6041fb8f2da8">April 2016</a>, the
 <code>mercurial.pycompat</code> module was introduced to export aliases or wrappers around
 standard library functionality to abstract the differences between Python
 versions. This file <a href="https://www.mercurial-scm.org/repo/hg/log/66af68d4c751/mercurial/pycompat.py?revcount=240">grew over time</a>
 and <a href="https://www.mercurial-scm.org/repo/hg/file/66af68d4c751/mercurial/pycompat.py">eventually became</a>
 Mercurial's version of <a href="https://six.readthedocs.io/">six</a>. To be honest, I'm
 not sure if we should have used <code>six</code> from the beginning. <code>six</code> probably would
 have saved some work. But we had to eventually write a lot of shims for
 converting between <code>str</code> and <code>bytes</code> and would have needed to invent a
 <code>pycompat</code> layer in some form anyway. So I'm not sure <code>six</code> would have saved
 enough effort to justify the baggage of integrating a 3rd party package into
 Mercurial. (When Mercurial accepts a 3rd party package, downstream packagers
 like Debian get all hot and bothered and end up making questionable patches
 to our source code. So we prefer to minimize the surface area for
 problems by minimizing dependencies on 3rd party packages.)</p>
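 <p>A compatibility module of this kind might look like the minimal sketch
 below. The names (<code>IS_PY3</code>, <code>urlquote</code>, <code>to_native_str</code>) are hypothetical and
 are not claimed to match what <code>mercurial.pycompat</code> exports.</p>

```python
import sys

IS_PY3 = sys.version_info[0] >= 3

if IS_PY3:
    # Python 3 locations of modules and functions that moved.
    import queue
    from urllib.parse import quote as urlquote

    def to_native_str(value):
        """Return `value` as the native str type, decoding bytes as Latin-1."""
        if isinstance(value, bytes):
            return value.decode("latin-1")
        return value
else:
    # Python 2 locations of the same functionality.
    import Queue as queue
    from urllib import quote as urlquote

    def to_native_str(value):
        """On Python 2 the native str type is already bytes-backed."""
        return value
```

 <p>Callers import <code>queue</code>, <code>urlquote</code>, and <code>to_native_str</code> from the shim and
 never need to check the Python version themselves.</p>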
 <p>Once we had a source transforming module importer and the <code>pycompat</code>
 compatibility shim, we started to focus in earnest on making core
 functionality actually work on Python 3. We established a convention of
 annotating changesets needed for Python 3 with <code>py3</code>, so a
 <a href="https://www.mercurial-scm.org/repo/hg/log?rev=desc(py3)&amp;revcount=4000">commit message search</a>
 yields a lot of the history. (But it isn't a full history since not every Python 3
 oriented change used this convention). We see from that history that after
 the source importer landed, a lot of porting effort was spent on things
 very early in the <code>hg</code> process lifetime. This included handling environment
 variables, loading config files, and argument parsing. We introduced a
 <a href="https://www.mercurial-scm.org/repo/hg/log/@/tests/test-check-py3-commands.t">test-check-py3-commands.t</a>
 test to track the progress of <code>hg</code> commands working in Python 3. The very early
 history of that file shows the various error messages changing, as underlying
 early process functionality was slowly ported to work on Python 3. By
 <a href="https://www.mercurial-scm.org/repo/hg/rev/2d555d753f0e">December 2016</a>, we
 had <code>hg version</code> working on Python 3!</p>
 <p>With basic <code>hg</code> command dispatch ported to Python 3 at the end of 2016,
 2017 represented an inflection point in the Python 3 porting effort. With the
 early process functionality working, different people could pick up different
 commands and code paths and start making code work with Python 3. By
 <a href="https://www.mercurial-scm.org/repo/hg/rev/52ee1b5ac277">March 2017</a>, basic
 repository opening and <code>hg files</code> worked. Shortly thereafter,
 <a href="https://www.mercurial-scm.org/repo/hg/rev/ed23f929af38">hg init started working as well</a>.
 And <a href="https://www.mercurial-scm.org/repo/hg/rev/935a1b1117c7">hg status</a> and
 <a href="https://www.mercurial-scm.org/repo/hg/rev/aea8ec3f7dd1">hg commit</a> did as well.</p>
 <p>Within a few months, enough of Mercurial's functionality was working with Python
 3 that we started to <a href="https://www.mercurial-scm.org/repo/hg/rev/7a877e569ed6">track which tests passed on Python 3</a>.
 The <a href="https://www.mercurial-scm.org/repo/hg/log/@/contrib/python3-whitelist?revcount=480">evolution of this file</a>
 shows a reasonable history of the porting velocity.</p>
 <p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/feb910d2f59b">May 2017</a>, we dropped
 support for Python 2.6. This significantly reduced the complexity of supporting
 Python 3, as there was tons of functionality in Python 2.7 that made it easier
 to target both Python 2 and 3 and now our hands were untied to utilize it.</p>
 <p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/bd8875b6473c">November 2017</a>, I
 landed a test harness feature to report exceptions seen during test runs. I
 later <a href="https://www.mercurial-scm.org/repo/hg/rev/8de90e006c78">refined the output</a>
 so the most frequent failures were reported more prominently. This feature
 greatly improved our ability to target the most common exceptions, allowing
 us to write patches to fix the most prevalent issues on Python 3 and uncover
 previously unknown failures.</p>
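 <p>The frequency-ranked reporting can be approximated with a counter. The
 sketch below is an assumption about the general shape of such a report, with
 made-up test names and exception summaries. It is not the test harness's
 actual code.</p>

```python
from collections import Counter

# Hypothetical (test name, exception summary) pairs collected during a run.
observed = [
    ("test-a.t", "TypeError: a bytes-like object is required, not 'str'"),
    ("test-b.t", "TypeError: a bytes-like object is required, not 'str'"),
    ("test-c.t", "TypeError: keywords must be strings"),
    ("test-d.t", "TypeError: a bytes-like object is required, not 'str'"),
    ("test-e.t", "TypeError: keywords must be strings"),
    ("test-f.t", "AttributeError: 'dict' object has no attribute 'iteritems'"),
]

# Count how often each exception summary occurred across all tests.
counts = Counter(summary for _test, summary in observed)

# most_common() orders the summaries from most to least frequent.
report = counts.most_common()
```

 <p>Fixing whatever sits at the top of such a report tends to unblock the
 largest number of tests at once.</p>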
 <p>By the end of 2017, we had most of the structural pieces in place to complete
 the port. Essentially all that was required at that point was time and labor.
 We didn't have a formal mechanism in place to target porting efforts. Instead,
 people would pick up a component or test that they wanted to hack on and then
 make incremental changes towards making that work. All the while, we didn't
 have a strict policy on not regressing Python 3 and regressions in Python 3
 porting progress were semi-frequent, although we did tend to correct
 regressions quickly. And over time, developers saw a flurry of Python 3
 patches and slowly grew aware of how to accommodate Python 3, and
 Python 3 regressions became less frequent.</p>
 <p>As useful as the source-transforming module importer was, it incurred some
 additional burden for the porting effort. The source transformer effectively
 converted all un-prefixed string literals (<code>''</code>) to bytes literals (<code>b''</code>)
 to preserve string type behavior with Python 2. But various aspects of Python
 3 didn't like the existence of <code>bytes</code>. Various standard library functions
 now wanted Unicode <code>str</code> and didn't accept <code>bytes</code>, even though the Python
 2 implementation used the equivalent of <code>bytes</code>. So our <code>pycompat</code> layer
 grew pretty large to accommodate calling into various standard library
 functionality. Another side-effect which we didn't initially anticipate
 was the <code>**kwargs</code> calling convention. Python allows you to use <code>**</code>
 with a dict with string keys to turn those keys into named arguments
 in a function call. But Python 3 requires these <code>dict</code> keys to be
 <code>str</code> and outright rejects <code>bytes</code> keys, even if the <code>bytes</code> instance
 is ASCII safe and has the same underlying byte representation of the
 string data as the <code>str</code> instance would. So we had to invent support
 functions that would convert <code>dict</code> keys from <code>bytes</code> to <code>str</code> for
 use with <code>**kwargs</code> and another to convert a <code>**kwargs</code> dict from
 <code>str</code> keys to <code>bytes</code> keys so we could use <code>''</code> syntax to access keys
 in our source code! Also on the string type front, we had to sprinkle
 the codebase with raw string literals (<code>r''</code>) to force the use of
 <code>str</code> regardless of which Python version you were running on (our
 source transformer only changed unprefixed string literals, so existing
 <code>r''</code> strings would be preserved as <code>str</code>).</p>
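 <p>A minimal sketch of the <code>**kwargs</code> problem and the kind of support
 functions described above. The names <code>str_kwargs</code> and <code>bytes_kwargs</code>
 and the choice of Latin-1 for the key conversion are assumptions made for
 illustration, not necessarily what Mercurial's <code>pycompat</code> layer does.</p>

```python
def str_kwargs(opts):
    """Convert bytes keys to str so the dict can be used with ``**``."""
    return {k.decode("latin-1"): v for k, v in opts.items()}


def bytes_kwargs(kwargs):
    """Convert str keys back to bytes for lookups with bytes literals."""
    return {k.encode("latin-1"): v for k, v in kwargs.items()}


def commit(**kwargs):
    # Inside the function, go back to bytes keys.
    opts = bytes_kwargs(kwargs)
    return opts[b"message"]


opts = {b"message": b"fix bug", b"user": b"alice"}

try:
    commit(**opts)  # bytes keys are rejected on Python 3
except TypeError:
    rejected = True
else:
    rejected = False

result = commit(**str_kwargs(opts))
```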
 <p>Blind transformation of all string literals to <code>bytes</code> was less than ideal
 and it did impose some unwanted side-effects. But, again, most <em>strings</em>
 in Mercurial are bytes by design, so we thought it would be easier to
 <em>byteify</em> all strings then selectively undo that where native strings
 were actually warranted (like keys in most <code>dict</code>s) than to take the
 up-front cost to examine every string and make an intelligent determination
 as to what type it should be. I go back and forth as to whether this was the
 correct call. But when you factor in that the source transforming
 module importer unblocked Python 3 porting at a time in the project's
 history when there was so much focus on improving the core product and it
 did so without externalizing many costs onto the people doing the critical
 core product work, I think it was the right call.</p>
 <p>By mid 2019, the number of test failures in Python 3 had been whittled
 down to a reasonable, less daunting number. It felt like victory was
 within our grasp and inevitable. But a few significant issues lingered.</p>
 <p>One remaining question was around addressing differences between Python
 3 versions. At the time, Python 3.5, 3.6, and 3.7 had been released and 3.8
 was scheduled for release by the end of the year. We had a surprising
 number of issues with differences in Python 3 versions. Many of us
 were running Python 3.7, so it had the fewest failures. We had to spend
 extra effort to get Python 3.5 and 3.6 working as well as 3.7. Same for
 3.8.</p>
 <p>Another task we deferred until the second half of 2019 was standing up
 robust CI for Python 3. We had some coverage, but it was minimal. Wanting
 a distraction from PyOxidizer for a bit and wanting to overhaul Mercurial's
 CI system (which is officially built on Buildbot), I cobbled together a
 <em>serverless</em> CI system built on top of AWS DynamoDB and S3 for storage,
 Lambda functions and CloudWatch events for all business logic, and EC2 spot
 instances for job execution. This CI system executed Python 3.5, 3.6, 3.7,
 and 3.8 variants of our test harness on Linux and Python 3.7 on Windows.
 This gave developers insight into version-specific failures. More
 importantly, it also gave insight into failures on Windows, which was
 previously not well tested. We discovered that Python 3 on Windows was
 lagging significantly behind POSIX.</p>
 <p>By the time of the Mercurial developer meetup in October 2019, nearly
 all tests were passing on POSIX platforms and we were confident that
 we could declare Python 3 support as at least beta quality for the
 Mercurial 5.2 release, planned for early November.</p>
 <p>One of our blockers for ripping off the alpha label on Python 3 support
 was removing our source-transforming module importer. It had performance
 implications and it wasn't something we wanted to ship because it felt
 too hacky. A blocker for this was that we wanted to automatically format
 our source tree with <a href="https://black.readthedocs.io/en/stable/">black</a>
 first: if we removed the source transformer, we'd have to rewrite
 a lot of source code to apply the changes the transformer was performing,
 which would necessitate wrapping a lot of lines, which would involve a lot
 of manual effort. We wanted to <em>blacken</em> our code base first so that
 mass rewriting source code wouldn't involve a lot of tedious reformatting
 since <code>black</code> would handle that for us automatically. And rewriting the
 source tree with <code>black</code> was blocked on a specific feature landing in
 <code>black</code>! (We did not agree with <code>black</code>'s behavior of
 unwrapping comma-delimited lists of items if they could fit on a single
 line. So one of our core contributors wrote a patch to <code>black</code> that
 changed its behavior so a trailing <code>,</code> in a list of items will force
 items to be formatted on multiple lines. I personally find the multiple line
 formatting much easier to read. And the behavior is arguably better for
 code review and <em>annotation</em>, which is line based.) Once this feature
 landed in <code>black</code>, we reformatted our source tree and started ripping
 out the source transformations, starting by inserting <code>b''</code> literals
 everywhere. By late October, the source transformer was no more and
 we were ready to release beta quality support for Python 3 (at least
 on UNIX-like platforms).</p>
 <p>Having described a mostly factual overview of Mercurial's port to Python
 3, it is now time to shift gears to the speculative and opinionated
 parts of this post. <strong>I want to underscore that the opinions reflected
 here are my own and do not reflect the overall Mercurial Project or even
 a consensus within it.</strong></p>
 <h2>The Future of Python 3 and Mercurial</h2>
 <p>Mercurial's port to Python 3 is still ongoing. While we've shipped
 Python 3 support and the test harness is clean on Python 3, I view shipping
 as only a milestone - arguably <em>the</em> most important one - in a longer
 journey. There's still a lot of work to do.</p>
 <p>It is 2020 and Python 2 support is now officially dead from the
 perspective of the Python language maintainers. Linux distributions are
 starting to rip out Python 2. Packages are dropping Python 2 support in
 new versions. The world is moving to Python 3 only. But <strong>Mercurial still
 officially supports Python 2</strong>. And it is yet to be determined how
 long we will retain support for Python 2 in the code base. We've only had
 one release supporting Python 3. Our users still need to port their
 extensions (implemented in Python). Our users still need to start widely
 using Mercurial with Python 3. Even our own developers need to switch to
 Python 3 (old habits are hard to break).</p>
 <p>I anticipate a long tail of random bugs in Mercurial on Python 3. While
 the tests may pass, our code coverage is not 100%. And even if it were,
 Python is a dynamic language and there are tons of invariants that aren't
 caught at compile time and can only be discovered at run time. <strong>These
 invariants cannot all be detected by tests, no matter how good your test
 coverage is.</strong> This is a <em>feature</em>/<em>limitation</em> of dynamic languages. Our
 users will likely be finding a long tail of miscellaneous bugs on Python
 3 for <em>years</em>.</p>
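 <p>A contrived example of the kind of bug that only shows up at run time. The
 function <code>describe</code> is hypothetical. The common code path works, so a test
 that exercises only that path passes, while the rarely used branch mixes
 <code>str</code> and <code>bytes</code> and fails on Python 3 only when it actually executes.</p>

```python
def describe(path, verbose=False):
    """Build a status line; `path` is bytes, as in most of Mercurial."""
    if verbose:
        # Bug: mixes a native str with bytes. Python 2 accepted this;
        # Python 3 raises TypeError, but only when this branch runs.
        return "path: " + path
    return b"path: " + path


# The common code path works.
ok = describe(b"a.txt")

# The rarely used branch fails at run time.
try:
    describe(b"a.txt", verbose=True)
    failed = False
except TypeError:
    failed = True
```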
 <p>At present, our code base is littered with tons of random hacks to bridge
 the gap between Python 2 and 3. Once Python 2 support is dropped, we'll
 need to remove these hacks and make the source tree Python 3 native, with
 minimal shims to wallpaper over differences in Python 3 versions. <strong>Removing
 this Python version bridge code will likely require hundreds of commits and
 will be a non-trivial effort.</strong> It's likely to be deemed a low priority (it
 is glorified busy work after all), and code for the express purpose of
 supporting Python 2 will likely linger for years.</p>
 <p>We are also still shoring up our packaging and distribution story on
 Python 3. This is easier on some platforms than others. I created
 <a href="https://github.com/indygreg/PyOxidizer">PyOxidizer</a> partially because
 of the poor experience I had with Python application packaging and
 distribution through the Mercurial Project. The Mercurial Project has
 already signed off on using PyOxidizer for distributing Mercurial in
 the future. So look for an <em>oxidized</em> Mercurial distribution in the
 near future! (You could argue PyOxidizer is an epic yak shave to better
 support Mercurial. But that's for another post.)</p>
 <p>Then there's Windows support. A Python 3 powered Mercurial on Windows
 still has a handful of known issues. It may require a few more releases
 before we consider Python 3 on Windows to be stable.</p>
 <p>Because we're still on a code base that must support Python 2, our
 adoption of Python 3 features is very limited. The only Python 3
 feature that Mercurial developers seem to almost universally get excited
 about is type annotations. We already have some people playing around
 with <code>pytype</code> using comment-based annotations and <code>pytype</code> has already
 caught a few bugs. We're eager to go all in on type annotations and
 uncover lots of dynamic typing bugs and poorly implemented APIs.
 Beyond type annotations, I can't name any feature that people are screaming
 to adopt and which makes a lot of sense for Mercurial. There's a long
 tail of minor features I'm sure will get utilized. But none of the
 marquee features that define major language releases seem that interesting
 to us. Time will tell.</p>
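 <p>For readers unfamiliar with comment-based annotations, here is a small
 sketch in the PEP 484 type comment style. The function <code>join</code> is
 hypothetical. Because the annotation lives in a comment, the same source still
 runs on Python 2.</p>

```python
def join(base, name):
    # type: (bytes, bytes) -> bytes
    """Join two path components stored as bytes."""
    return base + b"/" + name


joined = join(b"repo", b"file.txt")

# A checker that understands type comments could flag a call such as
# join("repo", b"file.txt") without running it, because the first argument
# is str rather than bytes.
```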
 <h2>Commentary on Python 3</h2>
 <p>Having described Mercurial's ongoing journey to Python 3, I now want to
 focus more on Python itself. Again, the opinions here are my own and
 don't reflect those of the Mercurial Project.</p>
 <p><strong>Succinctly, my experience porting Mercurial and other projects to
 Python 3 has significantly soured my perceptions of Python. As much as
 I have historically loved Python - from the language to the welcoming
 community - I am still struggling to understand how Python could manage
 to inflict so much hardship on the community by choosing the transition
 plan that they did.</strong> I believe Python's choices represent a terrific
 example of what not to do when managing a large project or ecosystem.
 Maintainers of other largely-deployed systems would benefit from taking
 the time to understand and reflect on Python's missteps.</p>
 <p>Python 3.0 was released on December 3, 2008. And it took the better part of
 a decade for the community to embrace it. <strong>This should be universally
 recognized as a failure.</strong> While hindsight is 20/20, many of the issues
 with Python 3 were obvious at the time and could have been mitigated had
 the language maintainers been more accommodating - and dare I say
 empathetic - to its users.</p>
 <p>Initially, Python 3 had a rather cavalier attitude towards backwards and
 forwards compatibility. In the early years of Python 3, the attitude of
 Python's maintainers was <em>Python 3 is a new, better language: you should
 target it explicitly</em>. There were some tools and methods to ease the
 transition. But nothing super polished, especially in the early years.
 Adoption of Python 3 in the overall community was slow. Python developers
 in the wild justifiably complained that the value proposition of Python 3
 was too weak to justify porting effort. Not helping was that the early
 advice for targeting Python 3 was to rewrite the source code to become
 Python 3 native. This is in contrast with using the same source to run on both
 Python 2 and 3. For library and application maintainers, this potentially
 meant maintaining separate versions of your code or forcing end-users to
 make a giant leap, which would realistically orphan users on an old version,
 fragmenting your user base. Neither of those was a great alternative, so
 you can understand why many projects didn't bite.</p>
 <p>For many projects of non-trivial size, flag day transitions from Python 2 to
 3 were simply not viable: the pathway to Python 3 was to make code dual
 Python 2/3 compatible and gradually switch over the runtime to Python 3.
 But initial versions of Python 3 made this effectively impossible! Let me
 give a few specific examples.</p>
 <p>In Python 2, a string literal <code>''</code> is effectively an array of bytes. In
 Python 3, it is a series of Unicode code points - a fundamentally different
 type! In Python 2, you could write <code>b''</code> to be explicit that a string literal
 was bytes or you could write <code>u''</code> to indicate a Unicode literal, mimicking
 Python 3's behavior. In Python 3, you could write <code>b''</code> to create a <code>bytes</code>
 instance. But for whatever reason, Python 3 initially removed the <code>u''</code> syntax,
 meaning there wasn't an easy way to explicitly denote the type of each
 string literal so that it was consistent between Python 2 and 3! Python 3.3
 (released September 2012) restored <code>u''</code> support, making it more viable to
 write Python source code that worked on both Python 2 and 3. <strong>For nearly 4
 years, Python 3 took away the consistent syntax for denoting bytes/Unicode
 string literals.</strong></p>
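 <p>To make the literal prefixes concrete, the following short sketch runs
 unchanged on Python 2.7 and on Python 3.3 or later. The variable names are
 only for illustration.</p>

```python
raw = b"abc"      # bytes on Python 3; str (a byte string) on Python 2
text = u"abc"     # str on Python 3; unicode on Python 2
native = "abc"    # whatever the running interpreter calls str

is_bytes = isinstance(raw, bytes)
is_text = not isinstance(text, bytes)

# On Python 3.0 through 3.2 the u"abc" line above is a SyntaxError,
# so a file containing it could not even be imported.
```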
 <p>Another feature was <code>%</code> formatting of strings. Python 2 allowed use of the
 <code>%</code> formatting operator on both its string types. But Python 3 initially
 removed the implementation of <code>%</code> from <code>bytes</code>. Why, I have no clue. It
 is perfectly reasonable to splice byte sequences into a buffer via use of
 a formatting string. But the Python language maintainers insisted otherwise.
 And it wasn't until the community complained about its absence loudly enough
 that this feature was
 <a href="https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-461">restored in Python 3.5</a>,
 which was released in September 2015. Fun fact: the lack of this feature was
 once considered a blocker for Mercurial moving to Python 3 because
 Mercurial uses <code>bytes</code> almost universally, which meant that nearly every use
 of <code>%</code> would have to be changed to something else. And to this day, Python
 3's <code>bytes</code> still doesn't have a <code>format()</code> method, so the alternative was
 effectively string concatenation, which is a massive step backwards from the
 expressiveness of <code>%</code> formatting.</p>
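 <p>A short sketch of the difference, assuming Python 3.5 or later. The values
 are made up for the example.</p>

```python
name = b"default"
rev = 42

# Works on Python 2 and on Python 3.5 or later (PEP 461).
line = b"branch %s at revision %d\n" % (name, rev)

# bytes has no format() method on Python 3.
has_format = hasattr(b"", "format")

# Without % formatting, the alternative is concatenation.
concat = (
    b"branch " + name + b" at revision " + str(rev).encode("ascii") + b"\n"
)
```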
 <p><strong>The initial approach of Python 3 mirrors a folly that many developers
 and projects make: attempting a rewrite instead of performing incremental
 evolution.</strong> For established projects, large scale rewrites often go poorly.
 And Python 3 is no exception. Yes, from a code level, CPython (and likely
 other Python implementations) were incremental changes over Python 2 using
 the same code base. But from a language and standard library level, the
 differences in Python 3 were significant enough that I - and even Python's
 core maintainers - considered it a new language, and therefore a rewrite.
 When your random project attempts a rewrite and fails, the blast radius of that is
 often contained to that project. Maybe you don't publish a new release
 as soon as you otherwise would. <strong>But when you are powering an ecosystem,
 the ripple effects from a failed rewrite percolate throughout that ecosystem
 and last for years and have many second order effects. We see this with
 Python 3, where poor choices made in the late 2000s are inflicting significant
 hardship still in 2020.</strong></p>
 <p>From the initial restrained adoption of Python 3, it is obvious that the
 Python ecosystem overwhelmingly rejected the initial boil-the-ocean approach
 of Python 3. Python's maintainers eventually got the message and started
 restoring features like <code>u''</code> and <code>bytes</code> <code>%</code> formatting back into the
 language to placate the community. All the while Python 3 had been accumulating
 new features and the cumulative sum of those features was compelling enough
 to win over users.</p>
 <p>For many projects (including Mercurial), Python 3.4/3.5 was the first viable
 porting target for Python 3. Python 3.5 was released in September 2015, almost
 7 years after Python 3.0 was released in December 2008. <strong>Seven. Years.</strong>
 An ecosystem that falters for that long is generally not healthy. What may have
 saved Python from total collapse here is that Python 2 was still going strong and
 people were generally happy with it. I really do think Python dodged a bullet
 here, because there was a massive window where the language could have
 hemorrhaged a critical amount of its user base and been relegated to an
 afterthought. One could draw an analogy to Perl, which lost out to PHP,
 Python, and Ruby, and whose fall from grace aligned with a lengthy
 transition from Perl 5 to 6.</p>
 <p>If you look back at the early history of Python 3, <strong>I think you are forced
 to conclude that Python effectively kneecapped itself for 5-7 years
 through questionable implementation choices that prevented users from
 making incremental transitions between the major language versions. 2008
 to 2013-2015 should be known as the <em>lost years of Python</em> because so much
 opportunity and energy was squandered.</strong> Yes, Python is still healthy today
 and Python 3 is (finally) being adopted at scale. But had earlier versions
 of Python 3 been more <em>empathetic</em> towards Python 2 users porting to it,
 Python and Python 3 in 2020 would be even stronger than they are. The community
 was artificially hindered for years. And we won't know until 2023-2025 what
 things could have looked like in 2020 had the Python core language team
 spent more time paving a smoother road between the major language versions.</p>
 <p>To be clear, I do think Python 3 is generally a better language than Python 2.
 It has fewer warts, more compelling features, and better performance (except
 for startup time, which is still slower than Python 2). I am ecstatic the
 community is finally rallying around Python 3! For my Python coding, it has
 reached the point where I curse under my breath when I need to support
 Python 2 or even older versions of Python 3, like 3.5 or 3.6: I just wish
 the world would move on and adopt the future already!</p>
 <p>But I would be remiss if I failed to mention some of my gripes with Python
 3 beyond the transition shenanigans.</p>
 <p>Perhaps my least favorite <em>feature</em> of Python 3 is its insistence that the
 world is Unicode. In Python 2, the default string type was backed by
 bytes. In Python 3, the default string type is backed by Unicode code
 points. As part of that transition, large parts of the standard library
 now operate in the Unicode space instead of the domain of bytes. I understand
 why Python does this: they want <em>strings</em> to be Unicode and don't want
 users to have to spend that much energy thinking about when to use
 <code>str</code> versus <code>bytes</code>. This approach is admirable and somewhat defensible
 because it takes a stand on a solution that is arguably <em>good enough</em> for
 most users. However, <strong>the approach of assuming the world is Unicode is
 flat out wrong and has significant implications for systems level
 applications</strong> (like version control tools).</p>
 <p>There are a myriad of places in Python's standard library where Python
 insists on using the Unicode-backed <code>str</code> type and rejects <code>bytes</code>. For
 example, various networking modules refuse to accept <code>bytes</code> for hostnames
 or URLs. HTTP libraries won't accept <code>bytes</code> for HTTP header names or values.
 Functions that are proxies to POSIX-defined functions won't accept <code>bytes</code>
 even though the POSIX functions they call into use <code>char *</code> and aren't
 Unicode aware. Then there's filename handling, where Python assumes the
 existence of a global encoding for filenames and uses this encoding to convert
 between <code>str</code> and <code>bytes</code>. And it does this despite POSIX filesystem paths
 being a bag of bytes where the only rules are that <code>\0</code> terminates the
 filename and <code>/</code> is special.</p>
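 <p>To illustrate the filename conversion: on POSIX, <code>os.fsdecode()</code> and
 <code>os.fsencode()</code> use the filesystem encoding with the <code>surrogateescape</code>
 error handler. The sketch below calls the codec directly with UTF-8 so that it
 behaves the same on every platform. The filename is made up.</p>

```python
# A filename that is valid on POSIX but is not valid UTF-8
# (0xE9 is an accented "e" in Latin-1).
raw_name = b"caf\xe9"

# Undecodable bytes are smuggled into the str as lone surrogates.
as_str = raw_name.decode("utf-8", "surrogateescape")

# Encoding with the same handler recovers the original bytes.
round_trip = as_str.encode("utf-8", "surrogateescape")

# But the str is not valid Unicode text: a strict encode fails.
try:
    as_str.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

 <p>The round trip is lossless only as long as every layer that touches the
 <code>str</code> uses the same encoding and error handler.</p>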
 <p>In cases like Python refusing to accept <code>bytes</code> for things like HTTP
 header names (which will just be spit out over the wire as bytes), Python's
 pendulum has swung too far towards Unicode only. In my opinion, Python needs
 to be more accommodating and allow <code>bytes</code> when it makes sense. I hope the
 pendulum knocks some sense into people when it swings back towards a more
 reasonable solution that better acknowledges the realities of the world we
 live in.</p>
 <p>For areas like filename handling, the world is more complicated. Python
 is effectively an abstraction layer over the operating system APIs exposing
 this functionality. And there is often an impedance mismatch between operating
 systems. For example, POSIX (Linux) tends to use <code>char *</code> for everything
 and doesn't care about encoding, while Windows tends to use 16-bit character
 types where the encoding is... a can of worms.</p>
 <p><strong>The reality here is that it is impossible to abstract over differences
 between operating system behavior without compromises that can result in data
 loss, outright wrong behavior, or loss of functionality. But Python 3 attempts
 to do it anyway, making Python 3 unsuitable (or at least highly undesirable) for
 certain systems level applications that rely on it</strong> (like a version control
 tool).</p>
 <p>In fairness to Python, it isn't the only programming language that gets
 this wrong. The only language I've seen <em>properly</em> implement higher-order
 abstractions on top of operating system facilities is Rust, whose approach can
 be generalized as <em>use Python 3's solution of normalizing to Unicode/UTF-8 by
 default</em>, but expose <em>escape hatches</em> which allow access to the raw underlying
 types and APIs used by the operating system for the advanced consumers who
 require it. For example, Rust's <code>Path</code> type which represents a filesystem path
 <a href="https://doc.rust-lang.org/std/path/struct.Path.html#method.as_os_str">allows access</a>
 to the raw <a href="https://doc.rust-lang.org/std/ffi/struct.OsStr.html">OsStr</a> value
 used by the operating system, not a normalization of it to bytes or Unicode,
 which may be lossy. This allows consumers to e.g. create and retrieve
 OS-native filesystem paths without data loss. This functionality is critical
 in some domains. Python 3's awareness/insistence that the world is
 Unicode (which it isn't universally) reduces Python's applicability in these
 domains.</p>
 <p>Speaking of Rust, at the Mercurial developer meetup in October 2019, we were
 discussing the use of Rust in Mercurial and one of the core maintainers blurted
 out something along the lines of <em>if Rust were at its current state 5 years ago,
 Mercurial would have likely ported from Python 2 to Rust instead of Python 3</em>.
 As crazy as it initially sounded, I think I agree with that assessment. With the
 benefit of hindsight, having been a key player in the Python 3 porting effort,
 seeing all the complications and headaches Python 3 is introducing, and
 having learned Rust and witnessed its benefits for performance, control,
 and correctness firsthand, porting to Rust would likely have been the correct
 move for the project at that point in time. 2020 is not 2014, however, and I'm
 not sure if I would opt for a rewrite in Rust today. (Most rewrites are follies
 after all.) But I know one thing: I certainly wouldn't implement a new version
 control tool in Python 3 and I would probably choose Rust as an implementation
 language for most new projects in the systems level space or with an expected
 shelf life of 10+ years. (I really should blog about how awesome Rust is.)</p>
 <p>Back to the topic of Python itself, <strong>I'm really soured on Python at this
 point in time. The effort required to port to Python 3 was staggering. For
 Mercurial, Python 3 introduces a ton of problems and doesn't really solve
 many. We effectively slogged through mud for several years only to wind
 up in a state that feels strictly worse than where we started. I'm sure it will
 be strictly better in a few years. But at that point, we're talking about a
 5+ year transition. To call the Python 3 transition disruptive and
 distracting for the project would be an understatement. As a project maintainer,
 it's natural to ask what we could have accomplished if we weren't forced
 to carry out this sideshow.</strong></p>
 <p>I can't shake the feeling that a lot of the pain inflicted by the Python 3
 transition could have been avoided had Python's language leadership made
 a different set of decisions and more highly prioritized the transition
 experience. (Like not initially removing features like <code>u''</code> and <code>bytes %</code>
 and not introducing gratuitous backwards compatibility breaks, like with
 <code>items()/iteritems()</code>. I would have also liked to see a feature like
 <code>from __future__</code> - maybe <code>from __past__</code> - that would make it easier for
 Python 3 code to target semantics in earlier versions in order to provide
 a more turnkey on-ramp onto new versions.) I simultaneously see Python 3
 losing its position as a justifiable tool in some domains (like systems
 level tooling) due to ongoing design decisions and poor implementation (like
 startup overhead problems). (In contrast, I see Rust excelling where Python
 is faltering and find Rust code surprisingly expressive to write and maintain
 given how low-level it is and therefore feel that Rust is a compelling
 alternative to Python in a surprisingly large number of domains.)</p>
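 <p>The <code>items()/iteritems()</code> break is typical of the small shims that
 dual-version code needed, because Python 3 removed <code>dict.iteritems()</code> and
 made <code>items()</code> return a view. A minimal sketch of such a shim follows. The
 helper name <code>iteritems</code> and the sample data are chosen for illustration.</p>

```python
import sys

if sys.version_info[0] >= 3:
    def iteritems(d):
        """Iterate over (key, value) pairs without building a list."""
        return iter(d.items())
else:
    def iteritems(d):
        """Python 2: use the dict's own lazy iterator."""
        return d.iteritems()


counts = {b"added": 2, b"removed": 1}
total = sum(n for _key, n in iteritems(counts))
```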
 <p>Look, I know it is easy for me to armchair quarterback and critique with the
 benefit of hindsight/ignorance. I'm sure there is a lot of nuance here. I'm
 sure there was disagreement within the Python community over a lot of these
 issues. Maintaining a large and successful programming language and community
 like Python's is hard and you aren't going to please all the people all the
 time. And speaking as a maintainer, I have mad respect for the people leading
 such a large community. But niceties aside, everyone knows the Python 3
 transition was rough and could have gone better. It should not have taken 11
 years to get to where we are today.</p>
 <p><strong>I'd like to encourage the Python Project to conduct a thorough postmortem on
 the transition to Python 3.</strong> Identify what went well, what could have gone
 better, and what should be done next time such a large language change is wanted.
 Speaking as a Python user, a maintainer of a Python project, and as someone in
 industry who is now skeptical about use of Python at work due to risks of
 potentially company-crippling high-effort migrations in the future, a postmortem
 would help restore my confidence that Python's maintainers learned from the
 various missteps on the road to Python 3 and these potentially ecosystem-crippling
 mistakes won't be made again.</p>
 <p>Python had a wildly successful past few decades. And it can continue to
 thrive for several more. But the Python 3 migration was painful for all
 involved. And as much as we need to move on and leave Python 2 behind us,
 there are some important lessons to be learned. I hope the Python community
 takes the opportunity to reflect and am confident it will grow stronger by
 taking the time to do so.</p>
--- a/cache/2020/index.html
+++ b/cache/2020/index.html
@@ -29,6 +29,8 @@
      
        <li><a href="/david/cache/2020/17aa5580eb34f39f214e4a72458c535e/" title="Access the cached article">Thinking about the past, present, and future of web development</a> (<a href="https://www.baldurbjarnason.com/past-present-future-web/" title="Access the original article">original</a>)</li>
      
        <li><a href="/david/cache/2020/67c8c54b07137bcfc0069fccd8261b53/" title="Access the cached article">Mercurial's Journey to and Reflections on Python 3</a> (<a href="https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/" title="Access the original article">original</a>)</li>
      
        <li><a href="/david/cache/2020/82e58e715a4ddb17b2f9e2a023005b1a/" title="Access the cached article">Wordsmiths | Getting Real</a> (<a href="https://basecamp.com/gettingreal/08.6-wordsmiths" title="Access the original article">original</a>)</li>
      
        <li><a href="/david/cache/2020/c1c53ee2ef8544ad798629bf8a3b7249/" title="Access the cached article">Thinking about Climate on a Dark, Dismal Morning</a> (<a href="https://blogs.scientificamerican.com/hot-planet/thinking-about-climate-on-a-dark-dismal-morning/" title="Access the original article">original</a>)</li>