<!doctype html><!-- This is a valid HTML5 document. -->
<!-- Screen readers, SEO, extensions and so on. -->
<html lang="fr">
<!-- Has to be within the first 1024 bytes, hence before the <title>
See: https://www.w3.org/TR/2012/CR-html5-20121217/document-metadata.html#charset -->
<meta charset="utf-8">
<!-- Why no `X-UA-Compatible` meta: https://stackoverflow.com/a/6771584 -->
<!-- The viewport meta is quite crowded and we are responsible for that.
See: https://codepen.io/tigt/post/meta-viewport-for-2015 -->
<meta name="viewport" content="width=device-width,minimum-scale=1,initial-scale=1,shrink-to-fit=no">
<!-- Required to make a valid HTML5 document. -->
<title>Mercurial's Journey to and Reflections on Python 3 (archive) — David Larlet</title>
<!-- Lightest blank icon (a bare PNG signature as a data URI), avoids an extra query to the server. -->
<link rel="icon" href="data:;base64,iVBORw0KGgo=">
<!-- Thank you Florens! -->
<link rel="stylesheet" href="/static/david/css/style_2020-01-09.css">
<!-- See https://www.zachleat.com/web/comprehensive-webfonts/ for the trade-off. -->
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_regular.woff2" as="font" type="font/woff2" crossorigin>
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_bold.woff2" as="font" type="font/woff2" crossorigin>
<link rel="preload" href="/static/david/css/fonts/triplicate_t4_poly_italic.woff2" as="font" type="font/woff2" crossorigin>
<meta name="robots" content="noindex, nofollow">
<meta content="origin-when-cross-origin" name="referrer">
<!-- Canonical URL for SEO purposes -->
<link rel="canonical" href="https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/">
<body class="remarkdown h1-underline h2-underline hr-center ul-star pre-tick">
<article>
<h1>Mercurial's Journey to and Reflections on Python 3</h1>
<h2><a href="https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/">Original source of the content</a></h2>
<p>Mercurial 5.2 was released on November 5, 2019. It is the first version
of Mercurial that supports Python 3. This milestone comes nearly 11 years
after Python 3.0 was first released on December 3, 2008.</p>
<p>Speaking as a maintainer of Mercurial and an avid user of Python, I
feel like the experience of making Mercurial work with Python 3 is
worth sharing because there are a number of lessons to be learned.</p>
<p>This post is logically divided into two sections: a mostly factual account
of Mercurial's Python 3 porting effort and a more opinionated commentary
on the transition to Python 3 and the Python language ecosystem as a whole.
Those who don't care about the mechanics of porting a large Python project
to Python 3 may want to skip the next section or two.</p>
<h2>Porting Mercurial to Python 3</h2>
<p>Let's start with a brief history lesson of Mercurial's support for
Python 3, as told by its own commit history.</p>
<p>The Mercurial version control tool was first released in April 2005
(the same month that Git was initially released). Version 1.0 came out
in March 2008. The first reference to Python 3 I found in the code base
was in <a href="https://www.mercurial-scm.org/repo/hg/rev/8fee8ff13d37">September 2008</a>.
Then not much happened for a while until
<a href="https://www.mercurial-scm.org/repo/hg/rev/4494fb02d549">June 2010</a>, when
someone authored a bunch of changes to make the Python C extensions
start to recognize Python 3. Then things were again quiet for a while
until <a href="https://www.mercurial-scm.org/repo/hg/rev/56ef99fbd6f2">January 2013</a>,
when a handful of changes landed to remove two-argument <code>raise</code>. There were
a few commits in 2014 but nothing worth calling out.</p>
<p>Mercurial's meaningful journey to Python 3 began in 2015. In code,
the work started in
<a href="https://www.mercurial-scm.org/repo/hg/rev/af6e6a0781d7">April 2015</a>, with
an effort to make Mercurial's test harness run with Python 3. Part of
this was a <a href="https://www.mercurial-scm.org/repo/hg/rev/fefc72523491">decision</a>
that Python 3.5 (to be released several months later in September 2015)
would be the minimum Python 3 version that Mercurial would support.</p>
<p>Once the Mercurial Project decided it wanted to port to Python 3 (as opposed
to another language), one of the earliest decisions was how to perform that
port. <strong>Mercurial's code base was too large to attempt a flag day conversion</strong>
where there would be a Python 2 version and a Python 3 version and one day
everyone would switch from Python 2 to 3. <strong>Mercurial needed a way to run the
same code (or as much of the same code as possible) on both Python 2 and 3.</strong> We would
maintain a single code base and users would gradually switch from running with
Python 2 to Python 3.</p>
<p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/e1fb276d4619">May 2015</a>,
Mercurial dropped support for Python 2.4 and 2.5. Dropping support for
these older Python versions was critical, as it was effectively impossible to
write Python code that ran on this wide gamut of versions because of
incompatibilities in syntax and language features. For example, you needed
Python 2.6 to get <code>print()</code> via <code>from __future__ import print_function</code>.
The project's late start on a Python 3 port can be attributed in large part
to Python 2.4 and 2.5 compatibility holding us back.</p>
<p>The main goal with Mercurial's early porting work was just getting the code base
to a point where <code>import mercurial</code> would work. There were a myriad of places
where Mercurial used syntax that was invalid on Python 3 and Python 3
couldn't even parse the source code, let alone compile it to bytecode and
execute it.</p>
<p>This effort began in earnest in
<a href="https://www.mercurial-scm.org/repo/hg/rev/e93036747902">June 2015</a>
with global source code rewrites like using modern octal syntax,
modern exception catching syntax (<code>except Exception as e</code> instead of
<code>except Exception, e</code>), <code>print()</code> instead of <code>print</code>, and a
<a href="https://www.mercurial-scm.org/repo/hg/rev/1a6a117d0b95">modern import convention</a>
along with the use of <code>from __future__ import absolute_import</code>.</p>
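<p>As a rough illustration of those rewrites, the hypothetical function below uses only syntax that parses on both Python 2.6+ and Python 3: the <code>__future__</code> imports, <code>0o</code> octal literals, <code>except ... as e</code>, <code>print()</code> as a function, and a one-argument <code>raise</code> in place of the two-argument form mentioned earlier. The function is invented for this sketch and is not code from Mercurial.</p>

```python
# Sketch only: syntax valid on Python 2.6+ and Python 3 alike.
# ``make_private_dir`` is a hypothetical example, not Mercurial code.
from __future__ import absolute_import, print_function

import os


def make_private_dir(path):
    """Create ``path`` with owner-only permissions and return it."""
    try:
        # Modern octal literal: 0o700 rather than the Python 2-only 0700.
        os.mkdir(path, 0o700)
    except OSError as e:  # rather than ``except OSError, e``
        # print() as a function rather than the ``print`` statement.
        print('cannot create %s: %s' % (path, e))
        # One-argument raise rather than ``raise ValueError, msg``.
        raise ValueError('unusable directory: %s' % path)
    return path
```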
<p>In the early days of the port, our first goal was to get all source code
parsing as valid Python 3. The next step was to get all the modules <code>import</code>ing
cleanly. This entailed fixing code that ran at <code>import</code> time to work on
Python 3. Our thinking was that we would need the code base to be <code>import</code>
clean on Python 3 before seriously thinking about run-time behavior. In reality,
we quickly ported a lot of modules to <code>import</code> cleanly and then moved on
to higher-level porting, leaving a long tail of modules with <code>import</code> failures.</p>
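<p>The two stages described above (does every file parse, and does every module import) can be sketched as a small checker. The function name <code>check_py3_compat</code> and its inputs are hypothetical; this is not the implementation behind Mercurial's <code>test-check-py3-compat.t</code>, only an assumption about how such a check could work using <code>ast.parse()</code> and <code>importlib.import_module()</code>.</p>

```python
import ast
import importlib


def check_py3_compat(sources, modules):
    """Report files that fail to parse and modules that fail to import.

    ``sources`` maps a filename to its source text; ``modules`` lists
    dotted module names. Returns a list of human-readable problems.
    """
    problems = []
    # Stage 1: can the running interpreter even parse each file?
    for filename, text in sorted(sources.items()):
        try:
            ast.parse(text, filename)
        except SyntaxError as e:
            problems.append('%s: invalid syntax (line %s)' % (filename, e.lineno))
    # Stage 2: does module-level code run cleanly at import time?
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as e:
            problems.append('%s: %s: %s' % (name, type(e).__name__, e))
    return problems


print(check_py3_compat({'old.py': 'print "hi"\n', 'new.py': 'print("hi")\n'},
                       ['json', 'no_such_module_xyz']))
```

<p>Separating the stages mirrors the order described in the text: a file that cannot be parsed will never import, so syntax problems are reported first.</p>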
<p>This initial porting effort played out over months. There weren't many
people working on it in the early days: a few people would basically hack on
Python 3 as a form of itch scratching and most of the project's energy was
focused on improving the existing Python 2-based product. You can get a rough
idea of the timeline and participation in the early porting effort through the
<a href="https://www.mercurial-scm.org/repo/hg/log/081a77df7bc6/tests/test-check-py3-compat.t?revcount=960">history of test-check-py3-compat.t</a>.
We see the test being added in <a href="https://www.mercurial-scm.org/repo/hg/rev/40eb385f798f">December 2015</a>.
By June 2016, most of the code base was ported to our modern import convention
and we were ready to move on to more meaningful porting.</p>
<p>One of the biggest early hurdles in our porting effort was how to overcome
the string literal type mismatch between Python 2 and 3. In Python 2, a
<code>''</code> string literal is a sequence of bytes. In Python 3, a <code>''</code> string literal
is a sequence of Unicode code points. These are fundamentally different types.
And in Mercurial's code base, <strong>most of our <em>string</em> types are binary by design:
use of a Unicode based <code>str</code> for representing data is flat out wrong for our use
case</strong>. We knew that Mercurial would need to eventually switch many string
literals from <code>''</code> to <code>b''</code> to preserve type compatibility. But doing so would
be problematic.</p>
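<p>A few lines run under Python 3 make the mismatch concrete. On Python 2, <code>'abc'</code> and <code>b'abc'</code> would be the same byte-string type; on Python 3 they are distinct types that neither compare equal nor combine implicitly.</p>

```python
# Run under Python 3: unprefixed literals are text, b'' literals are bytes.
text = 'abc'
data = b'abc'

assert type(text) is str and type(data) is bytes

# The types never compare equal, even when the characters match.
assert text != data

# Indexing differs too: bytes yield integers, str yields 1-char strings.
assert data[0] == 97 and text[0] == 'a'

# Mixing them raises TypeError instead of coercing silently.
try:
    text + data
except TypeError as e:
    print('TypeError:', e)

# Converting requires choosing an encoding explicitly.
assert data.decode('ascii') == text and text.encode('ascii') == data
```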
<p>In the early days of Mercurial's Python 3 port in 2015, Mercurial's project
maintainer (Matt Mackall) set a ground rule that the Python 3 port shouldn't overly
disrupt others: he wanted the Python 3 port to more or less happen in the background
and not require every developer to be aware of Python 3's low-level behavior in order
to get work done on the existing Python 2 code base. This may seem like a questionable
decision (and I probably disagreed with him to some extent at the time because I was
doing Python 3 porting work and the decision constrained this work). But it was the
correct decision. Matt knew that it would be years before the Python 3 port was either
necessary or resulted in a meaningful return on investment (the value proposition of
Python 3 has always been weak for Mercurial because Python 3 doesn't demonstrate a
compelling advantage over Python 2 for our use case). What Matt was trying to do was
minimize the externalized costs that a Python 3 port would inflict on the project.
He correctly recognized that maintaining the existing product and supporting
existing users was more important than a long-term bet in its infancy.</p>
<p>This ground rule meant that a mass insertion of <code>b''</code> prefixes everywhere
was not desirable, as that would require developers to think about whether
a type was a <code>bytes</code> or <code>str</code>, a distinction they didn't have to worry about
on Python 2 because we practically never used the Unicode-based string type in
Mercurial.</p>
<p>In addition, there were some other practical issues with doing a bulk <code>b''</code>
prefix insertion. One was that the added <code>b</code> characters would cause a lot of lines
to grow beyond our length limits and we'd have to reformat code. That would
require manual intervention and would significantly slow down porting. And
a sub-issue of adding all the <code>b</code> prefixes and reformatting code is that it would
<em>break</em> annotate/blame more than was tolerable. The latter issue was addressed
by teaching Mercurial's annotate/blame feature to <em>skip</em> revisions. The project
now has a convention of annotating commit messages with <code># skip-blame &lt;reason&gt;</code>
so structural-only changes can easily be ignored when performing an
annotate/blame.</p>
<p>A stop-gap solution to the <code>b''</code> everywhere issue came in
<a href="https://www.mercurial-scm.org/repo/hg/rev/1c22400db72d">July 2016</a>, when I
introduced a custom Python module importer that rewrote source code as part
of <code>import</code> when running on Python 3. (I have
<a href="https://gregoryszorc.com/blog/2017/03/13/from-__past__-import-bytes_literals/">previously blogged</a>
about this hack.) It transparently added <code>b''</code> prefixes to all
un-prefixed string literals and rewrote calls to a few common functions, so
that the source code did not need to be modified to run natively
on Python 3. The source transformer let us advance the Python 3 port
without having to rewrite tens of thousands of lines of
source code. The solution was hacky. But it enabled us to make significant
progress on the Python 3 port without externalizing a lot of cost onto others.</p>
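<p>The core of such a transformation can be sketched with the standard <code>tokenize</code> module. This is a simplified, hypothetical version, not the importer described above: it only adds <code>b</code> to string literals that carry no prefix, it omits the import-hook machinery that applied the rewrite at <code>import</code> time, it does not rewrite any function calls, and docstrings get converted along with everything else.</p>

```python
import io
import tokenize


def add_bytes_prefixes(source):
    """Return ``source`` with b'' added to every unprefixed string literal.

    Literals that already have a prefix (b'', r'', u'', ...) are left
    alone, so r'' can still be used to request a native str.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[0] in '\'"':
            # Only the token text changes; positions are kept, so
            # untokenize() still reproduces the original spacing.
            tok = tok._replace(string='b' + tok.string)
        tokens.append(tok)
    return tokenize.untokenize(tokens)


print(add_bytes_prefixes("name = 'hg' + r'raw'\n"), end='')
```

<p>Running the example prints <code>name = b'hg' + r'raw'</code>: the unprefixed literal becomes <code>bytes</code> while the raw literal stays a native <code>str</code>, which is the escape hatch the post mentions later.</p>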
<p>I thought the source transformer would be relatively short-lived and would be
removed shortly after the project inevitably decided to go all in on Python 3.
To my surprise, others built additional transforms over the years and the source
transformer persisted all the way until
<a href="https://www.mercurial-scm.org/repo/hg/rev/d783f945a701">October 2019</a>, when
I removed it just before the first non-alpha Python 3 compatible version
of Mercurial was released.</p>
<p>A common problem Mercurial faced with making the code base dual Python 2/3 native
was dealing with standard library differences. Most of the problems stemmed
from changes between Python 2.7 and 3.5+. But there were also changes between
Python 3 versions that we had to wallpaper over. In
<a href="https://www.mercurial-scm.org/repo/hg/rev/6041fb8f2da8">April 2016</a>, the
<code>mercurial.pycompat</code> module was introduced to export aliases or wrappers around
standard library functionality to abstract the differences between Python
versions. This file <a href="https://www.mercurial-scm.org/repo/hg/log/66af68d4c751/mercurial/pycompat.py?revcount=240">grew over time</a>
and <a href="https://www.mercurial-scm.org/repo/hg/file/66af68d4c751/mercurial/pycompat.py">eventually became</a>
Mercurial's version of <a href="https://six.readthedocs.io/">six</a>. To be honest, I'm
not sure if we should have used <code>six</code> from the beginning. <code>six</code> probably would
have saved some work. But we eventually had to write a lot of shims for
converting between <code>str</code> and <code>bytes</code> and would have needed to invent a
<code>pycompat</code> layer in some form anyway. So I'm not sure <code>six</code> would have saved
enough effort to justify the baggage of integrating a third-party package into
Mercurial. (When Mercurial accepts a third-party package, downstream packagers
like Debian get all hot and bothered and end up making questionable patches
to our source code. So we prefer to minimize the surface area for
problems by minimizing dependencies on third-party packages.)</p>
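<p>To give a sense of what such a layer looks like, here is a heavily simplified, hypothetical shim in the spirit of <code>mercurial.pycompat</code>. The names <code>ispy3</code>, <code>sysbytes</code> and <code>sysstr</code> and their bodies are illustrative assumptions, not a copy of Mercurial's module. The idea is that callers import one name and the module picks the right implementation for the running interpreter.</p>

```python
import sys

# Hypothetical compatibility shim; callers use these names instead of
# checking the Python version themselves.
ispy3 = sys.version_info[0] >= 3

if ispy3:
    import pickle

    def sysbytes(s):
        """Convert a native str (e.g. a keyword name) to bytes."""
        return s.encode('utf-8')

    def sysstr(s):
        """Convert bytes to a native str; str passes through unchanged."""
        if isinstance(s, str):
            return s
        # latin-1 maps every byte value to a code point, so this never fails.
        return s.decode('latin-1')

else:
    import cPickle as pickle  # C implementation on Python 2

    def sysbytes(s):
        return s  # native str is already bytes on Python 2

    def sysstr(s):
        return s
```

<p>Code elsewhere could then call, for example, <code>getattr(obj, sysstr(name))</code> with a bytes <code>name</code>, without caring which Python is running.</p>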
<p>Once we had a source-transforming module importer and the <code>pycompat</code>
compatibility shim, we started to focus in earnest on making core
functionality actually work on Python 3. We established a convention of
annotating changesets needed for Python 3 with <code>py3</code>, so a
<a href="https://www.mercurial-scm.org/repo/hg/log?rev=desc(py3)&revcount=4000">commit message search</a>
yields a lot of the history. (But it isn't a full history, since not every Python 3
oriented change used this convention.) We see from that history that after
the source-transforming importer landed, a lot of porting effort was spent on things
very early in the <code>hg</code> process lifetime. This included handling environment
variables, loading config files, and argument parsing. We introduced a
<a href="https://www.mercurial-scm.org/repo/hg/log/@/tests/test-check-py3-commands.t">test-check-py3-commands.t</a>
test to track the progress of <code>hg</code> commands working in Python 3. The very early
history of that file shows the various error messages changing, as underlying
early process functionality was slowly ported to work on Python 3. By
<a href="https://www.mercurial-scm.org/repo/hg/rev/2d555d753f0e">December 2016</a>, we
had <code>hg version</code> working on Python 3!</p>
<p>With basic <code>hg</code> command dispatch ported to Python 3 at the end of 2016,
2017 represented an inflection point in the Python 3 porting effort. With the
early process functionality working, different people could pick up different
commands and code paths and start making code work with Python 3. By
<a href="https://www.mercurial-scm.org/repo/hg/rev/52ee1b5ac277">March 2017</a>, basic
repository opening and <code>hg files</code> worked. Shortly thereafter,
<a href="https://www.mercurial-scm.org/repo/hg/rev/ed23f929af38"><code>hg init</code> started working</a>,
followed by <a href="https://www.mercurial-scm.org/repo/hg/rev/935a1b1117c7"><code>hg status</code></a> and
<a href="https://www.mercurial-scm.org/repo/hg/rev/aea8ec3f7dd1"><code>hg commit</code></a>.</p>
<p>Within a few months, enough of Mercurial's functionality was working with Python
3 that we started to <a href="https://www.mercurial-scm.org/repo/hg/rev/7a877e569ed6">track which tests passed on Python 3</a>.
The <a href="https://www.mercurial-scm.org/repo/hg/log/@/contrib/python3-whitelist?revcount=480">evolution of this file</a>
shows a reasonable history of the porting velocity.</p>
<p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/feb910d2f59b">May 2017</a>, we dropped
support for Python 2.6. This significantly reduced the complexity of supporting
Python 3, as Python 2.7 had a lot of functionality that made it easier
to target both Python 2 and 3, and we were now free to use it.</p>
<p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/bd8875b6473c">November 2017</a>, I
landed a test harness feature to report exceptions seen during test runs. I
later <a href="https://www.mercurial-scm.org/repo/hg/rev/8de90e006c78">refined the output</a>
so the most frequent failures were reported more prominently. This feature
made it much easier to target the most common exceptions, allowing
us to write patches fixing the most prevalent issues on Python 3 and to uncover
previously unknown failures.</p>
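<p>The idea behind that harness feature, collecting the exceptions seen across a test run and listing the most frequent first, can be sketched with <code>collections.Counter</code>. The function name, its input format (a mapping of test name to captured output) and the traceback parsing are assumptions for illustration; the harness's actual implementation is not reproduced here.</p>

```python
import collections


def summarize_exceptions(outputs):
    """Count exceptions across test outputs, most frequent first.

    ``outputs`` maps a test name to its captured output. Returns a list of
    (exception line, occurrence count, sorted test names) tuples.
    """
    counts = collections.Counter()
    tests = collections.defaultdict(set)
    for test, text in outputs.items():
        in_traceback = False
        for line in text.splitlines():
            if line.startswith('Traceback (most recent call last):'):
                in_traceback = True
            elif in_traceback and not line.startswith(' '):
                # The first unindented line after the stack frames is the
                # "ExceptionType: message" line.
                counts[line] += 1
                tests[line].add(test)
                in_traceback = False
    return [(exc, n, sorted(tests[exc])) for exc, n in counts.most_common()]


runs = {
    'test-a.t': 'Traceback (most recent call last):\n'
                '  File "x.py", line 1, in f\n'
                "TypeError: can't concat str to bytes\n",
    'test-b.t': 'Traceback (most recent call last):\n'
                '  File "y.py", line 9, in g\n'
                "TypeError: can't concat str to bytes\n"
                'Traceback (most recent call last):\n'
                '  File "z.py", line 3, in h\n'
                "KeyError: b'rev'\n",
}
for exc, count, names in summarize_exceptions(runs):
    print('%d  %s  (%s)' % (count, exc, ', '.join(names)))
```

<p>Sorting by frequency is what makes the report actionable: fixing the top entry tends to turn many failing tests green at once.</p>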
<p>By the end of 2017, we had most of the structural pieces in place to complete
the port. Essentially all that was required at that point was time and labor.
We didn't have a formal mechanism in place to target porting efforts. Instead,
people would pick up a component or test that they wanted to hack on and then
make incremental changes towards making that work. All the while, we didn't
have a strict policy against regressing Python 3, and regressions in Python 3
porting progress were semi-frequent, though we did tend to correct them
quickly. Over time, developers saw a flurry of Python 3
patches and slowly grew aware of how to accommodate Python 3, and
Python 3 regressions became less frequent.</p>
<p>As useful as the source-transforming module importer was, it incurred some
additional burden for the porting effort. The source transformer effectively
converted all un-prefixed string literals (<code>''</code>) to bytes literals (<code>b''</code>)
to preserve string type behavior with Python 2. But various parts of Python
3 didn't accept <code>bytes</code>: a lot of standard library functionality
now wanted unicode <code>str</code>, even though the Python
2 implementation used the equivalent of <code>bytes</code>. So our <code>pycompat</code> layer
grew pretty large to accommodate calling into various standard library
functionality. Another side-effect which we didn't initially anticipate
was the <code>**kwargs</code> calling convention. Python allows you to use <code>**</code>
with a dict with string keys to turn those keys into named arguments
in a function call. But Python 3 requires these <code>dict</code> keys to be
<code>str</code> and outright rejects <code>bytes</code> keys, even if the <code>bytes</code> instance
is ASCII-safe and has the same underlying byte representation of the
string data as the <code>str</code> instance would. So we had to invent one support
function that would convert <code>dict</code> keys from <code>bytes</code> to <code>str</code> for
use with <code>**kwargs</code> and another to convert a <code>**kwargs</code> dict from
<code>str</code> keys to <code>bytes</code> keys so we could use <code>''</code> syntax to access keys
in our source code! Also on the string type front, we had to sprinkle
the codebase with raw string literals (<code>r''</code>) to force the use of
<code>str</code> regardless of which Python version you were running on (our
source transformer only changed unprefixed string literals, so existing
<code>r''</code> strings would be preserved as <code>str</code>).</p>
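<p>The <code>**kwargs</code> problem, and the kind of conversion helpers described above, can be reproduced in plain Python 3. The helper names <code>strkwargs</code> and <code>byteskwargs</code> and the <code>commit()</code> function are hypothetical, and the sketch assumes ASCII-only keys; it only shows why such helpers were needed.</p>

```python
def strkwargs(dic):
    """Copy ``dic`` with bytes keys decoded to str, for use with **."""
    return {k.decode('latin-1'): v for k, v in dic.items()}


def byteskwargs(dic):
    """Copy a **kwargs dict with str keys encoded to bytes."""
    return {k.encode('latin-1'): v for k, v in dic.items()}


def commit(**opts):
    # Code written with b'' literals looks keys up as bytes, so convert
    # the str keys Python hands us before using them.
    opts = byteskwargs(opts)
    return opts[b'message']


opts = {b'message': b'fix bug', b'user': b'alice'}
try:
    commit(**opts)  # Python 3 rejects bytes keys in a ** call
except TypeError as e:
    print('TypeError:', e)
print(commit(**strkwargs(opts)))
```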
<p>Blind transformation of all string literals to <code>bytes</code> was less than ideal
and it did impose some unwanted side-effects. But, again, most <em>strings</em>
in Mercurial are bytes by design, so we thought it would be easier to
<em>byteify</em> all strings and then selectively undo that where native strings
were actually warranted (like keys in most <code>dict</code>s) than to take the
up-front cost to examine every string and make an intelligent determination
as to what type it should be. I go back and forth as to whether this was the
correct call. But when you factor in that the source-transforming
module importer unblocked Python 3 porting at a time in the project's
history when there was so much focus on improving the core product, and that it
did so without externalizing many costs onto the people doing the critical
core product work, I think it was the right call.</p>
<p>By mid 2019, the number of test failures in Python 3 had been whittled
down to a reasonable, less daunting number. It felt like victory was
within grasp and inevitable. But a few significant issues lingered.</p>
<p>One remaining question was around addressing differences between Python
3 versions. At the time, Python 3.5, 3.6, and 3.7 had been released and 3.8
was scheduled for release by the end of the year. We had a surprising
number of issues with differences in Python 3 versions. Many of us
were running Python 3.7, so it had the fewest failures. We had to spend
extra effort to get Python 3.5 and 3.6 working as well as 3.7. The same went for
3.8.</p>
<p>Another task we deferred until the second half of 2019 was standing up
robust CI for Python 3. We had some coverage, but it was minimal. Wanting
a distraction from PyOxidizer for a bit and wanting to overhaul Mercurial's
CI system (which is officially built on Buildbot), I cobbled together a
<em>serverless</em> CI system built on top of AWS DynamoDB and S3 for storage,
Lambda functions and CloudWatch events for all business logic, and EC2 spot
instances for job execution. This CI system executed Python 3.5, 3.6, 3.7,
and 3.8 variants of our test harness on Linux and Python 3.7 on Windows.
This gave developers insight into version-specific failures. More
importantly, it also gave insight into failures on Windows, which had
previously not been well tested. It turned out that Python 3 support on Windows was
lagging significantly behind POSIX.</p>
<p>By the time of the Mercurial developer meetup in October 2019, nearly
all tests were passing on POSIX platforms and we were confident that
we could declare Python 3 support as at least beta quality for the
Mercurial 5.2 release, planned for early November.</p>
<p>One of our blockers for ripping off the alpha label on Python 3 support
was removing our source-transforming module importer. It had performance
implications and it wasn't something we wanted to ship because it felt
too hacky. A prerequisite for this was that we wanted to automatically format
our source tree with <a href="https://black.readthedocs.io/en/stable/">black</a>:
if we removed the source transformer, we'd have to rewrite
a lot of source code to apply changes the transformer was performing,
and the resulting long lines would have to be rewrapped, involving a lot
of manual effort. We wanted to <em>blacken</em> our code base first so that
mass rewriting source code wouldn't involve a lot of tedious reformatting,
since <code>black</code> would handle that for us automatically. And rewriting the
source tree with <code>black</code> was blocked on a specific feature landing in
<code>black</code>! (We did not agree with <code>black</code>'s behavior of
unwrapping comma-delimited lists of items if they could fit on a single
line. So one of our core contributors wrote a patch to <code>black</code> that
changed its behavior so a trailing <code>,</code> in a list of items will force
items to be formatted on multiple lines. I personally find the multiple line
formatting much easier to read. And the behavior is arguably better for
code review and <em>annotation</em>, which is line based.) Once this feature
landed in <code>black</code>, we reformatted our source tree and started ripping
out the source transformations, starting by inserting <code>b''</code> prefixes
everywhere. By late October, the source transformer was no more and
we were ready to release beta quality support for Python 3 (at least
on UNIX-like platforms).</p>
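<p>The trailing-comma behavior is easiest to see on a small, hypothetical call. With that feature in place, a trailing <code>,</code> after the last argument tells <code>black</code> to keep one item per line instead of joining the call onto a single line, so adding an argument later touches only one line for review and annotate. (Current <code>black</code> documentation calls this the "magic trailing comma".)</p>

```python
def describe(name, version, platform):
    """Hypothetical helper used only to show the formatting."""
    return '%s %s (%s)' % (name, version, platform)


# The trailing comma after 'linux' keeps this call exploded, one
# argument per line, even though it would fit on a single line.
summary = describe(
    'mercurial',
    '5.2',
    'linux',
)
print(summary)
```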
<p>Having given a mostly factual overview of Mercurial's port to Python
3, it is now time to shift gears to the speculative and opinionated
parts of this post. <strong>I want to underscore that the opinions reflected
here are my own and do not reflect the overall Mercurial Project or even
a consensus within it.</strong></p>
<h2>The Future of Python 3 and Mercurial</h2>
<p>Mercurial's port to Python 3 is still ongoing. While we've shipped
Python 3 support and the test harness is clean on Python 3, I view shipping
as only a milestone - arguably <em>the</em> most important one - in a longer
journey. There's still a lot of work to do.</p>
<p>It is now 2020 and Python 2 support is now officially dead from the
perspective of the Python language maintainers. Linux distributions are
starting to rip out Python 2. Packages are dropping Python 2 support in
new versions. The world is moving to Python 3 only. But <strong>Mercurial still
officially supports Python 2</strong>. And it has yet to be determined how
long we will retain support for Python 2 in the code base. We've only had
one release supporting Python 3. Our users still need to port their
extensions (implemented in Python). Our users still need to start widely
using Mercurial with Python 3. Even our own developers need to switch to
Python 3 (old habits are hard to break).</p>
<p>I anticipate a long tail of random bugs in Mercurial on Python 3. While
the tests may pass, our code coverage is not 100%. And even if it were,
Python is a dynamic language and there are tons of invariants that aren't
caught at compile time and can only be discovered at run time. <strong>These
invariants cannot all be detected by tests, no matter how good your test
coverage is.</strong> This is a <em>feature</em>/<em>limitation</em> of dynamic languages. Our
users will likely be finding a long tail of miscellaneous bugs on Python
3 for <em>years</em>.</p>
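<p>A contrived, hypothetical example shows the kind of bug that slips through. Nothing flags the <code>str</code>/<code>bytes</code> mix below before run time, and a test suite that never passes <code>verbose=True</code> will stay green until a user takes that branch on Python 3.</p>

```python
def format_status(path, verbose=False):
    """Return a status line for ``path``, which is expected to be bytes."""
    if verbose:
        # Bug: 'M ' is a str, path is bytes. Python only notices when this
        # branch actually runs, and then raises TypeError on Python 3.
        return 'M ' + path + b' (modified)'
    return b'M ' + path


print(format_status(b'README'))  # the commonly tested path works
try:
    format_status(b'README', verbose=True)
except TypeError as e:
    print('TypeError:', e)
```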
<p>At present, our code base is littered with tons of random hacks to bridge
the gap between Python 2 and 3. Once Python 2 support is dropped, we'll
need to remove these hacks and make the source tree Python 3 native, with
minimal shims to wallpaper over differences in Python 3 versions. <strong>Removing
this Python version bridge code will likely require hundreds of commits and
will be a non-trivial effort.</strong> It's likely to be deemed a low priority (it
is glorified busy work after all), and code for the express purpose of
supporting Python 2 will likely linger for years.</p>
<p>We are also still shoring up our packaging and distribution story on
Python 3. This is easier on some platforms than others. I created
<a href="https://github.com/indygreg/PyOxidizer">PyOxidizer</a> partially because
of the poor experience I had with Python application packaging and
distribution through the Mercurial Project. The Mercurial Project has
already signed off on using PyOxidizer for distributing Mercurial in
the future. So look for an <em>oxidized</em> Mercurial distribution in the
near future! (You could argue PyOxidizer is an epic yak shave to better
support Mercurial. But that's for another post.)</p>
<p>Then there's Windows support. A Python 3 powered Mercurial on Windows
still has a handful of known issues. It may require a few more releases
before we consider Python 3 on Windows to be stable.</p>
<p>Because we're still on a code base that must support Python 2, our
adoption of Python 3 features is very limited. The only Python 3
feature that Mercurial developers seem to almost universally get excited
about is type annotations. We already have some people playing around
with <code>pytype</code> using comment-based annotations and <code>pytype</code> has already
caught a few bugs. We're eager to go all in on type annotations and
uncover lots of dynamic typing bugs and poorly implemented APIs.
Beyond type annotations, I can't name any feature that people are screaming
to adopt and which makes a lot of sense for Mercurial. There's a long
tail of minor features I'm sure will get utilized. But none of the
marquee features that define major language releases seem that interesting
to us. Time will tell.</p>
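<p>For readers unfamiliar with comment-based annotations: they use the PEP 484 <code># type:</code> comment syntax, which a checker such as <code>pytype</code> reads while the interpreter ignores it, so they work in a code base that must still run on Python 2. The function below is a hypothetical example, not Mercurial code.</p>

```python
try:
    # Only the type checker needs these names; the comments below are
    # never evaluated at run time.
    from typing import List, Optional
except ImportError:  # e.g. Python 2 without the typing backport
    pass


def shortest(names, limit=None):
    # type: (List[bytes], Optional[int]) -> bytes
    """Return the shortest name, truncated to ``limit`` bytes if given."""
    best = min(names, key=len)
    if limit is not None:
        best = best[:limit]
    return best
```

<p>Under these annotations, a call such as <code>shortest(['a', 'b'])</code> passes <code>str</code> values where <code>bytes</code> are declared, which is the kind of mismatch a checker can be expected to report before the code ever runs.</p>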
<h2>Commentary on Python 3</h2>
<p>Having described Mercurial's ongoing journey to Python 3, I now want to
focus more on Python itself. Again, the opinions here are my own and
don't reflect those of the Mercurial Project.</p>
<p><strong>Succinctly, my experience porting Mercurial and other projects to | |||||
Python 3 has significantly soured my perceptions of Python. As much as | |||||
I have historically loved Python - from the language to the welcoming | |||||
community - I am still struggling to understand how Python could manage | |||||
to inflict so much hardship on the community by choosing the transition | |||||
plan that they did.</strong> I believe Python's choices represent a terrific | |||||
example of what not to do when managing a large project or ecosystem. | |||||
Maintainers of other widely deployed systems would benefit from taking
the time to understand and reflect on Python's missteps.</p> | |||||
<p>Python 3.0 was released on December 3, 2008. And it took the better part of | |||||
a decade for the community to embrace it. <strong>This should be universally | |||||
recognized as a failure.</strong> While hindsight is 20/20, many of the issues | |||||
with Python 3 were obvious at the time and could have been mitigated had | |||||
the language maintainers been more accommodating - and dare I say | |||||
empathetic - to its users.</p> | |||||
<p>Initially, Python 3 had a rather cavalier attitude towards backwards and | |||||
forwards compatibility. In the early years of Python 3, the attitude of | |||||
Python's maintainers was <em>Python 3 is a new, better language: you should | |||||
target it explicitly</em>. There were some tools and methods to ease the | |||||
transition. But nothing super polished, especially in the early years. | |||||
Adoption of Python 3 in the overall community was slow. Python developers | |||||
in the wild justifiably complained that the value proposition of Python 3 | |||||
was too weak to justify porting effort. Not helping was that the early | |||||
advice for targeting Python 3 was to rewrite the source code to become | |||||
Python 3 native. This is in contrast with using the same source to run on both | |||||
Python 2 and 3. For library and application maintainers, this potentially | |||||
meant maintaining separate versions of your code or forcing end-users to | |||||
make a giant leap, which would realistically orphan users on an old version, | |||||
fragmenting your user base. Neither option was great, so
you can understand why many projects didn't bite.</p> | |||||
<p>For many projects of non-trivial size, flag day transitions from Python 2 to | |||||
3 were simply not viable: the pathway to Python 3 was to make code dual | |||||
Python 2/3 compatible and gradually switch over the runtime to Python 3. | |||||
But initial versions of Python 3 made this effectively impossible! Let me | |||||
give a few specific examples.</p> | |||||
<p>In Python 2, a string literal <code>''</code> is effectively an array of bytes. In | |||||
Python 3, it is a series of Unicode code points - a fundamentally different | |||||
type! In Python 2, you could write <code>b''</code> to be explicit that a string literal | |||||
was bytes or you could write <code>u''</code> to indicate a Unicode literal, mimicking | |||||
Python 3's behavior. In Python 3, you could write <code>b''</code> to create a <code>bytes</code> | |||||
instance. But for whatever reason, Python 3 initially removed the <code>u''</code> syntax, | |||||
meaning there wasn't an easy way to explicitly denote the type of each
string literal so that it was consistent between Python 2 and 3! Python 3.3 | |||||
(released September 2012) restored <code>u''</code> support, making it more viable to | |||||
write Python source code that worked on both Python 2 and 3. <strong>For nearly 4 | |||||
years, Python 3 took away the consistent syntax for denoting bytes/Unicode | |||||
string literals.</strong></p> | |||||
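<p>To make the problem concrete, here is a minimal sketch of the explicit-prefix convention that dual Python 2/3 code relies on. It runs on Python 2.7 and on Python 3.3+; on Python 3.0 through 3.2 the <code>u''</code> line is a <code>SyntaxError</code>. The constant and function names are illustrative only.</p>
<pre><code>import sys

# Explicitly prefixed literals mean the same thing on Python 2.7 and 3.3+.
RAW = b'\x00\x01hg'    # bytes on both (on Python 2, bytes is an alias of str)
TEXT = u'caf\u00e9'     # unicode on Python 2, str on Python 3
# An unprefixed literal silently changes type between major versions.
BARE = 'default'        # byte-backed str on Python 2, Unicode str on Python 3


def literal_types():
    """Return the runtime type name of each literal."""
    return {
        'b': type(RAW).__name__,
        'u': type(TEXT).__name__,
        'bare': type(BARE).__name__,
    }


if __name__ == '__main__':
    print(sys.version_info[:2], literal_types())</code></pre>
<p>On Python 3, <code>literal_types()</code> maps <code>b</code>, <code>u</code> and <code>bare</code> to <code>bytes</code>, <code>str</code> and <code>str</code>; on Python 2.7 it reports <code>str</code>, <code>unicode</code> and <code>str</code>. Only the prefixed literals keep a consistent meaning across both.</p>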
<p>Another feature was <code>%</code> formatting of strings. Python 2 allowed use of the | |||||
<code>%</code> formatting operator on both its string types. But Python 3 initially | |||||
removed the implementation of <code>%</code> from <code>bytes</code>. Why, I have no clue. It | |||||
is perfectly reasonable to splice byte sequences into a buffer via use of | |||||
a formatting string. But the Python language maintainers insisted otherwise. | |||||
And it wasn't until the community complained about its absence loudly enough | |||||
that this feature was | |||||
<a href="https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-461">restored in Python 3.5</a>, | |||||
which was released in September 2015. Fun fact: the lack of this feature was | |||||
once considered a blocker for Mercurial moving to Python 3 because | |||||
Mercurial uses <code>bytes</code> almost universally, which meant that nearly every use | |||||
of <code>%</code> would have to be changed to something else. And to this day, Python | |||||
3's <code>bytes</code> still doesn't have a <code>format()</code> method, so the alternative was | |||||
effectively string concatenation, which is a massive step backwards from the | |||||
expressiveness of <code>%</code> formatting.</p> | |||||
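<p>A minimal sketch of the difference, using a hypothetical helper that renders a <code>rev:node</code> line. The first version relies on PEP 461 <code>bytes</code> interpolation and needs Python 3.5+ (or Python 2, where <code>%</code> always worked on byte strings); the second is the kind of manual concatenation that Python 3.0 through 3.4 forced on byte-oriented code.</p>
<pre><code>def format_rev_line(rev, node):
    """Render a rev:node line as bytes using %-formatting (PEP 461, Python 3.5+)."""
    return b'%d:%s\n' % (rev, node)


def format_rev_line_concat(rev, node):
    """Build the same bytes without %: convert the int by hand and concatenate."""
    return str(rev).encode('ascii') + b':' + node + b'\n'</code></pre>
<p>Both produce <code>b'5:abc123\n'</code> for <code>(5, b'abc123')</code>. On Python 3, <code>hasattr(bytes, 'format')</code> is still <code>False</code>, which is why losing <code>%</code> hurt so much for code that works almost entirely in <code>bytes</code>.</p>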
<p><strong>The initial approach of Python 3 mirrors a folly that many developers | |||||
and projects make: attempting a rewrite instead of performing incremental | |||||
evolution.</strong> For established projects, large scale rewrites often go poorly. | |||||
And Python 3 is no exception. Yes, at the code level, CPython (and likely
other Python implementations) evolved incrementally from Python 2 within
the same code base. But at the language and standard library level, the
differences in Python 3 were significant enough that I - and even Python's | |||||
core maintainers - considered it a new language, and therefore a rewrite. | |||||
When your random project attempts a rewrite and fails, the blast radius of that is | |||||
often contained to that project. Maybe you don't publish a new release | |||||
as soon as you otherwise would. <strong>But when you are powering an ecosystem, | |||||
the ripple effects from a failed rewrite percolate throughout that ecosystem,
last for years, and have many second-order effects. We see this with
Python 3, where poor choices made in the late 2000s are inflicting significant | |||||
hardship still in 2020.</strong></p> | |||||
<p>From the initial restrained adoption of Python 3, it is obvious that the | |||||
Python ecosystem overwhelmingly rejected the initial boil-the-ocean approach
of Python 3. Python's maintainers eventually got the message and started
restoring features like <code>u''</code> and <code>bytes</code> <code>%</code> formatting back into the
language to placate the community. Meanwhile, Python 3 had been accumulating
new features and the cumulative sum of those features was compelling enough | |||||
to win over users.</p> | |||||
<p>For many projects (including Mercurial), Python 3.4/3.5 was the first viable | |||||
porting target for Python 3. Python 3.5 was released in September 2015, almost | |||||
7 years after Python 3.0 was released in December 2008. <strong>Seven. Years.</strong> | |||||
An ecosystem that falters for that long is generally not healthy. What may have | |||||
saved Python from total collapse here is that Python 2 was still going strong and | |||||
people were generally happy with it. I really do think Python dodged a bullet | |||||
here, because there was a massive window where the language could have | |||||
hemorrhaged a critical amount of its user base and been relegated to an | |||||
afterthought. One could draw an analogy to Perl, which lost out to PHP, | |||||
Python, and Ruby, and whose fall from grace aligned with a lengthy | |||||
transition from Perl 5 to 6.</p> | |||||
<p>If you look back at the early history of Python 3, <strong>I think you are forced | |||||
to conclude that Python effectively kneecapped itself for 5-7 years | |||||
through questionable implementation choices that prevented users from | |||||
making incremental transitions between the major language versions. 2008
to 2013-2015 should be known as the <em>lost years of Python</em> because so much | |||||
opportunity and energy was squandered.</strong> Yes, Python is still healthy today | |||||
and Python 3 is (finally) being adopted at scale. But had earlier versions | |||||
of Python 3 been more <em>empathetic</em> towards Python 2 users porting to it, | |||||
Python and Python 3 in 2020 would be even stronger than they are. The community
was artificially hindered for years. And we won't know until 2023-2025 what | |||||
things could have looked like in 2020 had the Python core language team | |||||
spent more time paving a smoother road between the major language versions.</p> | |||||
<p>To be clear, I do think Python 3 is generally a better language than Python 2. | |||||
It has fewer warts, more compelling features, and better performance (except | |||||
for startup, which is still slower than on Python 2). I am ecstatic the
community is finally rallying around Python 3! For my Python coding, it has | |||||
reached the point where I curse under my breath when I need to support | |||||
Python 2 or even older versions of Python 3, like 3.5 or 3.6: I just wish | |||||
the world would move on and adopt the future already!</p> | |||||
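<p>The startup caveat is easy to check on your own machine. The sketch below is one rough approach, not a rigorous benchmark: it times how long an interpreter takes to run an empty program. The function name is hypothetical, and the sketch itself needs Python 3.5+ to run, although it can time any interpreter you point it at.</p>
<pre><code>import statistics
import subprocess
import sys
import time


def median_startup_seconds(python=sys.executable, runs=5):
    """Return the median wall-clock time of `python -c pass`, in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the interpreter fails.
        subprocess.run([python, '-c', 'pass'], check=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)</code></pre>
<p>Calling it once with the path of a Python 2.7 interpreter and once with a Python 3 interpreter (the paths depend on your system) gives a like-for-like comparison; taking the median damps noise from the first, cold-cache run.</p>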
<p>But I would be remiss if I failed to mention some of my gripes with Python | |||||
3 beyond the transition shenanigans.</p> | |||||
<p>Perhaps my least favorite <em>feature</em> of Python 3 is its insistence that the | |||||
world is Unicode. In Python 2, the default string type was backed by | |||||
bytes. In Python 3, the default string type is backed by Unicode code | |||||
points. As part of that transition, large parts of the standard library | |||||
now operate in the Unicode space instead of the domain of bytes. I understand | |||||
why Python does this: they want <em>strings</em> to be Unicode and don't want | |||||
users to have to spend that much energy thinking about when to use | |||||
<code>str</code> versus <code>bytes</code>. This approach is admirable and somewhat defensible | |||||
because it takes a stand on a solution that is arguably <em>good enough</em> for | |||||
most users. However, <strong>the approach of assuming the world is Unicode is | |||||
flat out wrong and has significant implications for systems level | |||||
applications</strong> (like version control tools).</p> | |||||
<p>There are a myriad of places in Python's standard library where Python | |||||
insists on using the Unicode-backed <code>str</code> type and rejects <code>bytes</code>. For | |||||
example, various networking modules refuse to accept <code>bytes</code> for hostnames | |||||
or URLs. HTTP libraries won't accept <code>bytes</code> for HTTP header names or values. | |||||
Functions that proxy POSIX-defined functions won't accept <code>bytes</code>
even though the underlying POSIX function uses <code>char *</code> and isn't
Unicode aware. Then there's filename handling, where Python assumes the
existence of a global encoding for filenames and uses this encoding to convert | |||||
between <code>str</code> and <code>bytes</code>. And it does this despite POSIX filesystem paths | |||||
being a bag of bytes where the only rules are that <code>\0</code> terminates the | |||||
filename and <code>/</code> is special.</p> | |||||
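<p>A minimal sketch of how that global encoding shows up in practice, assuming a POSIX system (Windows behaves differently). <code>os.fsdecode()</code> and <code>os.fsencode()</code> convert using <code>sys.getfilesystemencoding()</code> together with the <code>surrogateescape</code> error handler, so a filename that is not valid in that encoding still round-trips, but only by smuggling the undecodable bytes through as lone surrogate code points. The helper name is hypothetical.</p>
<pre><code>import os
import sys


def roundtrip_filename(raw_name):
    """Decode a raw POSIX filename to str and encode it back, as os APIs do."""
    decoded = os.fsdecode(raw_name)    # bytes -> str via the filesystem encoding
    restored = os.fsencode(decoded)    # str -> bytes via the same encoding
    return decoded, restored


# Latin-1 encoded name: not valid UTF-8, yet a perfectly legal POSIX filename.
raw = b'caf\xe9.txt'
decoded, restored = roundtrip_filename(raw)
print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors(), ascii(decoded))</code></pre>
<p>With a UTF-8 filesystem encoding, which is typical on Linux, <code>decoded</code> comes out as <code>'caf\udce9.txt'</code>. It converts back to the original bytes, but <code>decoded.encode('utf-8')</code> raises <code>UnicodeEncodeError</code>, so the value cannot safely be treated as ordinary text.</p>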
<p>In cases like Python refusing to accept <code>bytes</code> for things like HTTP | |||||
header names (which will just be spit out over the wire as bytes), Python's | |||||
pendulum has swung too far towards Unicode only. In my opinion, Python needs | |||||
to be more accommodating and allow <code>bytes</code> when it makes sense. I hope the | |||||
pendulum knocks some sense into people when it swings back towards a more | |||||
reasonable solution that better acknowledges the realities of the world we | |||||
live in.</p> | |||||
<p>For areas like filename handling, the world is more complicated. Python | |||||
is effectively an abstraction layer over the operating system APIs exposing | |||||
this functionality. And there is often an impedance mismatch between operating | |||||
systems. For example, POSIX (Linux) tends to use <code>char *</code> for everything | |||||
and doesn't care about encoding, while Windows tends to use 16-bit character
types where the encoding is... a can of worms.</p> | |||||
<p><strong>The reality here is that it is impossible to abstract over differences | |||||
between operating system behavior without compromises that can result in data | |||||
loss, outright wrong behavior, or loss of functionality. But Python 3 attempts | |||||
to do it anyway, making Python 3 unsuitable (or at least highly undesirable) for | |||||
certain systems level applications that rely on it</strong> (like a version control | |||||
tool).</p> | |||||
<p>In fairness to Python, it isn't the only programming language that gets | |||||
this wrong. The only language I've seen <em>properly</em> implement higher-order | |||||
abstractions on top of operating system facilities is Rust, whose approach can | |||||
be generalized as <em>use Python 3's solution of normalizing to Unicode/UTF-8 by | |||||
default</em>, but expose <em>escape hatches</em> which allow access to the raw underlying | |||||
types and APIs used by the operating system for the advanced consumers who | |||||
require it. For example, Rust's <code>Path</code> type, which represents a filesystem path,
<a href="https://doc.rust-lang.org/std/path/struct.Path.html#method.as_os_str">allows access</a> | |||||
to the raw <a href="https://doc.rust-lang.org/std/ffi/struct.OsStr.html">OsStr</a> value | |||||
used by the operating system, not a normalization of it to bytes or Unicode, | |||||
which may be lossy. This allows consumers to e.g. create and retrieve | |||||
OS-native filesystem paths without data loss. This functionality is critical | |||||
in some domains. Python 3's awareness/insistence that the world is | |||||
Unicode (which it isn't universally) reduces Python's applicability in these | |||||
domains.</p> | |||||
<p>Speaking of Rust, at the Mercurial developer meetup in October 2019, we were | |||||
discussing the use of Rust in Mercurial and one of the core maintainers blurted | |||||
out something along the lines of <em>if Rust were at its current state 5 years ago, | |||||
Mercurial would have likely ported from Python 2 to Rust instead of Python 3</em>. | |||||
As crazy as it initially sounded, I think I agree with that assessment. With the | |||||
benefit of hindsight, having been a key player in the Python 3 porting effort, | |||||
seeing all the complications and headaches Python 3 is introducing, and | |||||
having learned Rust and witnessed its benefits for performance, control, | |||||
and correctness firsthand, porting to Rust would likely have been the correct | |||||
move for the project at that point in time. 2020 is not 2014, however, and I'm | |||||
not sure if I would opt for a rewrite in Rust today. (Most rewrites are follies | |||||
after all.) But I know one thing: I certainly wouldn't implement a new version | |||||
control tool in Python 3 and I would probably choose Rust as an implementation | |||||
language for most new projects in the systems level space or with an expected | |||||
shelf life of 10+ years. (I really should blog about how awesome Rust is.)</p> | |||||
<p>Back to the topic of Python itself, <strong>I'm really soured on Python at this | |||||
point in time. The effort required to port to Python 3 was staggering. For | |||||
Mercurial, Python 3 introduces a ton of problems and doesn't really solve | |||||
many. We effectively slogged through mud for several years only to wind
up in a state that feels strictly worse than where we started. I'm sure it will | |||||
be strictly better in a few years. But at that point, we're talking about a | |||||
5+ year transition. To call the Python 3 transition disruptive and | |||||
distracting for the project would be an understatement. As a project maintainer, | |||||
it's natural to ask what we could have accomplished if we weren't forced | |||||
to carry out this sideshow.</strong></p> | |||||
<p>I can't shake the feeling that a lot of the pain inflicted by the Python 3
transition could have been avoided had Python's language leadership made | |||||
a different set of decisions and more highly prioritized the transition | |||||
experience. (Like not initially removing features like <code>u''</code> and <code>bytes %</code> | |||||
and not introducing gratuitous backwards compatibility breaks, like with | |||||
<code>items()/iteritems()</code>. I would have also liked to see a feature like | |||||
<code>from __future__</code> - maybe <code>from __past__</code> - that would make it easier for | |||||
Python 3 code to target semantics in earlier versions in order to provide | |||||
a more turnkey on-ramp onto new versions.) I simultaneously see Python 3 | |||||
losing its position as a justifiable tool in some domains (like systems | |||||
level tooling) due to ongoing design decisions and poor implementation (like | |||||
startup overhead problems). (In contrast, I see Rust excelling where Python | |||||
is faltering and find Rust code surprisingly expressive to write and maintain | |||||
given how low-level it is and therefore feel that Rust is a compelling | |||||
alternative to Python in a surprisingly large number of domains.)</p> | |||||
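<p>The <code>items()</code>/<code>iteritems()</code> break illustrates why dual-version code needed shim layers. Python 3 removed <code>dict.iteritems()</code> and made <code>items()</code> return a view, so code targeting both versions typically routed such calls through a small helper. A minimal sketch in the style of <code>six.iteritems</code> (the function below is illustrative, not Mercurial's own code):</p>
<pre><code>import sys

PY2 = sys.version_info[0] == 2

if PY2:
    def iteritems(d):
        """Lazily iterate over (key, value) pairs without building a list."""
        return d.iteritems()
else:
    def iteritems(d):
        """Python 3's items() is already a lazy view; wrap it in an iterator."""
        return iter(d.items())</code></pre>
<p>Call sites then read <code>for key, value in iteritems(mapping):</code> on both major versions. Multiply that by every renamed or retyped API and the cost of "gratuitous" breaks becomes clear.</p>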
<p>Look, I know it is easy for me to armchair quarterback and critique with the | |||||
benefit of hindsight/ignorance. I'm sure there is a lot of nuance here. I'm | |||||
sure there was disagreement within the Python community over a lot of these | |||||
issues. Maintaining a large and successful programming language and community | |||||
like Python's is hard and you aren't going to please all the people all the | |||||
time. And speaking as a maintainer, I have mad respect for the people leading | |||||
such a large community. But niceties aside, everyone knows the Python 3 | |||||
transition was rough and could have gone better. It should not have taken 11 | |||||
years to get to where we are today.</p> | |||||
<p><strong>I'd like to encourage the Python Project to conduct a thorough postmortem on | |||||
the transition to Python 3.</strong> Identify what went well, what could have gone | |||||
better, and what should be done next time such a large language change is wanted. | |||||
Speaking as a Python user, a maintainer of a Python project, and as someone in | |||||
industry who is now skeptical about use of Python at work due to risks of | |||||
potentially company-crippling, high-effort migrations in the future, a postmortem
would help restore my confidence that Python's maintainers learned from the
various missteps on the road to Python 3 and that these potentially
ecosystem-crippling mistakes won't be made again.</p>
<p>Python had a wildly successful past few decades. And it can continue to | |||||
thrive for several more. But the Python 3 migration was painful for all | |||||
involved. And as much as we need to move on and leave Python 2 behind us, | |||||
there are some important lessons to be learned. I hope the Python community | |||||
takes the opportunity to reflect and am confident it will grow stronger by | |||||
taking the time to do so.</p> | |||||
</article> | |||||
<hr> | |||||
<footer> | |||||
<p> | |||||
<a href="/david/" title="Go to the home page">🏠</a> •
<a href="/david/log/" title="Access the RSS feed">🤖</a> •
<a href="http://larlet.com" title="Go to my English profile" data-instant>🇨🇦</a> •
<a href="mailto:david%40larlet.fr" title="Send an email">📮</a> •
<abbr title="Host: Alwaysdata, 62 rue Tiquetonne 75002 Paris, +33184162340">🧚</abbr>
</p> | |||||
</footer> | |||||
<script src="/static/david/js/instantpage-3.0.0.min.js" type="module" defer></script> | |||||
</body> | |||||
</html> |
title: Mercurial's Journey to and Reflections on Python 3 | |||||
url: https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/ | |||||
hash_url: 67c8c54b07137bcfc0069fccd8261b53 | |||||
<p>Mercurial 5.2 was released on November 5, 2019. It is the first version | |||||
of Mercurial that supports Python 3. This milestone comes nearly 11 years | |||||
after Python 3.0 was first released on December 3, 2008.</p> | |||||
<p>Speaking as a maintainer of Mercurial and an avid user of Python, I | |||||
feel like the experience of making Mercurial work with Python 3 is | |||||
worth sharing because there are a number of lessons to be learned.</p> | |||||
<p>This post is logically divided into two sections: a mostly factual recount | |||||
of Mercurial's Python 3 porting effort and a more opinionated commentary | |||||
of the transition to Python 3 and the Python language ecosystem as a whole. | |||||
Those who don't care about the mechanics of porting a large Python project | |||||
to Python 3 may want to skip the next section or two.</p> | |||||
<h2>Porting Mercurial to Python 3</h2> | |||||
<p>Let's start with a brief history lesson of Mercurial's support for | |||||
Python 3 as told by its own commit history.</p> | |||||
<p>The Mercurial version control tool was first released in April 2005 | |||||
(the same month that Git was initially released). Version 1.0 came out | |||||
in March 2008. The first reference to Python 3 I found in the code base | |||||
was in <a href="https://www.mercurial-scm.org/repo/hg/rev/8fee8ff13d37">September 2008</a>. | |||||
Then not much happens for a while until | |||||
<a href="https://www.mercurial-scm.org/repo/hg/rev/4494fb02d549">June 2010</a>, when | |||||
someone authors a bunch of changes to make the Python C extensions | |||||
start to recognize Python 3. Then things were again quiet for a while | |||||
until <a href="https://www.mercurial-scm.org/repo/hg/rev/56ef99fbd6f2">January 2013</a>, | |||||
when a handful of changes landed to remove 2 argument <code>raise</code>. There were | |||||
a handful of commits in 2014 but nothing worth calling out.</p> | |||||
<p>Mercurial's meaningful journey to Python 3 started in 2015. In code, | |||||
the work started in | |||||
<a href="https://www.mercurial-scm.org/repo/hg/rev/af6e6a0781d7">April 2015</a>, with | |||||
effort to make Mercurial's test harness run with Python 3. Part of | |||||
this was a <a href="https://www.mercurial-scm.org/repo/hg/rev/fefc72523491">decision</a> | |||||
that Python 3.5 (to be released several months later in September 2015) | |||||
would be the minimum Python 3 version that Mercurial would support.</p> | |||||
<p>Once the Mercurial Project decided it wanted to port to Python 3 (as opposed | |||||
to another language), one of the earliest decisions was how to perform that | |||||
port. <strong>Mercurial's code base was too large to attempt a flag day conversion</strong> | |||||
where there would be a Python 2 version and a Python 3 version and one day | |||||
everyone would switch from Python 2 to 3. <strong>Mercurial needed a way to run the | |||||
same code (or as much of the same code) on both Python 2 and 3.</strong> We would | |||||
maintain a single code base and users would gradually switch from running with | |||||
Python 2 to Python 3.</p> | |||||
<p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/e1fb276d4619">May 2015</a>, | |||||
Mercurial dropped support for Python 2.4 and 2.5. Dropping support for | |||||
these older Python versions was critical, as it was effectively impossible to | |||||
write Python code that ran on this wide gamut of versions because of | |||||
incompatibilities in syntax and language features. For example, you needed | |||||
Python 2.6 to get <code>print()</code> via <code>from __future__ import print_function</code>. | |||||
The project's late start at a Python 3 port can be significantly attributed | |||||
to Python 2.4 and 2.5 compatibility holding us back.</p> | |||||
<p>The main goal with Mercurial's early porting work was just getting the code base | |||||
to a point where <code>import mercurial</code> would work. There were a myriad of places | |||||
where Mercurial used syntax that was invalid on Python 3 and Python 3 | |||||
couldn't even parse the source code, let alone compile it to bytecode and | |||||
execute it.</p> | |||||
<p>This effort began in earnest in | |||||
<a href="https://www.mercurial-scm.org/repo/hg/rev/e93036747902">June 2015</a> | |||||
with global source code rewrites like using modern octal syntax, | |||||
modern exception catching syntax (<code>except Exception as e</code> instead of | |||||
<code>except Exception, e</code>), <code>print()</code> instead of <code>print</code>, and a | |||||
<a href="https://www.mercurial-scm.org/repo/hg/rev/1a6a117d0b95">modern import convention</a> | |||||
along with the use of <code>from __future__ import absolute_import</code>.</p> | |||||
<p>In the early days of the port, our first goal was to get all source code | |||||
parsing as valid Python 3. The next step was to get all the modules <code>import</code>ing | |||||
cleanly. This entailed fixing code that ran at <code>import</code> time to work on | |||||
Python 3. Our thinking was that we would need the code base to be <code>import</code> | |||||
clean on Python 3 before seriously thinking about run-time behavior. In reality, | |||||
we quickly ported a lot of modules to <code>import</code> cleanly and then moved on | |||||
to higher-level porting, leaving a long-tail of modules with <code>import</code> failures.</p> | |||||
<p>This initial porting effort played out over months. There weren't many | |||||
people working on it in the early days: a few people would basically hack on | |||||
Python 3 as a form of itch scratching and most of the project's energy was | |||||
focused on improving the existing Python 2 based product. You can get a rough | |||||
idea of the timeline and participation in the early porting effort through the | |||||
<a href="https://www.mercurial-scm.org/repo/hg/log/081a77df7bc6/tests/test-check-py3-compat.t?revcount=960">history of test-check-py3-compat.t</a>. | |||||
We see the test being added in <a href="https://www.mercurial-scm.org/repo/hg/rev/40eb385f798f">December 2015</a>, | |||||
By June 2016, most of the code base was ported to our modern import convention | |||||
and we were ready to move on to more meaningful porting.</p> | |||||
<p>One of the biggest early hurdles in our porting effort was how to overcome | |||||
the string literals type mismatch between Python 2 and 3. In Python 2, a | |||||
<code>''</code> string literal is a sequence of bytes. In Python 3, a <code>''</code> string literal | |||||
is a sequence of Unicode code points. These are fundamentally different types. | |||||
And in Mercurial's code base, <strong>most of our <em>string</em> types are binary by design: | |||||
use of a Unicode based <code>str</code> for representing data is flat out wrong for our use | |||||
case</strong>. We knew that Mercurial would need to eventually switch many string | |||||
literals from <code>''</code> to <code>b''</code> to preserve type compatibility. But doing so would | |||||
be problematic.</p> | |||||
<p>In the early days of Mercurial's Python 3 port in 2015, Mercurial's project | |||||
maintainer (Matt Mackall) set a ground rule that the Python 3 port shouldn't overly | |||||
disrupt others: he wanted the Python 3 port to more or less happen in the background | |||||
and not require every developer to be aware of Python 3's low-level behavior in order | |||||
to get work done on the existing Python 2 code base. This may seem like a questionable | |||||
decision (and I probably disagreed with him to some extent at the time because I was | |||||
doing Python 3 porting work and the decision constrained this work). But it was the | |||||
correct decision. Matt knew that it would be years before the Python 3 port was either | |||||
necessary or resulted in a meaningful return on investment (the value proposition of | |||||
Python 3 has always been weak to Mercurial because Python 3 doesn't demonstrate a | |||||
compelling advantage over Python 2 for our use case). What Matt was trying to do was | |||||
minimize the externalized costs that a Python 3 port would inflict on the project. | |||||
He correctly recognized that maintaining the existing product and supporting | |||||
existing users was more important than a long-term bet in its infancy.</p> | |||||
<p>This ground rule meant that a mass insertion of <code>b''</code> prefixes everywhere | |||||
was not desirable, as that would require developers to think about whether | |||||
a type was a <code>bytes</code> or <code>str</code>, a distinction they didn't have to worry about | |||||
on Python 2 because we practically never used the Unicode-based string type in | |||||
Mercurial.</p> | |||||
<p>In addition, there were some other practical issues with doing a bulk <code>b''</code> | |||||
prefix insertion. One was that the added <code>b</code> characters would cause a lot of lines | |||||
to grow beyond our length limits and we'd have to reformat code. That would | |||||
require manual intervention and would significantly slow down porting. And | |||||
a sub-issue of adding all the <code>b</code> prefixes and reformatting code is that it would | |||||
<em>break</em> annotate/blame more than was tolerable. The latter issue was addressed | |||||
by teaching Mercurial's annotate/blame feature to <em>skip</em> revisions. The project | |||||
now has a convention of annotating commit messages with <code># skip-blame <reason></code> | |||||
so structural only changes can easily be ignored when performing an | |||||
annotate/blame.</p> | |||||
<p>A stop-gap solution to the <code>b''</code> everywhere issue came in | |||||
<a href="https://www.mercurial-scm.org/repo/hg/rev/1c22400db72d">July 2016</a>, when I | |||||
introduced a custom Python module importer that rewrote source code as part | |||||
of <code>import</code> when running on Python 3. (I have | |||||
<a href="/blog/2017/03/13/from-__past__-import-bytes_literals/">previously blogged</a> | |||||
about this hack.) What this did was transparently add <code>b''</code> prefixes to all | |||||
un-prefixed string literals as well as modify how a few common functions were | |||||
called so that we wouldn't need to modify source code so things would run natively | |||||
on Python 3. The source transformer allowed us to have the benefits of progressing | |||||
in our Python 3 port without having to rewrite tens of thousands of lines of | |||||
source code. The solution was hacky. But it enabled us to make significant | |||||
progress on the Python 3 port without externalizing a lot of cost onto others.</p> | |||||
<p>I thought the source transformer would be relatively short-lived and would be | |||||
removed shortly after the project inevitably decided to go all in on Python 3. | |||||
To my surprise, others built additional transforms over the years and the source | |||||
transformer persisted all the way until | |||||
<a href="https://www.mercurial-scm.org/repo/hg/rev/d783f945a701">October 2019</a>, when | |||||
I removed it just before the first non-alpha Python 3 compatible version | |||||
of Mercurial was released.</p> | |||||
<p>A common problem Mercurial faced with making the code base dual Python 2/3 native | |||||
was dealing with standard library differences. Most of the problems stemmed | |||||
from changes between Python 2.7 and 3.5+. But there are changes within the | |||||
versions of Python 3 that we had to wallpaper over as well. In | |||||
<a href="https://www.mercurial-scm.org/repo/hg/rev/6041fb8f2da8">April 2016</a>, the | |||||
<code>mercurial.pycompat</code> module was introduced to export aliases or wrappers around | |||||
standard library functionality to abstract the differences between Python | |||||
versions. This file <a href="https://www.mercurial-scm.org/repo/hg/log/66af68d4c751/mercurial/pycompat.py?revcount=240">grew over time</a> | |||||
and <a href="https://www.mercurial-scm.org/repo/hg/file/66af68d4c751/mercurial/pycompat.py">eventually became</a> | |||||
Mercurial's version of <a href="https://six.readthedocs.io/">six</a>. To be honest, I'm | |||||
not sure if we should have used <code>six</code> from the beginning. <code>six</code> probably would | |||||
have saved some work. But we had to eventually write a lot of shims for | |||||
converting between <code>str</code> and <code>bytes</code> and would have needed to invent a | |||||
<code>pycompat</code> layer in some form anyway. So I'm not sure <code>six</code> would have saved | |||||
enough effort to justify the baggage of integrating a 3rd party package into | |||||
Mercurial. (When Mercurial accepts a 3rd party package, downstream packagers | |||||
like Debian get all hot and bothered and end up making questionable patches | |||||
to our source code. So we prefer to minimize the surface area for | |||||
problems by minimizing dependencies on 3rd party packages.)</p> | |||||
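<p>To give a feel for what such a compatibility layer contains, here is a minimal sketch of a few <code>pycompat</code>-style helpers. The names <code>ispy3</code>, <code>sysstr</code>, <code>sysbytes</code>, and <code>iteritems</code>, as well as the encodings, are illustrative assumptions; Mercurial's real module is much larger and differs in its details.</p>

```python
import sys

# Illustrative pycompat-style shim; not Mercurial's actual implementation.
ispy3 = sys.version_info[0] >= 3

if ispy3:
    def sysstr(s):
        """Convert bytes to the native str type.

        Latin-1 maps every byte to one code point, so decoding never fails.
        """
        return s.decode("latin-1") if isinstance(s, bytes) else s

    def sysbytes(s):
        """Convert a native str to bytes (UTF-8 is an assumption here)."""
        return s.encode("utf-8") if isinstance(s, str) else s

    def iteritems(d):
        """Python 3 dicts have no iteritems(); items() returns a view."""
        return iter(d.items())
else:
    def sysstr(s):
        return s  # On Python 2, str already is the bytes type.

    def sysbytes(s):
        return s

    def iteritems(d):
        return d.iteritems()


print(sysstr(b"tip"), sysbytes("tip"), list(iteritems({"a": 1})))
# tip b'tip' [('a', 1)]
```

Call sites then use the helpers unconditionally, so the version check happens once, at import time.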
<p>Once we had a source transforming module importer and the <code>pycompat</code> | |||||
compatibility shim, we started to focus in earnest on making core | |||||
functionality actually work on Python 3. We established a convention of | |||||
annotating changesets needed for Python 3 with <code>py3</code>, so a | |||||
<a href="https://www.mercurial-scm.org/repo/hg/log?rev=desc(py3)&revcount=4000">commit message search</a> | |||||
yields a lot of the history. (But it isn't a full history, since not every Python 3
oriented change used this convention.) We see from that history that after
the source importer landed, a lot of porting effort was spent on things | |||||
very early in the <code>hg</code> process lifetime. This included handling environment | |||||
variables, loading config files, and argument parsing. We introduced a | |||||
<a href="https://www.mercurial-scm.org/repo/hg/log/@/tests/test-check-py3-commands.t">test-check-py3-commands.t</a> | |||||
test to track the progress of <code>hg</code> commands working in Python 3. The very early | |||||
history of that file shows the various error messages changing, as underlying | |||||
early process functionality was slowly ported to work on Python 3. By | |||||
<a href="https://www.mercurial-scm.org/repo/hg/rev/2d555d753f0e">December 2016</a>, we | |||||
had <code>hg version</code> working on Python 3!</p> | |||||
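<p>Part of that early work stems from Python 3 handing a program its command-line arguments and environment as <code>str</code>, while Mercurial works in <code>bytes</code>. A minimal sketch of recovering bytes follows; the names <code>sysargv</code> and <code>environ</code> are hypothetical, and the Windows fallback is an assumption of this sketch rather than what Mercurial does. Applying <code>os.fsencode()</code> to each element of <code>sys.argv</code> is also how the Python documentation suggests getting the original argument bytes on Unix.</p>

```python
import os
import sys

# Python 3 decodes argv (on POSIX, with surrogateescape), so fsencode()
# recovers the bytes the operating system originally passed in.
sysargv = [os.fsencode(arg) for arg in sys.argv]

if os.supports_bytes_environ:
    # POSIX: os.environb is a bytes-keyed mapping of the real environment.
    environ = os.environb
else:
    # Windows has no os.environb; this sketch takes an encoded snapshot.
    environ = {os.fsencode(k): os.fsencode(v) for k, v in os.environ.items()}

print(sysargv)
```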
<p>With basic <code>hg</code> command dispatch ported to Python 3 at the end of 2016, | |||||
2017 represented an inflection point in the Python 3 porting effort. With the | |||||
early process functionality working, different people could pick up different | |||||
commands and code paths and start making code work with Python 3. By | |||||
<a href="https://www.mercurial-scm.org/repo/hg/rev/52ee1b5ac277">March 2017</a>, basic | |||||
repository opening and <code>hg files</code> worked. Shortly thereafter, | |||||
<a href="https://www.mercurial-scm.org/repo/hg/rev/ed23f929af38">hg init started working</a>,
followed by <a href="https://www.mercurial-scm.org/repo/hg/rev/935a1b1117c7">hg status</a> and
<a href="https://www.mercurial-scm.org/repo/hg/rev/aea8ec3f7dd1">hg commit</a>.</p>
<p>Within a few months, enough of Mercurial's functionality was working with Python | |||||
3 that we started to <a href="https://www.mercurial-scm.org/repo/hg/rev/7a877e569ed6">track which tests passed on Python 3</a>. | |||||
The <a href="https://www.mercurial-scm.org/repo/hg/log/@/contrib/python3-whitelist?revcount=480">evolution of this file</a> | |||||
shows a reasonable history of the porting velocity.</p> | |||||
<p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/feb910d2f59b">May 2017</a>, we dropped | |||||
support for Python 2.6. This significantly reduced the complexity of supporting | |||||
Python 3, as Python 2.7 has a lot of functionality that makes it easier
to target both Python 2 and 3, and we were now free to use it.</p>
<p>In <a href="https://www.mercurial-scm.org/repo/hg/rev/bd8875b6473c">November 2017</a>, I | |||||
landed a test harness feature to report exceptions seen during test runs. I | |||||
later <a href="https://www.mercurial-scm.org/repo/hg/rev/8de90e006c78">refined the output</a> | |||||
so the most frequent failures were reported more prominently. This feature
made it much easier to target the most common exceptions, allowing
us to write patches fixing the most prevalent issues on Python 3 and to uncover
previously unknown failures.</p>
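<p>The idea behind that reporting can be sketched in a few lines: pull the final <code>ExceptionType: message</code> line out of every traceback in the test output and rank the results by frequency. This is a hypothetical illustration assuming CPython's default traceback format, not the code in Mercurial's test harness; <code>summarize_exceptions</code> is an invented name.</p>

```python
from collections import Counter


def summarize_exceptions(outputs):
    """Rank exception summary lines found in test outputs by frequency."""
    counts = Counter()
    for output in outputs:
        lines = output.splitlines()
        for i, line in enumerate(lines):
            if not line.startswith("Traceback (most recent call last):"):
                continue
            # Frame lines are indented; the first unindented line after the
            # header is the "ExceptionType: message" summary.
            for following in lines[i + 1:]:
                if following and not following[0].isspace():
                    counts[following] += 1
                    break
    # Most frequent first, so the biggest problems are reported on top.
    return counts.most_common()


tb = "Traceback (most recent call last):\n  File t.py, line 1\nKeyError: b'x'\n"
print(summarize_exceptions([tb, tb]))  # [("KeyError: b'x'", 2)]
```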
<p>By the end of 2017, we had most of the structural pieces in place to complete | |||||
the port. Essentially all that was required at that point was time and labor. | |||||
We didn't have a formal mechanism in place to target porting efforts. Instead, | |||||
people would pick up a component or test that they wanted to hack on and then | |||||
make incremental changes towards making that work. All the while, we didn't
have a strict policy against regressing Python 3, and regressions in Python 3
porting progress were semi-frequent, although we did tend to correct
them quickly. Over time, developers saw a flurry of Python 3
patches and slowly grew aware of how to accommodate Python 3, and
Python 3 regressions became less frequent.</p>
<p>As useful as the source-transforming module importer was, it incurred some | |||||
additional burden for the porting effort. The source transformer effectively | |||||
converted all un-prefixed string literals (<code>''</code>) to bytes literals (<code>b''</code>) | |||||
to preserve string type behavior with Python 2. But various aspects of Python | |||||
3 didn't like the existence of <code>bytes</code>. Various standard library functionality | |||||
now wanted unicode <code>str</code> and didn't accept <code>bytes</code>, even though the Python | |||||
2 implementation used the equivalent of <code>bytes</code>. So our <code>pycompat</code> layer | |||||
grew pretty large to accommodate calling into various standard library | |||||
functionality. Another side-effect which we didn't initially anticipate | |||||
was the <code>**kwargs</code> calling convention. Python allows you to use <code>**</code> | |||||
with a dict with string keys to turn those keys into named arguments | |||||
in a function call. But Python 3 requires these <code>dict</code> keys to be | |||||
<code>str</code> and outright rejects <code>bytes</code> keys, even if the <code>bytes</code> instance | |||||
is ASCII safe and has the same underlying byte representation of the | |||||
string data as the <code>str</code> instance would. So we had to invent support | |||||
functions that would convert <code>dict</code> keys from <code>bytes</code> to <code>str</code> for | |||||
use with <code>**kwargs</code> and another to convert a <code>**kwargs</code> dict from | |||||
<code>str</code> keys to <code>bytes</code> keys so we could use <code>''</code> syntax to access keys | |||||
in our source code! Also on the string type front, we had to sprinkle | |||||
the codebase with raw string literals (<code>r''</code>) to force the use of | |||||
<code>str</code> regardless of which Python version you were running on (our
source transformer only changed unprefixed string literals, so existing | |||||
<code>r''</code> strings would be preserved as <code>str</code>).</p> | |||||
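<p>The <code>**kwargs</code> problem and its workaround are easy to demonstrate. The helpers below are modeled on the conversion functions described above, but the names <code>strkwargs</code> and <code>byteskwargs</code> and the example <code>commit()</code> function are assumptions for illustration, and the sketch assumes keyword names are ASCII.</p>

```python
def strkwargs(dic):
    """Copy ``dic`` with bytes keys decoded to str, for use with **."""
    return {(k.decode("ascii") if isinstance(k, bytes) else k): v
            for k, v in dic.items()}


def byteskwargs(dic):
    """Copy a **kwargs dict with its str keys encoded to bytes."""
    return {(k.encode("ascii") if isinstance(k, str) else k): v
            for k, v in dic.items()}


def commit(message=None, user=None):
    return message, user


opts = {b"message": b"fix typo", b"user": b"alice"}

try:
    commit(**opts)  # Python 3 rejects bytes keyword names.
except TypeError as exc:
    print("TypeError:", exc)

print(commit(**strkwargs(opts)))  # (b'fix typo', b'alice')
print(byteskwargs({"rev": b"tip"}))  # {b'rev': b'tip'}
```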
<p>Blind transformation of all string literals to <code>bytes</code> was less than ideal | |||||
and it did impose some unwanted side-effects. But, again, most <em>strings</em> | |||||
in Mercurial are bytes by design, so we thought it would be easier to | |||||
<em>byteify</em> all strings and then selectively undo that where native strings
were actually warranted (like keys in most <code>dict</code>s) than to pay the
up-front cost of examining every string and making an intelligent determination
as to what type it should be. I go back and forth as to whether this was the | |||||
correct call. But when you factor in that the source transforming | |||||
module importer unblocked Python 3 porting at a time in the project's | |||||
history when there was so much focus on improving the core product and it | |||||
did so without externalizing many costs onto the people doing the critical | |||||
core product work, I think it was the right call.</p> | |||||
<p>By mid 2019, the number of test failures in Python 3 had been whittled | |||||
down to a reasonable, less daunting number. It felt like victory was
within grasp and inevitable. But a few significant issues lingered.</p>
<p>One remaining question was around addressing differences between Python | |||||
3 versions. At the time, Python 3.5, 3.6, and 3.7 had been released and 3.8
was scheduled for release by the end of the year. We had a surprising | |||||
number of issues with differences in Python 3 versions. Many of us | |||||
were running Python 3.7, so it had the fewest failures. We had to spend | |||||
extra effort to get Python 3.5 and 3.6 working as well as 3.7. Same for | |||||
3.8.</p> | |||||
<p>Another task we deferred until the second half of 2019 was standing up | |||||
robust CI for Python 3. We had some coverage, but it was minimal. Wanting | |||||
a distraction from PyOxidizer for a bit and wanting to overhaul Mercurial's | |||||
CI system (which is officially built on Buildbot), I cobbled together a | |||||
<em>serverless</em> CI system built on top of AWS DynamoDB and S3 for storage, | |||||
Lambda functions and CloudWatch events for all business logic, and EC2 spot | |||||
instances for job execution. This CI system executed Python 3.5, 3.6, 3.7, | |||||
and 3.8 variants of our test harness on Linux and Python 3.7 on Windows. | |||||
This gave developers insight into version-specific failures. More | |||||
importantly, it also gave insight into Windows failures, as Windows had
previously not been well tested. We discovered that Python 3 on Windows was
lagging significantly behind POSIX.</p>
<p>By the time of the Mercurial developer meetup in October 2019, nearly | |||||
all tests were passing on POSIX platforms and we were confident that | |||||
we could declare Python 3 support as at least beta quality for the | |||||
Mercurial 5.2 release, planned for early November.</p> | |||||
<p>One of our blockers for ripping off the alpha label on Python 3 support | |||||
was removing our source-transforming module importer. It had performance | |||||
implications and it wasn't something we wanted to ship because it felt | |||||
too hacky. A blocker for this was that we wanted to automatically format
our source tree with <a href="https://black.readthedocs.io/en/stable/">black</a> | |||||
because if we removed the source transformer, we'd have to rewrite | |||||
a lot of source code to apply changes the transformer was performing, | |||||
which would necessitate wrapping a lot of lines, which would involve a lot | |||||
of manual effort. We wanted to <em>blacken</em> our code base first so that | |||||
mass rewriting source code wouldn't involve a lot of tedious reformatting | |||||
since <code>black</code> would handle that for us automatically. And rewriting the | |||||
source tree with <code>black</code> was blocked on a specific feature landing in | |||||
<code>black</code>! (We did not agree with <code>black</code>'s behavior of | |||||
unwrapping comma-delimited lists of items if they could fit on a single | |||||
line. So one of our core contributors wrote a patch to <code>black</code> that | |||||
changed its behavior so a trailing <code>,</code> in a list of items will force | |||||
items to be formatted on multiple lines. I personally find the multiple line | |||||
formatting much easier to read. And the behavior is arguably better for | |||||
code review and <em>annotation</em>, which is line based.) Once this feature | |||||
landed in <code>black</code>, we reformatted our source tree and started ripping | |||||
out the source transformations, starting by inserting <code>b''</code> literals | |||||
everywhere. By late October, the source transformer was no more and | |||||
we were ready to release beta quality support for Python 3 (at least | |||||
on UNIX-like platforms).</p> | |||||
<p>Having given a mostly factual overview of Mercurial's port to Python
3, it is now time to shift gears to the speculative and opinionated | |||||
parts of this post. <strong>I want to underscore that the opinions reflected | |||||
here are my own and do not reflect the overall Mercurial Project or even | |||||
a consensus within it.</strong></p> | |||||
<h2>The Future of Python 3 and Mercurial</h2> | |||||
<p>Mercurial's port to Python 3 is still ongoing. While we've shipped | |||||
Python 3 support and the test harness is clean on Python 3, I view shipping | |||||
as only a milestone - arguably <em>the</em> most important one - in a longer | |||||
journey. There's still a lot of work to do.</p> | |||||
<p>It is now 2020 and Python 2 support is now officially dead from the | |||||
perspective of the Python language maintainers. Linux distributions are | |||||
starting to rip out Python 2. Packages are dropping Python 2 support in | |||||
new versions. The world is moving to Python 3 only. But <strong>Mercurial still | |||||
officially supports Python 2</strong>. And it is yet to be determined how
long we will retain support for Python 2 in the code base. We've only had | |||||
one release supporting Python 3. Our users still need to port their | |||||
extensions (implemented in Python). Our users still need to start widely | |||||
using Mercurial with Python 3. Even our own developers need to switch to | |||||
Python 3 (old habits are hard to break).</p> | |||||
<p>I anticipate a long tail of random bugs in Mercurial on Python 3. While | |||||
the tests may pass, our code coverage is not 100%. And even if it were, | |||||
Python is a dynamic language and there are tons of invariants that aren't | |||||
caught at compile time and can only be discovered at run time. <strong>These | |||||
invariants cannot all be detected by tests, no matter how good your test | |||||
coverage is.</strong> This is a <em>feature</em>/<em>limitation</em> of dynamic languages. Our | |||||
users will likely be finding a long tail of miscellaneous bugs on Python | |||||
3 for <em>years</em>.</p> | |||||
<p>At present, our code base is littered with tons of random hacks to bridge | |||||
the gap between Python 2 and 3. Once Python 2 support is dropped, we'll | |||||
need to remove these hacks and make the source tree Python 3 native, with | |||||
minimal shims to wallpaper over differences in Python 3 versions. <strong>Removing | |||||
this Python version bridge code will likely require hundreds of commits and | |||||
will be a non-trivial effort.</strong> It's likely to be deemed a low priority (it | |||||
is glorified busy work after all), and code for the express purpose of | |||||
supporting Python 2 will likely linger for years.</p> | |||||
<p>We are also still shoring up our packaging and distribution story on | |||||
Python 3. This is easier on some platforms than others. I created | |||||
<a href="https://github.com/indygreg/PyOxidizer">PyOxidizer</a> partially because | |||||
of the poor experience I had with Python application packaging and | |||||
distribution through the Mercurial Project. The Mercurial Project has | |||||
already signed off on using PyOxidizer for distributing Mercurial in | |||||
the future. So look for an <em>oxidized</em> Mercurial distribution in the | |||||
near future! (You could argue PyOxidizer is an epic yak shave to better | |||||
support Mercurial. But that's for another post.)</p> | |||||
<p>Then there's Windows support. A Python 3 powered Mercurial on Windows | |||||
still has a handful of known issues. It may require a few more releases | |||||
before we consider Python 3 on Windows to be stable.</p> | |||||
<p>Because we're still on a code base that must support Python 2, our | |||||
adoption of Python 3 features is very limited. The only Python 3 | |||||
feature that Mercurial developers seem to almost universally get excited | |||||
about is type annotations. We already have some people playing around | |||||
with <code>pytype</code> using comment-based annotations and <code>pytype</code> has already | |||||
caught a few bugs. We're eager to go all in on type annotations and | |||||
uncover lots of dynamic typing bugs and poorly implemented APIs. | |||||
Beyond type annotations, I can't name any feature that people are screaming | |||||
to adopt and which makes a lot of sense for Mercurial. There's a long | |||||
tail of minor features I'm sure will get utilized. But none of the | |||||
marquee features that define major language releases seem that interesting | |||||
to us. Time will tell.</p> | |||||
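<p>Comment-based annotations are the form of PEP 484 type hints that still parses on Python 2, which is why they suit a dual 2/3 code base: the interpreter ignores the comment, while checkers such as <code>pytype</code> read it. A minimal example with a hypothetical <code>first_rev()</code> function:</p>

```python
# On Python 2 this import needs the "typing" backport from PyPI.
from typing import List, Optional


def first_rev(revs, default=None):
    # type: (List[int], Optional[int]) -> Optional[int]
    """Return the first revision in ``revs``, or ``default`` when empty.

    The signature comment above is ignored at run time but read by type
    checkers; unlike annotation syntax, it is also valid on Python 2.
    """
    return revs[0] if revs else default


print(first_rev([3, 5]), first_rev([], 7))  # 3 7
```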
<h2>Commentary on Python 3</h2> | |||||
<p>Having described Mercurial's ongoing journey to Python 3, I now want to | |||||
focus more on Python itself. Again, the opinions here are my own and | |||||
don't reflect those of the Mercurial Project.</p> | |||||
<p><strong>Succinctly, my experience porting Mercurial and other projects to | |||||
Python 3 has significantly soured my perceptions of Python. As much as | |||||
I have historically loved Python - from the language to the welcoming | |||||
community - I am still struggling to understand how Python could manage | |||||
to inflict so much hardship on the community by choosing the transition | |||||
plan that they did.</strong> I believe Python's choices represent a terrific | |||||
example of what not to do when managing a large project or ecosystem. | |||||
Maintainers of other widely deployed systems would benefit from taking
the time to understand and reflect on Python's missteps.</p> | |||||
<p>Python 3.0 was released on December 3, 2008. And it took the better part of | |||||
a decade for the community to embrace it. <strong>This should be universally | |||||
recognized as a failure.</strong> While hindsight is 20/20, many of the issues | |||||
with Python 3 were obvious at the time and could have been mitigated had | |||||
the language maintainers been more accommodating - and dare I say | |||||
empathetic - to its users.</p> | |||||
<p>Initially, Python 3 had a rather cavalier attitude towards backwards and | |||||
forwards compatibility. In the early years of Python 3, the attitude of | |||||
Python's maintainers was <em>Python 3 is a new, better language: you should | |||||
target it explicitly</em>. There were some tools and methods to ease the | |||||
transition. But nothing super polished, especially in the early years. | |||||
Adoption of Python 3 in the overall community was slow. Python developers | |||||
in the wild justifiably complained that the value proposition of Python 3 | |||||
was too weak to justify porting effort. Not helping was that the early | |||||
advice for targeting Python 3 was to rewrite the source code to become | |||||
Python 3 native. This is in contrast with using the same source to run on both | |||||
Python 2 and 3. For library and application maintainers, this potentially | |||||
meant maintaining separate versions of your code or forcing end-users to | |||||
make a giant leap, which would realistically orphan users on an old version, | |||||
fragmenting your user base. Neither of those was a great alternative, so
you can understand why many projects didn't bite.</p> | |||||
<p>For many projects of non-trivial size, flag day transitions from Python 2 to | |||||
3 were simply not viable: the pathway to Python 3 was to make code dual | |||||
Python 2/3 compatible and gradually switch over the runtime to Python 3. | |||||
But initial versions of Python 3 made this effectively impossible! Let me | |||||
give a few specific examples.</p> | |||||
<p>In Python 2, a string literal <code>''</code> is effectively an array of bytes. In | |||||
Python 3, it is a series of Unicode code points - a fundamentally different | |||||
type! In Python 2, you could write <code>b''</code> to be explicit that a string literal | |||||
was bytes or you could write <code>u''</code> to indicate a Unicode literal, mimicking | |||||
Python 3's behavior. In Python 3, you could write <code>b''</code> to create a <code>bytes</code> | |||||
instance. But for whatever reason, Python 3 initially removed the <code>u''</code> syntax, | |||||
meaning there wasn't an easy way to explicitly denote the type of each
string literal so that it was consistent between Python 2 and 3! Python 3.3 | |||||
(released September 2012) restored <code>u''</code> support, making it more viable to | |||||
write Python source code that worked on both Python 2 and 3. <strong>For nearly 4 | |||||
years, Python 3 took away the consistent syntax for denoting bytes/Unicode | |||||
string literals.</strong></p> | |||||
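<p>The distinction between the literal types is easy to check on a modern Python 3. This snippet assumes Python 3.3 or later; on 3.0 through 3.2 the <code>u''</code> literal on the first line is a syntax error, which is exactly the gap described above.</p>

```python
text = u"caf\u00e9"      # u'' is accepted again and is just a str
data = b"caf\xc3\xa9"    # the same word as UTF-8 encoded bytes

print(type(text).__name__, type(data).__name__)  # str bytes
print(text == "caf\u00e9")            # True: u'' and '' are the same type
print(text == data)                   # False: str never equals bytes
print(data.decode("utf-8") == text)   # True once explicitly decoded
```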
<p>Another example was <code>%</code> formatting of strings. Python 2 allowed use of the
<code>%</code> formatting operator on both its string types. But Python 3 initially | |||||
removed the implementation of <code>%</code> from <code>bytes</code>. Why, I have no clue. It | |||||
is perfectly reasonable to splice byte sequences into a buffer via use of | |||||
a formatting string. But the Python language maintainers insisted otherwise. | |||||
And it wasn't until the community complained about its absence loudly enough | |||||
that this feature was | |||||
<a href="https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-461">restored in Python 3.5</a>, | |||||
which was released in September 2015. Fun fact: the lack of this feature was | |||||
once considered a blocker for Mercurial moving to Python 3 because | |||||
Mercurial uses <code>bytes</code> almost universally, which meant that nearly every use | |||||
of <code>%</code> would have to be changed to something else. And to this day, Python | |||||
3's <code>bytes</code> still doesn't have a <code>format()</code> method, so the alternative was | |||||
effectively string concatenation, which is a massive step backwards from the | |||||
expressiveness of <code>%</code> formatting.</p> | |||||
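<p>Here is a short sketch of what PEP 461 brought back and what is still missing, assuming Python 3.5 or later:</p>

```python
# PEP 461 (Python 3.5+): %-formatting works on bytes again.
line = b"%s:%d: %s" % (b"file.txt", 42, b"merge conflict")
print(line)  # b'file.txt:42: merge conflict'

# bytes still has no format() method.
print(hasattr(bytes, "format"))  # False

# Operands for %s must be bytes-like; a str raises TypeError.
try:
    b"%s" % "text"
except TypeError as exc:
    print("TypeError:", exc)

# Without bytes %, the fallback is piecing buffers together by hand.
print(b"file.txt" + b":" + str(42).encode("ascii"))  # b'file.txt:42'
```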
<p><strong>The initial approach of Python 3 mirrors a folly that many developers | |||||
and projects make: attempting a rewrite instead of performing incremental | |||||
evolution.</strong> For established projects, large scale rewrites often go poorly. | |||||
And Python 3 is no exception. Yes, at the code level, CPython (and likely
other Python implementations) evolved incrementally from Python 2 using
the same code base. But from a language and standard library level, the | |||||
differences in Python 3 were significant enough that I - and even Python's | |||||
core maintainers - considered it a new language, and therefore a rewrite. | |||||
When your random project attempts a rewrite and fails, the blast radius of that is | |||||
often contained to that project. Maybe you don't publish a new release | |||||
as soon as you otherwise would. <strong>But when you are powering an ecosystem, | |||||
the ripple effects from a failed rewrite percolate throughout that ecosystem | |||||
and last for years and have many second order effects. We see this with | |||||
Python 3, where poor choices made in the late 2000s are inflicting significant | |||||
hardship still in 2020.</strong></p> | |||||
<p>From the initial restrained adoption of Python 3, it is obvious that the | |||||
Python ecosystem overwhelmingly rejected the initial boil-the-ocean approach
of Python 3. Python's maintainers eventually got the message and started | |||||
restoring features like <code>u''</code> and <code>bytes</code> <code>%</code> formatting back into the | |||||
language to placate the community. Meanwhile, Python 3 had been accumulating
new features and the cumulative sum of those features was compelling enough | |||||
to win over users.</p> | |||||
<p>For many projects (including Mercurial), Python 3.4/3.5 was the first viable | |||||
porting target for Python 3. Python 3.5 was released in September 2015, almost | |||||
7 years after Python 3.0 was released in December 2008. <strong>Seven. Years.</strong> | |||||
An ecosystem that falters for that long is generally not healthy. What may have | |||||
saved Python from total collapse here is that Python 2 was still going strong and | |||||
people were generally happy with it. I really do think Python dodged a bullet | |||||
here, because there was a massive window where the language could have | |||||
hemorrhaged a critical amount of its user base and been relegated to an | |||||
afterthought. One could draw an analogy to Perl, which lost out to PHP, | |||||
Python, and Ruby, and whose fall from grace aligned with a lengthy | |||||
transition from Perl 5 to 6.</p> | |||||
<p>If you look back at the early history of Python 3, <strong>I think you are forced | |||||
to conclude that Python effectively kneecapped itself for 5-7 years | |||||
through questionable implementation choices that prevented users from | |||||
performing incremental transitions between the major language versions. 2008
to 2013-2015 should be known as the <em>lost years of Python</em> because so much | |||||
opportunity and energy was squandered.</strong> Yes, Python is still healthy today | |||||
and Python 3 is (finally) being adopted at scale. But had earlier versions | |||||
of Python 3 been more <em>empathetic</em> towards Python 2 users porting to it, | |||||
Python and Python 3 in 2020 would be even stronger than they are. The community
was artificially hindered for years. And we won't know until 2023-2025 what | |||||
things could have looked like in 2020 had the Python core language team | |||||
spent more time paving a smoother road between the major language versions.</p> | |||||
<p>To be clear, I do think Python 3 is generally a better language than Python 2. | |||||
It has fewer warts, more compelling features, and better performance (except | |||||
for startup time, which is still slower than Python 2). I am ecstatic the | |||||
community is finally rallying around Python 3! For my Python coding, it has | |||||
reached the point where I curse under my breath when I need to support | |||||
Python 2 or even older versions of Python 3, like 3.5 or 3.6: I just wish | |||||
the world would move on and adopt the future already!</p> | |||||
<p>But I would be remiss if I failed to mention some of my gripes with Python | |||||
3 beyond the transition shenanigans.</p> | |||||
<p>Perhaps my least favorite <em>feature</em> of Python 3 is its insistence that the | |||||
world is Unicode. In Python 2, the default string type was backed by | |||||
bytes. In Python 3, the default string type is backed by Unicode code | |||||
points. As part of that transition, large parts of the standard library | |||||
now operate in the Unicode space instead of the domain of bytes. I understand | |||||
why Python does this: they want <em>strings</em> to be Unicode and don't want | |||||
users to have to spend that much energy thinking about when to use | |||||
<code>str</code> versus <code>bytes</code>. This approach is admirable and somewhat defensible | |||||
because it takes a stand on a solution that is arguably <em>good enough</em> for | |||||
most users. However, <strong>the approach of assuming the world is Unicode is | |||||
flat out wrong and has significant implications for systems level | |||||
applications</strong> (like version control tools).</p> | |||||
<p>There are myriad places in Python's standard library where Python
insists on using the Unicode-backed <code>str</code> type and rejects <code>bytes</code>. For | |||||
example, various networking modules refuse to accept <code>bytes</code> for hostnames | |||||
or URLs. HTTP libraries won't accept <code>bytes</code> for HTTP header names or values. | |||||
Functions that are proxies to POSIX-defined functions won't accept <code>bytes</code>
even though the POSIX functions they call into use <code>char *</code> and aren't
Unicode aware. Then there's filename handling, where Python assumes the
existence of a global encoding for filenames and uses this encoding to convert | |||||
between <code>str</code> and <code>bytes</code>. And it does this despite POSIX filesystem paths | |||||
being a bag of bytes where the only rules are that <code>\0</code> terminates the | |||||
filename and <code>/</code> is special.</p> | |||||
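<p>Python 3's answer for POSIX filenames that don't decode cleanly is the <code>surrogateescape</code> error handler from PEP 383, which smuggles undecodable bytes through <code>str</code> as lone surrogate code points. A small demonstration with an invented filename, written so that it does not depend on the platform's configured filesystem encoding:</p>

```python
# A legal POSIX filename that is not valid UTF-8 (0xFF never is).
raw = b"caf\xff.txt"

# Undecodable bytes become lone surrogates in the U+DC80..U+DCFF range.
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udcff.txt'

# Encoding with the same handler round-trips to the original bytes...
print(name.encode("utf-8", "surrogateescape") == raw)  # True

# ...but the str cannot be encoded strictly, e.g. to print or send it.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc.reason)  # surrogates not allowed
```

The round trip works, but every consumer of the resulting <code>str</code> has to know about the escaping, which is one reason a bytes-centric tool would rather call bytes APIs such as <code>os.listdir(b'.')</code>, which return bytes names directly.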
<p>In cases like Python refusing to accept <code>bytes</code> for things like HTTP | |||||
header names (which will just be spit out over the wire as bytes), Python's | |||||
pendulum has swung too far towards Unicode only. In my opinion, Python needs | |||||
to be more accommodating and allow <code>bytes</code> when it makes sense. I hope the | |||||
pendulum knocks some sense into people when it swings back towards a more | |||||
reasonable solution that better acknowledges the realities of the world we | |||||
live in.</p> | |||||
<p>For areas like filename handling, the world is more complicated. Python | |||||
is effectively an abstraction layer over the operating system APIs exposing | |||||
this functionality. And there is often an impedance mismatch between operating | |||||
systems. For example, POSIX (Linux) tends to use <code>char *</code> for everything | |||||
and doesn't care about encoding, and Windows tends to use 16-bit character
types where the encoding is... a can of worms.</p> | |||||
<p><strong>The reality here is that it is impossible to abstract over differences | |||||
between operating system behavior without compromises that can result in data | |||||
loss, outright wrong behavior, or loss of functionality. But Python 3 attempts | |||||
to do it anyway, making Python 3 unsuitable (or at least highly undesirable) for | |||||
certain systems level applications that rely on it</strong> (like a version control | |||||
tool).</p> | |||||
<p>In fairness to Python, it isn't the only programming language that gets | |||||
this wrong. The only language I've seen <em>properly</em> implement higher-order | |||||
abstractions on top of operating system facilities is Rust, whose approach can | |||||
be generalized as <em>use Python 3's solution of normalizing to Unicode/UTF-8 by | |||||
default</em>, but expose <em>escape hatches</em> which allow access to the raw underlying | |||||
types and APIs used by the operating system for the advanced consumers who | |||||
require it. For example, Rust's <code>Path</code> type which represents a filesystem path | |||||
<a href="https://doc.rust-lang.org/std/path/struct.Path.html#method.as_os_str">allows access</a> | |||||
to the raw <a href="https://doc.rust-lang.org/std/ffi/struct.OsStr.html">OsStr</a> value | |||||
used by the operating system, not a normalization of it to bytes or Unicode, | |||||
which may be lossy. This allows consumers to e.g. create and retrieve | |||||
OS-native filesystem paths without data loss. This functionality is critical | |||||
in some domains. Python 3's awareness/insistence that the world is | |||||
Unicode (which it isn't universally) reduces Python's applicability in these | |||||
domains.</p> | |||||
<p>Speaking of Rust, at the Mercurial developer meetup in October 2019, we were
discussing the use of Rust in Mercurial, and one of the core maintainers blurted
out something along the lines of <em>if Rust were at its current state 5 years ago,
Mercurial would have likely ported from Python 2 to Rust instead of Python 3</em>.
As crazy as it initially sounded, I think I agree with that assessment. With the
benefit of hindsight, having been a key player in the Python 3 porting effort,
having seen all the complications and headaches Python 3 is introducing, and
having learned Rust and witnessed its benefits for performance, control,
and correctness firsthand, I believe porting to Rust would likely have been the correct
move for the project at that point in time. 2020 is not 2014, however, and I'm
not sure I would opt for a rewrite in Rust today. (Most rewrites are follies,
after all.) But I know one thing: I certainly wouldn't implement a new version
control tool in Python 3, and I would probably choose Rust as an implementation
language for most new projects in the systems-level space or with an expected
shelf life of 10+ years. (I really should blog about how awesome Rust is.)</p>
<p>Back to the topic of Python itself, <strong>I'm really soured on Python at this
point in time. The effort required to port to Python 3 was staggering. For
Mercurial, Python 3 introduces a ton of problems and doesn't really solve
many. We effectively slogged through mud for several years only to wind
up in a state that feels strictly worse than where we started. I'm sure it will
be strictly better in a few years. But at that point, we're talking about a
5+ year transition. To call the Python 3 transition disruptive and
distracting for the project would be an understatement. As a project maintainer,
it's natural to ask what we could have accomplished if we weren't forced
to carry out this sideshow.</strong></p>
<p>I can't shake the feeling that a lot of the pain inflicted by the Python 3
transition could have been avoided had Python's language leadership made
a different set of decisions and more highly prioritized the transition
experience. (For example, not removing features like <code>u''</code> and <code>bytes %</code>
from the initial Python 3 releases, and not introducing gratuitous backwards compatibility
breaks, such as with <code>items()/iteritems()</code>. I would have also liked to see a feature like
<code>from __future__</code>, maybe <code>from __past__</code>, that would make it easier for
Python 3 code to target semantics in earlier versions in order to provide
a more turnkey on-ramp onto new versions.) I simultaneously see Python 3
losing its position as a justifiable tool in some domains (like systems-level
tooling) due to ongoing design decisions and poor implementation (like
startup overhead problems). (In contrast, I see Rust excelling where Python
is faltering. I find Rust code surprisingly expressive to write and maintain
given how low-level it is, and I therefore feel that Rust is a compelling
alternative to Python in a surprisingly large number of domains.)</p>
<p>Look, I know it is easy for me to armchair quarterback and critique with the
benefit of hindsight/ignorance. I'm sure there is a lot of nuance here. I'm
sure there was disagreement within the Python community over a lot of these
issues. Maintaining a large and successful programming language and community
like Python's is hard, and you aren't going to please all the people all the
time. And speaking as a maintainer, I have mad respect for the people leading
such a large community. But niceties aside, everyone knows the Python 3
transition was rough and could have gone better. It should not have taken 11
years to get to where we are today.</p>
<p><strong>I'd like to encourage the Python Project to conduct a thorough postmortem on
the transition to Python 3.</strong> Identify what went well, what could have gone
better, and what should be done next time such a large language change is wanted.
Speaking as a Python user, a maintainer of a Python project, and someone in
industry who is now skeptical about using Python at work due to the risk of
potentially company-crippling, high-effort migrations in the future, I believe a postmortem
would help restore my confidence that Python's maintainers learned from the
various missteps on the road to Python 3 and that these potentially ecosystem-crippling
mistakes won't be made again.</p>
<p>Python has had a wildly successful past few decades. And it can continue to
thrive for several more. But the Python 3 migration was painful for all
involved. And as much as we need to move on and leave Python 2 behind us,
there are some important lessons to be learned. I hope the Python community
takes the opportunity to reflect, and I am confident it will grow stronger by
taking the time to do so.</p>
<li><a href="/david/cache/2020/17aa5580eb34f39f214e4a72458c535e/" title="Accès à l'article caché">Thinking about the past, present, and future of web development</a> (<a href="https://www.baldurbjarnason.com/past-present-future-web/" title="Accès à l'article original">original</a>)</li>
<li><a href="/david/cache/2020/67c8c54b07137bcfc0069fccd8261b53/" title="Accès à l'article caché">Mercurial's Journey to and Reflections on Python 3</a> (<a href="https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/" title="Accès à l'article original">original</a>)</li>
<li><a href="/david/cache/2020/82e58e715a4ddb17b2f9e2a023005b1a/" title="Accès à l'article caché">Wordsmiths | Getting Real</a> (<a href="https://basecamp.com/gettingreal/08.6-wordsmiths" title="Accès à l'article original">original</a>)</li>
<li><a href="/david/cache/2020/c1c53ee2ef8544ad798629bf8a3b7249/" title="Accès à l'article caché">Thinking about Climate on a Dark, Dismal Morning</a> (<a href="https://blogs.scientificamerican.com/hot-planet/thinking-about-climate-on-a-dark-dismal-morning/" title="Accès à l'article original">original</a>)</li>