|
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950 |
- title: Making mistakes
- url: https://www.b-list.org/weblog/2018/mar/20/mistakes/
- hash_url: 7306b2b46708f143bb25dc81d79879fe
-
- <p>A couple weeks ago when I was writing what became <a href="/projects/pwned-passwords-django/">pwned-passwords-django</a>, I tweeted about <a href="https://twitter.com/ubernostrum/status/970788650144301056">a weird issue I was seeing when running the tests for part of it</a>. As it turned out, I’d <a href="https://twitter.com/ubernostrum/status/970867689047511040">overlooked something important</a>, and the fix ended up being a one-line change. But that kicked off a little side discussion about the importance of being open about these kinds of “trivial” mistakes; it’s easy for newer or less-confident programmers to do something like that and start feeling stupid, unqualified and all sorts of other bad things. But experienced programmers — even, in this case, someone who’s allegedly an expert on the piece of technology in question — make those kinds of mistakes, too, and it’s good to make sure people are aware of this and that messing up, even in ways that feel dumb, is a thing that happens to everyone.</p>
- <p><a href="https://twitter.com/ubernostrum/status/944341636640784387">Especially to me</a>.</p>
- <p>I mentioned in a reply in that Twitter thread that I used to keep a list of my own commits fixing “embarrassing” bugs or problems I’d created. Unfortunately I don’t have it around any more, and as I recall a lot of them were just typos that I had pushed too quickly into one of my own repos (if they were still only in my local copy of the code I’d usually just rebase them away, for sake of cleaner commit history). I did manage to dig up a few with a trawl through my commit history, though. First, to drive home the “everybody makes mistakes” point, here are a few quick hits:</p>
- <ul>
- <li><a href="https://github.com/ubernostrum/django-flashpolicies/commit/9686769e9d9f266ed2d6e7effbdcb620b8198e7e"><span class="dquo">“</span>That’s an embarrassing mistake”</a>. In <a href="/projects/django-flashpolicies/">django-flashpolicies</a>, a couple of templates used to generate error messages. I split them across multiple lines because I like 80-character line length (I often have multiple files open side-by-side, so conserving screen space is important), but forgot to actually do the thing that tells Python a string is multiline. Remember: string literals are permitted in lots of places by Python’s syntax, which is how docstrings work. The code in this commit later got changed to wrap the messages in parentheses, which is another option for expressing a one-line string across multiple lines of source code.</li>
- <li><a href="https://github.com/ubernostrum/b_list/commit/b55f8c4f60a57b4237ba35afc12e9da7d987f1bd"><span class="dquo">“</span>Should probably not break Django’s <abbr title="Cross-Site Request Forgery"><span class="caps">CSRF</span></abbr> system”</a>. That one happened last month, in the code that powers this blog. I was turning on the <code>Referrer-Policy</code> header, and forgot that Django’s <abbr title="Cross-Site Request Forgery"><span class="caps">CSRF</span></abbr> tools require the referer when you’re doing <span class="caps">HTTPS</span>. There is a reason for that, which I might go over in more detail in another post, since a lot of security-vulnerability reports to Django consist of misunderstanding how and why the <abbr title="Cross-Site Request Forgery"><span class="caps">CSRF</span></abbr> system works.</li>
- <li><a href="https://github.com/ubernostrum/b_list/commit/c1e41a71336e8dd80787661512375fea320c0927"><span class="dquo">“</span>I know what I’m doing.”</a> Another one from this site’s code. Here I just plain forgot how the syntax of Django template filters works. Yes, I sometimes forget even very basic things. Once, I forgot how to iterate the contents of a file in Python. In an interview.</li>
- <li><a href="https://github.com/ubernostrum/django-contact-form/commit/32b3b2dcfd6c882c4c12a450206d6eeeafe0e02b"><span class="dquo">“</span>is_valid(), not is_valid”</a>. Another “forgot how Django works” bug, this time in <a href="/projects/django-contact-form/">django-contact-form</a>. You check whether a form instance passed validation by calling the <code>is_valid()</code> method. I left off the parentheses, which doesn’t work.</li>
- <li><a href="https://github.com/ubernostrum/django-registration/commit/25b316ae1cc00d446aa3eb9c26f515a5d36cfb00"><span class="dquo">“</span>This will teach me to copy/paste as a starting point.”</a> This time, <a href="/projects/django-registration/">django-registration</a>. This is when I was writing the documentation for the multiple included registration workflows, and began the documentation for one by copy/pasting the documentation for another. Except I forgot to change the name and module in a couple places. I mention this one because I treat documentation issues as bugs, the same as if there’s an issue in code, and will push new releases just to get better/corrected documentation out there.</li>
- </ul>
- <p>These five are nice quick reminders that everybody makes these kinds of mistakes. But I also want to take some time to talk about a few others which meet the same criteria — they were simple, usually one-line problems with easy fixes — that also offer some useful lessons. I’m going to go over three of them in more detail, explaining not just what was wrong, but also how it came to be wrong and what kinds of changes to my own development practices could help avoid them in the future, since while everybody screws up sometimes on small scales, what really matters is how you respond after making one; if there’s something you <em>could</em> do to avoid making the same mistake twice, the appropriate response is to figure out what that is, and do it.</p>
- <h2>The non-functioning password checker</h2>
- <p>While I was building <a href="/projects/pwned-passwords-django/">pwned-passwords-django</a>, that initial problem I tweeted about was a strange test failure: <a href="http://pwned-passwords-django.readthedocs.io/en/1.1/middleware.html">the middleware that checks submitted passwords against Pwned Passwords</a> was working… but only on one version of Django.</p>
- <p>The test suite for the middleware was passing on Django 1.11, but <em>every</em> test for the middleware was failing on Django 2.0. That was a bit of a head-scratcher (and not helped by the fact that I was coming down with a cold at the time).</p>
- <p>If something works on one version of Django but not another, the first suspicion should be that something changed in Django. But the middleware just checks for the presence of certain keys (or rather, keys matching a certain regex) in the <code>POST</code> payload, and if it finds them runs their values through Pwned Passwords. The test failure told me that the appropriate keys weren’t being found and thus the Pwned Passwords <abbr title="Application Programming Interface"><span class="caps">API</span></abbr> wasn’t being hit, despite the tests issuing requests which should have caused the middleware to go into action (and which did cause it to do so on Django 1.11). I spent some time racking my brain, and scouring the Django release notes, for anything that could have changed in the <code>HttpRequest</code> object, or in <code>QueryDict</code> (the data structure Django uses to represent request payloads), between 1.11 and 2.0 to break the way I was iterating and examining the keys in <code>POST</code>, and came up empty.</p>
- <p>And then the light bulb went off: what if the problem <em>wasn’t</em> that the middleware was failing to find the right keys in the <code>POST</code> data? What if the problem was that the middleware wasn’t running at all? And that turned out to be the answer. From the very earliest days of Django, middleware was configured through the setting <code>MIDDLEWARE_CLASSES</code>. But back in Django 1.10, the middleware system got an overhaul. The deprecation/migration plan for moving from old to new middleware systems involved changing the name of the setting to just <code>MIDDLEWARE</code>. Django continued supporting both options — with the assumption that if you were using <code>MIDDLEWARE</code> you’d made the other necessary changes to keep your middleware working — but deprecated the use of <code>MIDDLEWARE_CLASSES</code>. And in Django 2.0, <code>MIDDLEWARE_CLASSES</code> stopped being supported.</p>
- <p>When I build Django applications for distribution and reuse, I package them with a standalone test-runner file that configures the minimal settings for the application, and invokes Django’s test framework. Then I hook it into <code>setup.py test</code> to provide a nice standard entry point for running the tests. Whenever I start work on a new application, I copy and paste the test runner script, tweak what needs tweaking, and then use it. Except I wrote that script back when <code>MIDDLEWARE_CLASSES</code> was still supported, and as of Django 2.0 it isn’t anymore. So the tests for pwned-passwords-django were setting up <code>MIDDLEWARE_CLASSES</code>, which worked on Django 1.11 but was ignored on 2.0, and as a result Django 2.0 didn’t see the middleware and didn’t invoke it. Mystery solved: I changed <code>MIDDLEWARE_CLASSES</code> to <code>MIDDLEWARE</code> in the test runner, and my tests passed on both 1.11 and 2.0.</p>
- <p>There are a few potential lessons here. The big one is that I probably should do a better job of maintaining my test-runner script. Checking at each new Django release to see if it needs any changes would be a good idea. <a href="https://docs.djangoproject.com/en/2.0/topics/checks/">Django’s system-check framework</a> did have a compatibility check for middleware in Django 1.11, but it only warns if you’ve used <em>both</em> setting names, rather than if you only used the deprecated one. Ordinarily another lesson would be to pay more attention to the system check — which does execute as part of the tests — but in this case it wasn’t catching the problem (a bit of quick research seems to indicate there’s a philosophical debate about whether and how the system check framework should handle these things).</p>
- <p>Another lesson is to pay attention to deprecation warnings; although the system check framework didn’t catch this, Django 1.11 <em>did</em> emit a <code>DeprecationWarning</code> any time it processed a request and found you weren’t using the <code>MIDDLEWARE</code> setting, but by default Python suppresses <code>DeprecationWarning</code>. I’ll be tweaking my test setup in the near future to <a href="https://docs.python.org/3.6/using/cmdline.html#cmdoption-w">toggle the warning filter</a> and make sure <code>DeprecationWarning</code> is visible in test output.</p>
- <h2>The non-functioning password checker, part two</h2>
- <p>Speaking of pwned-passwords-django, there’s a reason why it was already at version 1.1 when I publicly announced it, and the reason is that 1.0 had a serious bug.</p>
- <p>The 1.0 release was a bit rushed; I wanted to get a package together, with an <abbr title="Application Programming Interface"><span class="caps">API</span></abbr> I liked, as soon as possible, in order to claim the name on the Python Package Index (since there are quite a few packages doing things with Pwned Passwords). So 1.0 was an <abbr title="Application Programming Interface"><span class="caps">API</span></abbr>-stable release with passing tests… but it didn’t actually <em>work</em>.</p>
- <p>Testing packages that interact with online services can be a bit tricky. I use <a href="https://tox.readthedocs.io">tox</a> on my own laptop to run the tests for my applications, and also have <a href="https://travis-ci.org/">Travis <abbr title="Continuous Integration"><span class="caps">CI</span></abbr></a> configured to run them automatically on every commit I push to GitHub. Which is great up until I write something that has to interact with an external service over the internet. I maintain a couple packages that have this problem, and for pwned-passwords-django I decided <em>not</em> to have the tests use the real Pwned Passwords service. Instead, its test suite uses <a href="https://docs.python.org/3/library/unittest.mock.html">Python’s mock library</a> to simulate the Pwned Passwords <abbr title="Application Programming Interface"><span class="caps">API</span></abbr>, and just asserts that the code is issuing the right kinds of requests and doing the right kinds of things with the responses.</p>
- <p>You might be wondering at this point whether I ever did any <em>manual</em> testing against the live Pwned Passwords service, and the answer is: not until after I’d released the 1.0 package. And that was when I discovered that the core piece of code in the app — the function that actually makes a request to Pwned Passwords, and parses out the response — didn’t work.</p>
- <p>The bug was kind of a silly one. The Pwned Passwords <abbr title="Application Programming Interface"><span class="caps">API</span></abbr> involves passing around fragments of <span class="caps">SHA</span>-1 hashes, and Python’s <code>hashlib</code> generates hash digests with <em>lowercase</em> hexadecimal digits. But the Pwned Passwords <abbr title="Application Programming Interface"><span class="caps">API</span></abbr> uses <em>uppercase</em> hexadecimal digits. So, for example, when checking a password of <code>'swordfish'</code>, I’d be sending a hash prefix of <code>4f571</code> and looking for a response containing a suffix of <code>81dcaade980555f2ce6755ca425f00658be</code>. But Pwned Passwords would be returning a suffix of <code>81DCAADE980555F2CE6755CA425F00658BE</code>.</p>
- <p>Oops.</p>
- <p>The fix was <a href="https://github.com/ubernostrum/pwned-passwords-django/commit/75e15aafb407bc4c198a406e814a81ffb7ba12ca">one line in the actual code of the app, and a tweak to the mock setup</a> to ensure simulated responses came back with uppercase hexadecimal digits. And so pwned-passwords-django 1.1 came out one hour and 43 minutes after pwned-passwords-django 1.0.</p>
- <p>The lesson here is always do a test against the real service. Even if you don’t make it part of your runs-all-the-time automated test suite, you need to know that your code works when interacting with the real thing.</p>
- <p>I do go back and forth, though, on whether to have interaction with the real service in the test suite. In <a href="/projects/akismet/">the wrapper I maintain for the WordPress Akismet spam-detection service</a>, the test suite basically repeats itself: it runs one set of tests using <code>unittest.mock</code> to simulate the Akismet service, and another set of tests against the actual Akismet service. But Akismet provides direct <abbr title="Application Programming Interface"><span class="caps">API</span></abbr> support for testing, including parameters to tell the Akismet service you’re running tests, and specific sample inputs that are guaranteed to produce certain types of responses. As far as I know, Pwned Passwords doesn’t do that, so for now I’ll probably stick to using mocks.</p>
- <h2>The accidental type error</h2>
- <p>This one happened a few years back in <a href="/projects/django-flashpolicies/">django-flashpolicies</a>, and I wrote about it at the time, but it’s a good story so I’ll tell it again. For background: Flash (the browser plugin) has a same-origin rule for making requests, much like JavaScript, and you can override that by publishing an <abbr title="eXtensible Markup Language"><span class="caps">XML</span></abbr> file that describes a policy for whether and how you’d like to allow Flash to make cross-domain requests to your site. So django-flashpolicies is a Django application that knows how to generate the correct <abbr title="eXtensible Markup Language"><span class="caps">XML</span></abbr>, and provides a set of Django views for common types of policies as well as a Python class implementing the full policy specification and a view that can serialize it out to <abbr title="eXtensible Markup Language"><span class="caps">XML</span></abbr> and serve it.</p>
- <p>One of the things you can do in a Flash cross-domain policy is set a “meta-policy”, which controls whether or not a site can serve multiple policy files specifying additional or alternative sets of rules. While the official name for this is “meta-policy”, the element in the generated <abbr title="eXtensible Markup Language"><span class="caps">XML</span></abbr> is called <code>site-control</code>.</p>
- <p>In django-flashpolicies, <a href="http://django-flashpolicies.readthedocs.io/en/1.10/policies.html">the Policy class</a> has a method for setting meta-policy information, and in an older version, since the element in the <abbr title="eXtensible Markup Language"><span class="caps">XML</span></abbr> was named <code>site-control</code>, the method for setting it was called <code>site_control()</code>. Back in 2009, though, I did some refactoring of the <code>Policy</code> class, in order to enable a simpler method of generating the <abbr title="eXtensible Markup Language"><span class="caps">XML</span></abbr>, and renamed the method from <code>site_control()</code> to <code>metapolicy()</code> so that the class’ public <abbr title="Application Programming Interface"><span class="caps">API</span></abbr> would match the terminology of the prose version of Adobe’s policy-file specification.</p>
- <p>And when I did that, I introduced a bug in the tests, but I didn’t realize it at first because the tests continued to pass. In fact, it took me over a month to notice the problem, and when I did it was because I took a look at a test coverage report from <a href="https://coverage.readthedocs.io">coverage.py</a>, which pointed out there was a code path in the <code>Policy</code> class that wasn’t being exercised. That surprised me at first, because I knew there <em>was</em> a test for that. And then I looked at the test, and started to understand.</p>
- <p>The test in question was supposed to verify a case where an invalid option was passed in as the meta-policy value, and in that case the <code>metapolicy()</code> method is supposed to raise an exception. Specifically, it raises <code>TypeError</code>, on the grounds that the argument passed in was of the wrong type — not a meta-policy value. This is probably the wrong exception to use, and if I get around to bumping the major version before Flash gets retired by Adobe, I’ll change it to raise a <code>ValueError</code> instead.</p>
- <p>So the test was calling the method and passing an invalid value, and using <code>assertRaises</code> to make sure a <code>TypeError</code> was raised. But when I changed the method name from <code>site_control()</code> to <code>metapolicy()</code>, I’d missed changing the name in that test; it was still trying to call a method named <code>site_control()</code>. Now, calling a nonexistent method in Python should get an <code>AttributeError</code>, not a <code>TypeError</code>, but I’d compounded the problem by keeping an <em>attribute</em> named <code>site_control</code> to store the value that would end up in the <code>site-control</code> element, and it defaulted to <code>None</code>. So when this test tried to call <code>policy.site_control</code>, it was trying to call whatever value was stored in the <code>site_control</code> attribute. Which happened to be <code>None</code>, and that’s not a callable type in Python. So… a <code>TypeError</code> was raised, which is exactly what the test was expecting to happen (but for the wrong reason), which meant the test passed.</p>
- <p>The coverage report, though, revealed the issue and showed that the appropriate lines in the <code>metapolicy()</code> method were never being called during testing. I <a href="https://github.com/ubernostrum/django-flashpolicies/commit/8144babe3e5815e2903059e1a7616673d80a6a73">fixed the test</a>, and the coverage report came back clean.</p>
- <p>There are a couple of lessons here. One is always let your tools help you when you’re refactoring. I use an editor that’s perfectly capable of finding all references to a name and changing them for me, but when I renamed the <code>site_control()</code> method to <code>metapolicy()</code> <em>I didn’t use it</em>, and instead made the change manually. That was hubris — thinking I didn’t need the helpful automation — and I was appropriately punished for it, with coverage.py playing the role of the Greek chorus pronouncing my doom.</p>
- <p>Another lesson is the value of having multiple metrics available for your code. In this case, a passing test suite was actually a useless metric, because the problem was <em>in</em> the test suite (bugs in tests are kind of a theme here, aren’t they?). But the additional metric of the coverage report revealed the problem immediately. I’m generally not a fan of coverage for coverage’s sake, but I <em>am</em> a fan of coverage reports as a canary; unexpected changes to coverage are a useful sign that something’s gone wrong somewhere.</p>
- <p>Shortly after this bug, I also changed my testing setup to fail if the coverage report comes in under 100%. Again this isn’t about coverage for coverage’s sake; it’s about having a <em>loud</em> signal of something going wrong as opposed to a quiet, easy-to-ignore one. Remember, it took me more than a month to notice this problem — even though the evidence was there in the build output — because the coverage report didn’t affect the build status. I use a coverage metric as part of the <abbr title="Continuous Integration"><span class="caps">CI</span></abbr> build on all of my personal apps and libraries now, but I usually only turn it on once I’ve got the code and tests at a point where I trust them, rather than just rushing to get to 100% coverage as quickly as possible.</p>
- <h2>Everybody falls down sometimes</h2>
- <p>Finally, since all of the examples above were from my own personal projects (which mostly are single-person efforts, plus the occasional helpful pull request from a stranger), I want to mention something about Django itself.</p>
- <p>For the last few years I’ve given occasional conference talks and tutorials on Django and web application security. The tutorial versions (for those who aren’t familiar with this: tutorials at PyCon and DjangoCon are in-depth three-hour sessions on particular topics; I’ve given the security tutorial once at DjangoCon <span class="caps">US</span> and once at PyCon <span class="caps">US</span>) spend much of their time going over categories of security issues and ways Django and Python can help protect you from them, or at least give you tools to mitigate them. The rest of the material — and the main focus of the shorter talk-length versions — is a historical tour of how Django’s security policies developed, and themes in the security issues that have been disclosed in Django over the years.</p>
- <p>I like giving that talk because Django apparently has something of a reputation for security, and it’s good to talk about how you can build a good reputation while still making mistakes. When I gave the tutorial at PyCon 2017, I crunched the numbers and found that Django had averaged one security issue or advisory every 66 days over the preceding eleven years. For the record, I think the good reputation mostly stems from our having public security policies; responding quickly to reported issues; and the amount of work that’s gone into Django itself to try to protect you out of the box from some of the most common security issues in web applications, and to fix any mistakes the Django team makes in the framework itself. We fall down, a lot, and what matters most is that we get back up and learn from it.</p>
- <p>Mistakes are opportunities to learn and grow, and <em>that’s</em> the way we should talk about them. Even when I make “silly” mistakes like forgetting how filters work in a Django template, there’s something I can learn from that: have the documentation open while I’m working! Especially if I’m working on something I don’t do often, like writing templates. And these days, I do that: even though I’ve literally written books on how to use Django, I still always have at least a couple tabs open to Python or Django documentation while I’m writing code, because I’m human and I forget things sometimes.</p>
- <p>And that really is the lesson I want to get across today. Your feeling about yourself as a programmer should not depend on never making a mistake. You’re human, as are we all, and we all mess up from time to time. Sometimes we even forget really basic things that make us feel embarrassed afterward; I certainly do that. But feeling embarrassed, or stupid, or unqualified, even though it’s natural, needs to not be the only response. <em>Everybody</em> makes mistakes. What marks the good people is how they react and learn afterward.</p>
|