|
|
- title: Making 1 million requests with python-aiohttp
- url: http://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html
- hash_url: 44abc2fd416f673b9c6ae0f5147726a9
-
- <p>In this post I’d like to test the limits of <a href="http://aiohttp.readthedocs.org/en/stable/">python aiohttp</a> and check its performance in
- terms of requests per minute. Everyone knows that asynchronous code performs
- better when applied to network operations, but it’s still interesting to check this
- assumption and understand exactly how it is better and why it is better. I’m going
- to check it by trying to make 1 million requests with the aiohttp client. How many requests per minute will aiohttp make?
- What kind of exceptions and crashes can you expect when you try to make such a volume
- of requests with very primitive scripts? What are the main gotchas that you need
- to think about when trying to make such a volume of requests?</p>
-
- <h2 id="hello-asyncioaiohttp">Hello asyncio/aiohttp</h2>
-
- <p>Async programming is not easy. It’s not easy because using callbacks and thinking in terms of events
- and event handlers requires more effort than usual synchronous programming. But
- it is also difficult because asyncio is still relatively new and there are few
- blog posts and tutorials about it. The <a href="https://docs.python.org/3/library/asyncio.html">official docs</a>
- are very terse and contain only basic examples. There are some Stack Overflow questions,
- but not <a href="http://stackoverflow.com/questions/tagged/python-asyncio?sort=votes&pageSize=50">that many</a>:
- only 410 as of the time of writing (compare with <a href="http://stackoverflow.com/questions/tagged/twisted">2 585 questions tagged “twisted”</a>).
- There are a couple of nice blog posts and articles about asyncio
- out there, such as <a href="http://aosabook.org/en/500L/a-web-crawler-with-asyncio-coroutines.html">this</a>,
- <a href="http://www.snarky.ca/how-the-heck-does-async-await-work-in-python-3-5">that</a>, <a href="http://sahandsaba.com/understanding-asyncio-node-js-python-3-4.html">that</a> or perhaps even <a href="https://community.nitrous.io/tutorials/asynchronous-programming-with-python-3">this</a>
- or <a href="https://compiletoi.net/fast-scraping-in-python-with-asyncio/">this</a>.</p>
-
- <p>To make it easier, let’s start with the basics: a simple HTTP hello world,
- just making a GET request and fetching one single HTTP response.</p>
-
- <p>In the synchronous world you just do:</p>
-
- <pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">requests</span>
-
- <span class="k">def</span> <span class="nf">hello</span><span class="p">():</span>
-     <span class="k">return</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"http://httpbin.org/get"</span><span class="p">)</span>
-
- <span class="k">print</span><span class="p">(</span><span class="n">hello</span><span class="p">())</span></code></pre>
-
- <p>How does that look in aiohttp?</p>
-
- <pre><code class="language-python" data-lang="python"><span class="c">#!/usr/local/bin/python3.5</span>
- <span class="kn">import</span> <span class="nn">asyncio</span>
- <span class="kn">from</span> <span class="nn">aiohttp</span> <span class="kn">import</span> <span class="n">ClientSession</span>
-
- <span class="n">async</span> <span class="k">def</span> <span class="nf">hello</span><span class="p">():</span>
-     <span class="n">async</span> <span class="k">with</span> <span class="n">ClientSession</span><span class="p">()</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
-         <span class="n">async</span> <span class="k">with</span> <span class="n">session</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"http://httpbin.org/headers"</span><span class="p">)</span> <span class="k">as</span> <span class="n">response</span><span class="p">:</span>
-             <span class="n">response</span> <span class="o">=</span> <span class="n">await</span> <span class="n">response</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
-             <span class="k">print</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
-
- <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
-
- <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">hello</span><span class="p">())</span></code></pre>
-
- <p>Hmm, looks like I had to write lots of code for such a basic task… There is “async def”, two “async with” blocks and an “await” here. It
- seems really confusing at first sight, so let’s try to explain it.</p>
-
- <p>You make your function asynchronous by putting the <a href="https://www.python.org/dev/peps/pep-0492/#await-expression">async keyword</a> before the function definition and using the await
- keyword inside it. There are actually two asynchronous operations that our hello() function performs. First
- it fetches the response asynchronously, then it reads the response body in an asynchronous manner.</p>
-
- <p>aiohttp recommends using ClientSession as the primary interface for making requests. ClientSession
- allows you to store cookies between requests and keeps objects that are common to
- all requests (event loop, connection and other things). A session needs to be closed after you finish using it,
- and closing a session is another asynchronous operation. This is why you need <a href="https://www.python.org/dev/peps/pep-0492/#asynchronous-context-managers-and-async-with"><code class="highlighter-rouge">async with</code></a>
- every time you deal with sessions.</p>
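-
- <p>To see why closing needs <code class="highlighter-rouge">async with</code>, here is a minimal sketch that uses only the standard library. <code class="highlighter-rouge">FakeSession</code> is a hypothetical stand-in that I made up for illustration, not aiohttp’s real implementation. It only shows that both entering and leaving the block are allowed to await something.</p>
-
```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for a client session (not aiohttp's real class)."""

    def __init__(self):
        self.closed = False

    async def __aenter__(self):
        # Entering the block may await, e.g. to set up connections.
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Closing is asynchronous too, so a plain "with" would not be enough.
        await asyncio.sleep(0)
        self.closed = True


async def use_session():
    async with FakeSession() as session:
        inside = session.closed  # still open inside the block
    # After the block, __aexit__ has run and the session is closed.
    return inside, session.closed


if __name__ == "__main__":
    print(asyncio.run(use_session()))
```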
-
- <p>After you open a client session you can use it to make requests. This is where another asynchronous
- operation starts: making the request. Just as in the case of client sessions, responses must be closed
- explicitly, and the <code class="highlighter-rouge">async with</code> context manager ensures the response will be closed properly in all
- circumstances.</p>
-
- <p>To start your program you need to run it in an event loop, so you need to create an instance of the asyncio
- loop and put the task into this loop.</p>
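-
- <p>On Python 3.7 and later, <code class="highlighter-rouge">asyncio.run()</code> can do the loop handling for you. Below is a small sketch of that alternative. <code class="highlighter-rouge">hello_stub</code> is a hypothetical coroutine that sleeps instead of making a real request, so it runs without any server.</p>
-
```python
import asyncio


async def hello_stub():
    # Hypothetical placeholder for the real request: just yields to the loop.
    await asyncio.sleep(0)
    return "done"


def main():
    # asyncio.run() creates a fresh event loop, runs the coroutine to
    # completion, and closes the loop afterwards (Python 3.7+).
    return asyncio.run(hello_stub())


if __name__ == "__main__":
    print(main())
```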
-
- <p>It all does sound a bit difficult, but it’s not that complex and looks logical once you spend
- some time trying to understand it.</p>
-
- <h2 id="fetch-multiple-urls">Fetch multiple urls</h2>
-
- <p>Now let’s try to do something more interesting: fetching multiple urls one after another.
- With synchronous code you would just do:</p>
-
- <pre><code class="language-python" data-lang="python"><span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">urls</span><span class="p">:</span>
-     <span class="k">print</span><span class="p">(</span><span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span><span class="o">.</span><span class="n">text</span><span class="p">)</span></code></pre>
-
- <p>This is really quick and easy. Async will not be that easy, so you should always consider whether something more complex
- is actually necessary for your needs. If your app works nicely with synchronous code, maybe there
- is no need to bother with async code? If you do need to bother with async code, here’s how you do
- that. Our <code class="highlighter-rouge">hello()</code> async function stays almost the same (it now takes the url as an argument), but we need to wrap it in an asyncio <a href="https://docs.python.org/3/library/asyncio-task.html#future"><code class="highlighter-rouge">Future</code></a> object
- and pass a whole list of Future objects as tasks to be executed in the loop.</p>
-
- <pre><code class="language-python" data-lang="python"><span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
-
- <span class="n">tasks</span> <span class="o">=</span> <span class="p">[]</span>
- <span class="c"># I'm using test server localhost, but you can use any url</span>
- <span class="n">url</span> <span class="o">=</span> <span class="s">"http://localhost:8080/{}"</span>
- <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
-     <span class="n">task</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">ensure_future</span><span class="p">(</span><span class="n">hello</span><span class="p">(</span><span class="n">url</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">)))</span>
-     <span class="n">tasks</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">task</span><span class="p">)</span>
- <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">asyncio</span><span class="o">.</span><span class="n">wait</span><span class="p">(</span><span class="n">tasks</span><span class="p">))</span></code></pre>
-
- <p>Now let’s say we want to collect all responses in one list and do some
- postprocessing on them. At the moment we’re not keeping the response body
- anywhere, we just print it. Let’s return this response, keep it in a list, and
- print all responses at the end.</p>
-
- <p>To collect a bunch of responses you probably need to write something along the lines of:</p>
-
- <pre><code class="language-python" data-lang="python"><span class="c">#!/usr/local/bin/python3.5</span>
- <span class="kn">import</span> <span class="nn">asyncio</span>
- <span class="kn">from</span> <span class="nn">aiohttp</span> <span class="kn">import</span> <span class="n">ClientSession</span>
-
- <span class="n">async</span> <span class="k">def</span> <span class="nf">fetch</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
-     <span class="n">async</span> <span class="k">with</span> <span class="n">ClientSession</span><span class="p">()</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
-         <span class="n">async</span> <span class="k">with</span> <span class="n">session</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="k">as</span> <span class="n">response</span><span class="p">:</span>
-             <span class="k">return</span> <span class="n">await</span> <span class="n">response</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
-
- <span class="n">async</span> <span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="n">r</span><span class="p">):</span>
-     <span class="n">url</span> <span class="o">=</span> <span class="s">"http://localhost:8080/{}"</span>
-     <span class="n">tasks</span> <span class="o">=</span> <span class="p">[]</span>
-     <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">r</span><span class="p">):</span>
-         <span class="n">task</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">ensure_future</span><span class="p">(</span><span class="n">fetch</span><span class="p">(</span><span class="n">url</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">)))</span>
-         <span class="n">tasks</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">task</span><span class="p">)</span>
-
-     <span class="n">responses</span> <span class="o">=</span> <span class="n">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">gather</span><span class="p">(</span><span class="o">*</span><span class="n">tasks</span><span class="p">)</span>
-     <span class="c"># you now have all response bodies in this variable</span>
-     <span class="k">print</span><span class="p">(</span><span class="n">responses</span><span class="p">)</span>
-
- <span class="k">def</span> <span class="nf">print_responses</span><span class="p">(</span><span class="n">result</span><span class="p">):</span>
-     <span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
-
- <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
- <span class="n">future</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">ensure_future</span><span class="p">(</span><span class="n">run</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
- <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">future</span><span class="p">)</span></code></pre>
-
- <p>Notice the usage of <a href="https://docs.python.org/3/library/asyncio-task.html#asyncio.gather"><code class="highlighter-rouge">asyncio.gather()</code></a>: it collects a bunch of Future objects in one place
- and waits for all of them to finish.</p>
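-
- <p>One detail that is easy to miss: <code class="highlighter-rouge">gather()</code> returns results in the order the tasks were passed in, not in the order they finished. A small sketch of that, with a hypothetical <code class="highlighter-rouge">fake_fetch()</code> that sleeps instead of hitting a server (the delays are made up for illustration):</p>
-
```python
import asyncio


async def fake_fetch(i, delay):
    # Hypothetical stand-in for fetch(): sleeps instead of doing network I/O.
    await asyncio.sleep(delay)
    return i


async def run_all():
    # The first task sleeps longest, so it finishes last...
    delays = [0.03, 0.02, 0.01]
    tasks = [asyncio.ensure_future(fake_fetch(i, d)) for i, d in enumerate(delays)]
    # ...yet gather() returns results in the order the tasks were passed in.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(run_all()))
```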
-
- <h3 id="common-gotchas">Common gotchas</h3>
-
- <p>Now let’s simulate the real process of learning: let’s make a mistake in the above script and try to debug it.
- This should be really helpful for demonstration purposes.</p>
-
- <p>This is what a sample broken async function looks like:</p>
-
- <pre><code class="language-python" data-lang="python"><span class="c"># WARNING! BROKEN CODE DO NOT COPY PASTE</span>
- <span class="n">async</span> <span class="k">def</span> <span class="nf">fetch</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
-     <span class="n">async</span> <span class="k">with</span> <span class="n">ClientSession</span><span class="p">()</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
-         <span class="n">async</span> <span class="k">with</span> <span class="n">session</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="k">as</span> <span class="n">response</span><span class="p">:</span>
-             <span class="k">return</span> <span class="n">response</span><span class="o">.</span><span class="n">read</span><span class="p">()</span></code></pre>
-
- <p>This code is broken, but it’s not that easy to figure out why
- if you don’t know much about asyncio. Even if you know Python well, if you don’t
- know asyncio or aiohttp well you’ll have trouble figuring out what happens.</p>
-
- <p>What is the output of the above function?</p>
-
- <p>It produces the following output:</p>
-
- <pre><code class="language-python" data-lang="python"><span class="n">pawel</span><span class="nd">@pawel</span><span class="o">-</span><span class="n">VPCEH390X</span> <span class="o">~/</span><span class="n">p</span><span class="o">/</span><span class="n">l</span><span class="o">/</span><span class="n">benchmarker</span><span class="o">></span> <span class="o">./</span><span class="n">bench</span><span class="o">.</span><span class="n">py</span>
- <span class="p">[</span><span class="o"><</span><span class="n">generator</span> <span class="nb">object</span> <span class="n">ClientResponse</span><span class="o">.</span><span class="n">read</span> <span class="n">at</span> <span class="mh">0x7fa68d465728</span><span class="o">></span><span class="p">,</span> <span class="o"><</span><span class="n">generator</span> <span class="nb">object</span> <span class="n">ClientResponse</span><span class="o">.</span><span class="n">read</span> <span class="n">at</span> <span class="mh">0x7fa68cdd9468</span><span class="o">></span><span class="p">,</span> <span class="o"><</span><span class="n">generator</span> <span class="nb">object</span> <span class="n">ClientResponse</span><span class="o">.</span><span class="n">read</span> <span class="n">at</span> <span class="mh">0x7fa68d4656d0</span><span class="o">></span><span class="p">,</span> <span class="o"><</span><span class="n">generator</span> <span class="nb">object</span> <span class="n">ClientResponse</span><span class="o">.</span><span class="n">read</span> <span class="n">at</span> <span class="mh">0x7fa68cdd9af0</span><span class="o">></span><span class="p">]</span></code></pre>
-
- <p>What happens here? You expected to get responses after all processing is done, but here you actually get
- a bunch of generators. Why is that?</p>
-
- <p>It happens because, as I’ve mentioned earlier, <code class="highlighter-rouge">response.read()</code> is an async
- operation. This means that it does not return a result immediately, it just returns a generator.
- This generator still needs to be driven to completion,
- and this does not happen by default. <code class="highlighter-rouge">yield from</code> in Python 3.4 and <code class="highlighter-rouge">await</code> in Python 3.5 were
- added exactly for this purpose: to actually iterate over the generator function. The fix for the above error
- is just adding await before <code class="highlighter-rouge">response.read()</code>.</p>
-
- <pre><code class="language-python" data-lang="python"> <span class="c"># async operation must be preceded by await </span>
- <span class="k">return</span> <span class="n">await</span> <span class="n">response</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <span class="c"># NOT: return response.read()</span></code></pre>
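-
- <p>You can reproduce this gotcha without any server. The sketch below uses a hypothetical <code class="highlighter-rouge">read_body()</code> coroutine in place of <code class="highlighter-rouge">response.read()</code>. Note that with native <code class="highlighter-rouge">async def</code> coroutines on current Python, what you get back without await is a coroutine object rather than a generator, but the mistake and the fix are the same.</p>
-
```python
import asyncio
import inspect


async def read_body():
    # Hypothetical stand-in for response.read().
    await asyncio.sleep(0)
    return b"body"


async def broken():
    # Missing await: the caller gets an un-run coroutine object back.
    return read_body()


async def fixed():
    # With await, the coroutine actually runs and we get its result.
    return await read_body()


async def demo():
    pending = await broken()
    is_coro = inspect.iscoroutine(pending)
    pending.close()  # avoid the "coroutine ... was never awaited" warning
    return is_coro, await fixed()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```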
-
- <p>Let’s break our code in some other way.</p>
-
- <pre><code class="language-python" data-lang="python"><span class="c"># WARNING! BROKEN CODE DO NOT COPY PASTE</span>
- <span class="n">async</span> <span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="n">r</span><span class="p">):</span>
-     <span class="n">url</span> <span class="o">=</span> <span class="s">"http://localhost:8080/{}"</span>
-     <span class="n">tasks</span> <span class="o">=</span> <span class="p">[]</span>
-     <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">r</span><span class="p">):</span>
-         <span class="n">task</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">ensure_future</span><span class="p">(</span><span class="n">fetch</span><span class="p">(</span><span class="n">url</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">)))</span>
-         <span class="n">tasks</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">task</span><span class="p">)</span>
-
-     <span class="n">responses</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">gather</span><span class="p">(</span><span class="o">*</span><span class="n">tasks</span><span class="p">)</span>
-     <span class="k">print</span><span class="p">(</span><span class="n">responses</span><span class="p">)</span></code></pre>
-
- <p>Again, the above code is broken, but it’s not easy to figure out why if you’re just
- learning asyncio.</p>
-
- <p>The above produces the following output:</p>
-
- <pre><code class="language-python" data-lang="python"><span class="n">pawel</span><span class="nd">@pawel</span><span class="o">-</span><span class="n">VPCEH390X</span> <span class="o">~/</span><span class="n">p</span><span class="o">/</span><span class="n">l</span><span class="o">/</span><span class="n">benchmarker</span><span class="o">></span> <span class="o">./</span><span class="n">bench</span><span class="o">.</span><span class="n">py</span>
- <span class="o"><</span><span class="n">_GatheringFuture</span> <span class="n">pending</span><span class="o">></span>
- <span class="n">Task</span> <span class="n">was</span> <span class="n">destroyed</span> <span class="n">but</span> <span class="n">it</span> <span class="ow">is</span> <span class="n">pending</span><span class="err">!</span>
- <span class="n">task</span><span class="p">:</span> <span class="o"><</span><span class="n">Task</span> <span class="n">pending</span> <span class="n">coro</span><span class="o">=<</span><span class="n">fetch</span><span class="p">()</span> <span class="n">running</span> <span class="n">at</span> <span class="o">./</span><span class="n">bench</span><span class="o">.</span><span class="n">py</span><span class="p">:</span><span class="mi">7</span><span class="o">></span> <span class="n">wait_for</span><span class="o">=<</span><span class="n">Future</span> <span class="n">pending</span> <span class="n">cb</span><span class="o">=</span><span class="p">[</span><span class="n">Task</span><span class="o">.</span><span class="n">_wakeup</span><span class="p">()]</span><span class="o">></span> <span class="n">cb</span><span class="o">=</span><span class="p">[</span><span class="n">gather</span><span class="o">.<</span><span class="nb">locals</span><span class="o">>.</span><span class="n">_done_callback</span><span class="p">(</span><span class="mi">0</span><span class="p">)()</span> <span class="n">at</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">python3</span><span class="o">.</span><span class="mi">5</span><span class="o">/</span><span class="n">asyncio</span><span class="o">/</span><span class="n">tasks</span><span class="o">.</span><span class="n">py</span><span class="p">:</span><span class="mi">602</span><span class="p">]</span><span class="o">></span>
- <span class="n">Task</span> <span class="n">was</span> <span class="n">destroyed</span> <span class="n">but</span> <span class="n">it</span> <span class="ow">is</span> <span class="n">pending</span><span class="err">!</span>
- <span class="n">task</span><span class="p">:</span> <span class="o"><</span><span class="n">Task</span> <span class="n">pending</span> <span class="n">coro</span><span class="o">=<</span><span class="n">fetch</span><span class="p">()</span> <span class="n">running</span> <span class="n">at</span> <span class="o">./</span><span class="n">bench</span><span class="o">.</span><span class="n">py</span><span class="p">:</span><span class="mi">7</span><span class="o">></span> <span class="n">wait_for</span><span class="o">=<</span><span class="n">Future</span> <span class="n">pending</span> <span class="n">cb</span><span class="o">=</span><span class="p">[</span><span class="n">Task</span><span class="o">.</span><span class="n">_wakeup</span><span class="p">()]</span><span class="o">></span> <span class="n">cb</span><span class="o">=</span><span class="p">[</span><span class="n">gather</span><span class="o">.<</span><span class="nb">locals</span><span class="o">>.</span><span class="n">_done_callback</span><span class="p">(</span><span class="mi">1</span><span class="p">)()</span> <span class="n">at</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">python3</span><span class="o">.</span><span class="mi">5</span><span class="o">/</span><span class="n">asyncio</span><span class="o">/</span><span class="n">tasks</span><span class="o">.</span><span class="n">py</span><span class="p">:</span><span class="mi">602</span><span class="p">]</span><span class="o">></span>
- <span class="n">Task</span> <span class="n">was</span> <span class="n">destroyed</span> <span class="n">but</span> <span class="n">it</span> <span class="ow">is</span> <span class="n">pending</span><span class="err">!</span>
- <span class="n">task</span><span class="p">:</span> <span class="o"><</span><span class="n">Task</span> <span class="n">pending</span> <span class="n">coro</span><span class="o">=<</span><span class="n">fetch</span><span class="p">()</span> <span class="n">running</span> <span class="n">at</span> <span class="o">./</span><span class="n">bench</span><span class="o">.</span><span class="n">py</span><span class="p">:</span><span class="mi">7</span><span class="o">></span> <span class="n">wait_for</span><span class="o">=<</span><span class="n">Future</span> <span class="n">pending</span> <span class="n">cb</span><span class="o">=</span><span class="p">[</span><span class="n">Task</span><span class="o">.</span><span class="n">_wakeup</span><span class="p">()]</span><span class="o">></span> <span class="n">cb</span><span class="o">=</span><span class="p">[</span><span class="n">gather</span><span class="o">.<</span><span class="nb">locals</span><span class="o">>.</span><span class="n">_done_callback</span><span class="p">(</span><span class="mi">2</span><span class="p">)()</span> <span class="n">at</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">python3</span><span class="o">.</span><span class="mi">5</span><span class="o">/</span><span class="n">asyncio</span><span class="o">/</span><span class="n">tasks</span><span class="o">.</span><span class="n">py</span><span class="p">:</span><span class="mi">602</span><span class="p">]</span><span class="o">></span>
- <span class="n">Task</span> <span class="n">was</span> <span class="n">destroyed</span> <span class="n">but</span> <span class="n">it</span> <span class="ow">is</span> <span class="n">pending</span><span class="err">!</span>
- <span class="n">task</span><span class="p">:</span> <span class="o"><</span><span class="n">Task</span> <span class="n">pending</span> <span class="n">coro</span><span class="o">=<</span><span class="n">fetch</span><span class="p">()</span> <span class="n">running</span> <span class="n">at</span> <span class="o">./</span><span class="n">bench</span><span class="o">.</span><span class="n">py</span><span class="p">:</span><span class="mi">7</span><span class="o">></span> <span class="n">wait_for</span><span class="o">=<</span><span class="n">Future</span> <span class="n">pending</span> <span class="n">cb</span><span class="o">=</span><span class="p">[</span><span class="n">Task</span><span class="o">.</span><span class="n">_wakeup</span><span class="p">()]</span><span class="o">></span> <span class="n">cb</span><span class="o">=</span><span class="p">[</span><span class="n">gather</span><span class="o">.<</span><span class="nb">locals</span><span class="o">>.</span><span class="n">_done_callback</span><span class="p">(</span><span class="mi">3</span><span class="p">)()</span> <span class="n">at</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">python3</span><span class="o">.</span><span class="mi">5</span><span class="o">/</span><span class="n">asyncio</span><span class="o">/</span><span class="n">tasks</span><span class="o">.</span><span class="n">py</span><span class="p">:</span><span class="mi">602</span><span class="p">]</span><span class="o">></span></code></pre>
-
- <p>What happens here? If you examine your localhost logs you may see that requests are not reaching
- your server at all. Clearly no requests are performed. The print statement shows that the
- responses variable contains a <code class="highlighter-rouge"><_GatheringFuture pending></code> object, and later Python warns that
- pending tasks were destroyed. Why is it happening? Again, you forgot about await.</p>
-
- <p>The faulty line is this:</p>
-
- <pre><code class="language-python" data-lang="python"> <span class="n">responses</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">gather</span><span class="p">(</span><span class="o">*</span><span class="n">tasks</span><span class="p">)</span></code></pre>
-
- <p>It should be:</p>
-
- <pre><code class="language-python" data-lang="python"> <span class="n">responses</span> <span class="o">=</span> <span class="n">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">gather</span><span class="p">(</span><span class="o">*</span><span class="n">tasks</span><span class="p">)</span></code></pre>
-
- <p>I guess the main lesson from those mistakes is: always remember to use “await” if
- you’re actually awaiting something.</p>
-
- <h2 id="sync-vs-async">Sync vs Async</h2>
-
- <p>Finally, time for some fun. Let’s check if async is really worth the hassle. What’s the
- difference in efficiency between an asynchronous client and a blocking client? How many
- requests per minute can I send with my async client?</p>
-
- <p>With these questions in mind I set up a simple (async) aiohttp server.
- My server is going to respond with the full HTML text of Frankenstein by Mary Shelley. It will
- add random delays between responses. Some responses will have zero delay, and some will
- have a maximum delay of 3 seconds. This should resemble real applications: few
- apps respond to all requests with the same latency, usually latency differs
- from response to response.</p>
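-
- <p>To get a rough feel for why mixed delays favour a non-blocking client, here is a small sketch that needs no server. The delays are hypothetical and scaled down to hundredths of a second. Awaiting them one by one stands in for a blocking client, and <code class="highlighter-rouge">gather()</code> stands in for an async client. It is only an illustration under those assumptions, not the benchmark itself.</p>
-
```python
import asyncio
import time

# Hypothetical, scaled-down delays standing in for the 0-3 second server delays.
DELAYS = [0.0, 0.01, 0.02, 0.03] * 3


async def fake_request(delay):
    # Sleeps instead of doing real network I/O.
    await asyncio.sleep(delay)


async def sequential():
    # One request at a time: total time is roughly the sum of all delays.
    for d in DELAYS:
        await fake_request(d)


async def concurrent():
    # All requests in flight at once: total time is roughly the longest delay.
    await asyncio.gather(*(fake_request(d) for d in DELAYS))


def timed(coro_func):
    start = time.perf_counter()
    asyncio.run(coro_func())
    return time.perf_counter() - start


if __name__ == "__main__":
    print("sequential: {:.3f}s".format(timed(sequential)))
    print("concurrent: {:.3f}s".format(timed(concurrent)))
```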
-
- <p>The server code looks like this:</p>
-
- <pre><code class="language-python" data-lang="python"><span class="c">#!/usr/local/bin/python3.5</span>
- <span class="kn">import</span> <span class="nn">asyncio</span>
- <span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span>
- <span class="kn">from</span> <span class="nn">aiohttp</span> <span class="kn">import</span> <span class="n">web</span>
- <span class="kn">import</span> <span class="nn">random</span>
-
- <span class="c"># set seed to ensure async and sync client get same distribution of delay values</span>
- <span class="c"># and tests are fair</span>
- <span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
-
- <span class="n">async</span> <span class="k">def</span> <span class="nf">hello</span><span class="p">(</span><span class="n">request</span><span class="p">):</span>
-     <span class="n">name</span> <span class="o">=</span> <span class="n">request</span><span class="o">.</span><span class="n">match_info</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span> <span class="s">"foo"</span><span class="p">)</span>
-     <span class="n">n</span> <span class="o">=</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">()</span><span class="o">.</span><span class="n">isoformat</span><span class="p">()</span>
-     <span class="n">delay</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
-     <span class="n">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">delay</span><span class="p">)</span>
-     <span class="n">headers</span> <span class="o">=</span> <span class="p">{</span><span class="s">"Content-Type"</span><span class="p">:</span> <span class="s">"text/html"</span><span class="p">,</span> <span class="s">"delay"</span><span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="n">delay</span><span class="p">)}</span>
-     <span class="c"># opening file is not async here, so it may block, to improve</span>
-     <span class="c"># efficiency of this you can consider using asyncio Executors</span>
-     <span class="c"># that will delegate file operation to separate thread or process</span>
-     <span class="c"># and improve performance</span>
-     <span class="c"># https://docs.python.org/3/library/asyncio-eventloop.html#executor</span>
-     <span class="c"># https://pymotw.com/3/asyncio/executors.html</span>
-     <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">"frank.html"</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">html_body</span><span class="p">:</span>
-         <span class="k">print</span><span class="p">(</span><span class="s">"{}: {} delay: {}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">request</span><span class="o">.</span><span class="n">path</span><span class="p">,</span> <span class="n">delay</span><span class="p">))</span>
-         <span class="n">response</span> <span class="o">=</span> <span class="n">web</span><span class="o">.</span><span class="n">Response</span><span class="p">(</span><span class="n">body</span><span class="o">=</span><span class="n">html_body</span><span class="o">.</span><span class="n">read</span><span class="p">(),</span> <span class="n">headers</span><span class="o">=</span><span class="n">headers</span><span class="p">)</span>
-     <span class="k">return</span> <span class="n">response</span>
-
- <span class="n">app</span> <span class="o">=</span> <span class="n">web</span><span class="o">.</span><span class="n">Application</span><span class="p">()</span>
- <span class="n">app</span><span class="o">.</span><span class="n">router</span><span class="o">.</span><span class="n">add_route</span><span class="p">(</span><span class="s">"GET"</span><span class="p">,</span> <span class="s">"/{name}"</span><span class="p">,</span> <span class="n">hello</span><span class="p">)</span>
- <span class="n">web</span><span class="o">.</span><span class="n">run_app</span><span class="p">(</span><span class="n">app</span><span class="p">)</span></code></pre>
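-
- <p>The comment in the handler suggests moving the blocking file read to an executor. Here is a minimal sketch of that idea, assuming Python 3.7+ for <code class="highlighter-rouge">asyncio.run</code> and <code class="highlighter-rouge">asyncio.get_running_loop</code>. The helper names <code class="highlighter-rouge">read_file</code>, <code class="highlighter-rouge">read_file_async</code> and <code class="highlighter-rouge">demo</code> are made up for illustration, and a temporary file stands in for <code class="highlighter-rouge">frank.html</code>.</p>
-
```python
import asyncio
import os
import tempfile


def read_file(path):
    # Plain blocking read; it runs in a worker thread, not on the event loop.
    with open(path, "rb") as f:
        return f.read()


async def read_file_async(path):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, read_file, path)


def demo():
    # Hypothetical stand-in for frank.html so the sketch is self-contained.
    with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as tmp:
        tmp.write(b"<html>hello</html>")
    try:
        return asyncio.run(read_file_async(tmp.name))
    finally:
        os.remove(tmp.name)


if __name__ == "__main__":
    print(demo())
```
-
- <p>In the handler you would then <code class="highlighter-rouge">await read_file_async("frank.html")</code> instead of calling <code class="highlighter-rouge">open()</code> directly. Whether this pays off for a small file is something to measure rather than assume.</p>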
-
- <p>The synchronous client looks like this:</p>
-
- <pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">requests</span>
- <span class="n">r</span> <span class="o">=</span> <span class="mi">100</span>
-
- <span class="n">url</span> <span class="o">=</span> <span class="s">"http://localhost:8080/{}"</span>
- <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">r</span><span class="p">):</span>
-     <span class="n">res</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
-     <span class="n">delay</span> <span class="o">=</span> <span class="n">res</span><span class="o">.</span><span class="n">headers</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"DELAY"</span><span class="p">)</span>
-     <span class="n">d</span> <span class="o">=</span> <span class="n">res</span><span class="o">.</span><span class="n">headers</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"DATE"</span><span class="p">)</span>
-     <span class="k">print</span><span class="p">(</span><span class="s">"{}:{} delay {}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">res</span><span class="o">.</span><span class="n">url</span><span class="p">,</span> <span class="n">delay</span><span class="p">))</span></code></pre>
-
- <p>How long will it take to run this?</p>
-
- <p>On my machine, running the synchronous client above took 2:45.54 (minutes:seconds).</p>
-
- <p>My async code looks just like the code samples above. How long will the async client take?</p>
-
- <p>On my machine it took 0:03.48, that is, about 3.5 seconds.</p>
-
- <p>It is interesting that it took almost exactly as long as the longest delay
- from my server. If you look at the messages printed by the client script you can see how
- well the async HTTP client works. Some responses had 0 delay but others got a 3 second delay. A synchronous client
- blocks and waits on each of them, and your machine simply stays idle for that time.
- The async client does not waste time: when a response is delayed it simply does
- something else, issuing other requests or processing other responses. You can see this clearly in the logs: first there
- are responses with 0 delay, then after they have arrived you can see responses with a 1 second delay,
- and so on until the most delayed responses arrive.</p>
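-
- <p>The effect can be reproduced without any server. The following is a rough sketch, assuming Python 3.7+, in which <code class="highlighter-rouge">asyncio.sleep</code> stands in for a slow response. The names <code class="highlighter-rouge">fake_request</code>, <code class="highlighter-rouge">run_all</code> and <code class="highlighter-rouge">timed_run</code> are invented for this illustration and are not part of my benchmark.</p>
-
```python
import asyncio
import time


async def fake_request(i, delay):
    # asyncio.sleep stands in for waiting on a slow server response.
    await asyncio.sleep(delay)
    return i, delay


async def run_all(delays):
    # All simulated requests wait concurrently on the same event loop.
    tasks = [fake_request(i, d) for i, d in enumerate(delays)]
    return await asyncio.gather(*tasks)


def timed_run(delays):
    start = time.perf_counter()
    results = asyncio.run(run_all(delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.0, 0.1, 0.2, 0.3] * 25  # 100 simulated requests
    results, elapsed = timed_run(delays)
    print("sum of delays: {:.1f}s, elapsed: {:.2f}s".format(sum(delays), elapsed))
```
-
- <p>The elapsed time should land close to the largest single delay, while a sequential loop would need roughly the sum of all delays.</p>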
-
- <h2 id="testing-the-limits">Testing the limits</h2>
-
- <p>Now that we know our async client is faster, let’s test its limits and try to crash our
- localhost. I’m going to start by sending 1k async requests. I’m curious how many requests
- my client can handle.</p>
-
- <pre><code class="language-bash" data-lang="bash"><span class="gp">> </span><span class="nb">time </span>python3 bench.py
-
- 2.68user 0.24system 0:07.14elapsed 40%CPU <span class="o">(</span>0avgtext+0avgdata 53704maxresident<span class="o">)</span>k
- 0inputs+0outputs <span class="o">(</span>0major+14156minor<span class="o">)</span>pagefaults 0swaps</code></pre>
-
- <p>So 1k requests take 7 seconds, pretty nice! How about 10k? Trying to make 10k requests
- unfortunately fails…</p>
-
- <pre><code class="language-python" data-lang="python"><span class="n">responses</span> <span class="n">are</span> <span class="o"><</span><span class="n">_GatheringFuture</span> <span class="n">finished</span> <span class="n">exception</span><span class="o">=</span><span class="n">ClientOSError</span><span class="p">(</span><span class="mi">24</span><span class="p">,</span> <span class="s">'Cannot connect to host localhost:8080 ssl:False [Can not connect to localhost:8080 [Too many open files]]'</span><span class="p">)</span><span class="o">></span>
- <span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
- <span class="n">File</span> <span class="s">"/home/pawel/.local/lib/python3.5/site-packages/aiohttp/connector.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">581</span><span class="p">,</span> <span class="ow">in</span> <span class="n">_create_connection</span>
- <span class="n">File</span> <span class="s">"/usr/local/lib/python3.5/asyncio/base_events.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">651</span><span class="p">,</span> <span class="ow">in</span> <span class="n">create_connection</span>
- <span class="n">File</span> <span class="s">"/usr/local/lib/python3.5/asyncio/base_events.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">618</span><span class="p">,</span> <span class="ow">in</span> <span class="n">create_connection</span>
- <span class="n">File</span> <span class="s">"/usr/local/lib/python3.5/socket.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">134</span><span class="p">,</span> <span class="ow">in</span> <span class="n">__init__</span>
- <span class="nb">OSError</span><span class="p">:</span> <span class="p">[</span><span class="n">Errno</span> <span class="mi">24</span><span class="p">]</span> <span class="n">Too</span> <span class="n">many</span> <span class="nb">open</span> <span class="n">files</span></code></pre>
-
- <p>That’s bad, it seems like I stumbled across the <a href="http://www.webcitation.org/6ICibHuyd">10k connections problem</a>.</p>
-
- <p>It says “too many open files”, which probably refers to the number of open sockets.
- Why does it call them files? Sockets are represented as file descriptors, and operating systems limit the number of
- descriptors a process may have open. How many files are too many? I checked with the Python <code class="highlighter-rouge">resource</code> module and the limit seems to be around 1024.
- How can we bypass this? The primitive way is to increase the limit of open files, but this
- is probably not a good way to go. A much better way is to add some synchronization
- to your client, limiting the number of concurrent requests it can process. I’m going to do this
- by adding <a href="https://docs.python.org/3/library/asyncio-sync.html#asyncio.Semaphore"><code class="highlighter-rouge">asyncio.Semaphore()</code></a> with a maximum of 1000 tasks.</p>
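-
- <p>For reference, the check with the <code class="highlighter-rouge">resource</code> module can look roughly like this. It is a Unix-only sketch. The helper names <code class="highlighter-rouge">open_file_limits</code> and <code class="highlighter-rouge">raise_soft_limit</code> are my own, and the second one shows the “primitive way” only for completeness.</p>
-
```python
import resource


def open_file_limits():
    # Returns (soft, hard); exceeding the soft limit is what raises Errno 24.
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    # An unprivileged process may raise its soft limit up to the hard limit.
    soft, hard = open_file_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough, nothing to change
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    print("soft/hard limits:", open_file_limits())
```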
-
- <p>The modified client code now looks like this:</p>
-
- <pre><code class="language-python" data-lang="python"><span class="c"># modified fetch function with semaphore</span>
- <span class="kn">import</span> <span class="nn">random</span>
- <span class="kn">import</span> <span class="nn">asyncio</span>
- <span class="kn">from</span> <span class="nn">aiohttp</span> <span class="kn">import</span> <span class="n">ClientSession</span>
-
- <span class="n">async</span> <span class="k">def</span> <span class="nf">fetch</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
-     <span class="n">async</span> <span class="k">with</span> <span class="n">ClientSession</span><span class="p">()</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
-         <span class="n">async</span> <span class="k">with</span> <span class="n">session</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="k">as</span> <span class="n">response</span><span class="p">:</span>
-             <span class="n">delay</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">headers</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"DELAY"</span><span class="p">)</span>
-             <span class="n">date</span> <span class="o">=</span> <span class="n">response</span><span class="o">.</span><span class="n">headers</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s">"DATE"</span><span class="p">)</span>
-             <span class="k">print</span><span class="p">(</span><span class="s">"{}:{} with delay {}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">date</span><span class="p">,</span> <span class="n">response</span><span class="o">.</span><span class="n">url</span><span class="p">,</span> <span class="n">delay</span><span class="p">))</span>
-             <span class="k">return</span> <span class="n">await</span> <span class="n">response</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
-
-
- <span class="n">async</span> <span class="k">def</span> <span class="nf">bound_fetch</span><span class="p">(</span><span class="n">sem</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
-     <span class="c"># getter function with semaphore</span>
-     <span class="n">async</span> <span class="k">with</span> <span class="n">sem</span><span class="p">:</span>
-         <span class="n">await</span> <span class="n">fetch</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
-
-
- <span class="n">async</span> <span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="n">r</span><span class="p">):</span>
-     <span class="n">url</span> <span class="o">=</span> <span class="s">"http://localhost:8080/{}"</span>
-     <span class="n">tasks</span> <span class="o">=</span> <span class="p">[]</span>
-     <span class="c"># create instance of Semaphore</span>
-     <span class="n">sem</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">Semaphore</span><span class="p">(</span><span class="mi">1000</span><span class="p">)</span>
-     <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">r</span><span class="p">):</span>
-         <span class="c"># pass Semaphore to every GET request</span>
-         <span class="n">task</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">ensure_future</span><span class="p">(</span><span class="n">bound_fetch</span><span class="p">(</span><span class="n">sem</span><span class="p">,</span> <span class="n">url</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">i</span><span class="p">)))</span>
-         <span class="n">tasks</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">task</span><span class="p">)</span>
-
-     <span class="n">responses</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">gather</span><span class="p">(</span><span class="o">*</span><span class="n">tasks</span><span class="p">)</span>
-     <span class="n">await</span> <span class="n">responses</span>
-
- <span class="n">number</span> <span class="o">=</span> <span class="mi">10000</span>
- <span class="n">loop</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">get_event_loop</span><span class="p">()</span>
-
- <span class="n">future</span> <span class="o">=</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">ensure_future</span><span class="p">(</span><span class="n">run</span><span class="p">(</span><span class="n">loop</span><span class="p">,</span> <span class="n">number</span><span class="p">))</span>
- <span class="n">loop</span><span class="o">.</span><span class="n">run_until_complete</span><span class="p">(</span><span class="n">future</span><span class="p">)</span></code></pre>
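-
- <p>To convince yourself that the semaphore really caps concurrency, here is a small network-free sketch, assuming Python 3.7+. <code class="highlighter-rouge">fake_fetch</code> replaces the real HTTP call with a short sleep, and the <code class="highlighter-rouge">Counter</code> class and <code class="highlighter-rouge">peak_concurrency</code> helper are hypothetical names used only to record how many coroutines are inside the guarded section at once.</p>
-
```python
import asyncio


class Counter:
    def __init__(self):
        self.active = 0  # coroutines currently inside the guarded section
        self.peak = 0    # highest value of active seen so far


async def fake_fetch(counter):
    counter.active += 1
    counter.peak = max(counter.peak, counter.active)
    await asyncio.sleep(0.01)  # stands in for the HTTP round trip
    counter.active -= 1


async def bound_fetch(sem, counter):
    # Only `limit` coroutines can hold the semaphore at the same time.
    async with sem:
        await fake_fetch(counter)


async def run(total, limit):
    sem = asyncio.Semaphore(limit)
    counter = Counter()
    await asyncio.gather(*(bound_fetch(sem, counter) for _ in range(total)))
    return counter.peak


def peak_concurrency(total, limit):
    return asyncio.run(run(total, limit))


if __name__ == "__main__":
    print("peak concurrency:", peak_concurrency(10000, 1000))
```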
-
- <p>At this point I can process 10k URLs. It takes 23 seconds and produces some exceptions, but overall
- it’s pretty nice!</p>
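-
- <p>One way to keep a few failed requests from hiding the rest is to pass <code class="highlighter-rouge">return_exceptions=True</code> to <code class="highlighter-rouge">asyncio.gather</code> and count failures afterwards. This is a sketch under assumptions, not what my benchmark did: <code class="highlighter-rouge">flaky</code> is a made-up coroutine that fails for every tenth request, and Python 3.7+ is assumed.</p>
-
```python
import asyncio


async def flaky(i):
    # Hypothetical request: every tenth one fails with a connection error.
    await asyncio.sleep(0)
    if i % 10 == 0:
        raise ConnectionError("simulated failure for request {}".format(i))
    return i


async def run(total):
    # Exceptions are returned in the result list instead of being raised.
    results = await asyncio.gather(
        *(flaky(i) for i in range(total)), return_exceptions=True
    )
    failures = [r for r in results if isinstance(r, Exception)]
    successes = [r for r in results if not isinstance(r, Exception)]
    return successes, failures


def summarize(total):
    return asyncio.run(run(total))


if __name__ == "__main__":
    ok, failed = summarize(100)
    print("{} succeeded, {} failed".format(len(ok), len(failed)))
```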
-
- <p>How about 100 000? This really makes my computer work hard, but surprisingly
- it works OK. The server turns out to be surprisingly stable, although
- you can see that RAM usage gets pretty high at this point and CPU usage is around
- 100% all the time. What I find interesting is that my server takes significantly less CPU than the client.
- Here’s a snapshot of Linux <code class="highlighter-rouge">ps</code> output.</p>
-
- <pre><code class="language-python" data-lang="python"><span class="n">pawel</span><span class="nd">@pawel</span><span class="o">-</span><span class="n">VPCEH390X</span> <span class="o">~/</span><span class="n">p</span><span class="o">/</span><span class="n">l</span><span class="o">/</span><span class="n">benchmarker</span><span class="o">></span> <span class="n">ps</span> <span class="n">ua</span> <span class="o">|</span> <span class="n">grep</span> <span class="n">python</span>
-
- <span class="n">USER</span> <span class="n">PID</span> <span class="o">%</span><span class="n">CPU</span> <span class="o">%</span><span class="n">MEM</span> <span class="n">VSZ</span> <span class="n">RSS</span> <span class="n">TTY</span> <span class="n">STAT</span> <span class="n">START</span> <span class="n">TIME</span> <span class="n">COMMAND</span>
- <span class="n">pawel</span> <span class="mi">2447</span> <span class="mf">56.3</span> <span class="mf">1.0</span> <span class="mi">216124</span> <span class="mi">64976</span> <span class="n">pts</span><span class="o">/</span><span class="mi">9</span> <span class="n">Sl</span><span class="o">+</span> <span class="mi">21</span><span class="p">:</span><span class="mi">26</span> <span class="mi">1</span><span class="p">:</span><span class="mi">27</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="nb">bin</span><span class="o">/</span><span class="n">python3</span><span class="o">.</span><span class="mi">5</span> <span class="o">./</span><span class="n">test_server</span><span class="o">.</span><span class="n">py</span>
- <span class="n">pawel</span> <span class="mi">2527</span> <span class="mi">101</span> <span class="mf">3.5</span> <span class="mi">674732</span> <span class="mi">212076</span> <span class="n">pts</span><span class="o">/</span><span class="mi">0</span> <span class="n">Rl</span><span class="o">+</span> <span class="mi">21</span><span class="p">:</span><span class="mi">26</span> <span class="mi">2</span><span class="p">:</span><span class="mi">30</span> <span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">local</span><span class="o">/</span><span class="nb">bin</span><span class="o">/</span><span class="n">python3</span><span class="o">.</span><span class="mi">5</span> <span class="o">./</span><span class="n">bench</span><span class="o">.</span><span class="n">py</span></code></pre>
-
- <p>Overall it took around 5 minutes before it crashed for some
- reason. It generated around 100k lines of output, so it’s not that easy
- to pinpoint the traceback. It seems like some responses are not closed. Is it
- because of some error from my server or something in the client?</p>
-
- <p>After scrolling for a couple of seconds I found this exception in the client logs.</p>
-
- <pre><code class="language-python" data-lang="python"> <span class="n">File</span> <span class="s">"/usr/local/lib/python3.5/asyncio/futures.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">387</span><span class="p">,</span> <span class="ow">in</span> <span class="n">__iter__</span>
- <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">result</span><span class="p">()</span> <span class="c"># May raise too.</span>
- <span class="n">File</span> <span class="s">"/usr/local/lib/python3.5/asyncio/futures.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">274</span><span class="p">,</span> <span class="ow">in</span> <span class="n">result</span>
- <span class="k">raise</span> <span class="bp">self</span><span class="o">.</span><span class="n">_exception</span>
- <span class="n">File</span> <span class="s">"/usr/local/lib/python3.5/asyncio/selector_events.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">411</span><span class="p">,</span> <span class="ow">in</span> <span class="n">_sock_connect</span>
- <span class="n">sock</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">address</span><span class="p">)</span>
- <span class="nb">OSError</span><span class="p">:</span> <span class="p">[</span><span class="n">Errno</span> <span class="mi">99</span><span class="p">]</span> <span class="n">Cannot</span> <span class="n">assign</span> <span class="n">requested</span> <span class="n">address</span></code></pre>
-
- <p>I don’t really know what happens here. My initial hypothesis is that the test server went down for a split second,
- and this caused some client error that was printed at the end. <a href="https://news.ycombinator.com/item?id=11557672">One of the readers</a>
- suggests that this exception may be caused by the OS running out of free ephemeral ports. I added the semaphore
- earlier, so the number of concurrent connections should be at most 1k, but some sockets may still be
- closing and not yet available for the kernel to reuse.</p>
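-
- <p>If the ephemeral port hypothesis is right, it helps to know how many ports the kernel can hand out. The sketch below reads the range on Linux from <code class="highlighter-rouge">/proc/sys/net/ipv4/ip_local_port_range</code>. The function names are my own, and the sample values in the usage line are only an assumption about a typical Linux configuration, not something I measured on my laptop.</p>
-
```python
def parse_port_range(text):
    # The file holds two whitespace-separated integers: lowest and highest port.
    low, high = (int(part) for part in text.split())
    return low, high


def usable_ports(low, high):
    # Both ends of the range are inclusive.
    return high - low + 1


def read_local_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    # Returns None on systems that do not expose this file.
    try:
        with open(path) as f:
            return parse_port_range(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed sample values; use read_local_port_range() for the real ones.
    low, high = parse_port_range("32768\t60999\n")
    print("ephemeral ports available:", usable_ports(low, high))
```
-
- <p>Because the client opens a new connection per request and closed sockets linger for a while before their port can be reused, a long run can exhaust this pool even though only 1k connections are open at any moment.</p>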
-
- <p>Overall it’s really not bad, 5 minutes for 100 000 requests? This makes around 20k
- requests per minute. Pretty powerful if you ask me.</p>
-
- <p>Finally I’m going to try 1 million requests. I really hope my laptop is not going
- to explode when testing that. For this number of requests I reduced the server delays to a range between 0 and 1 second.</p>
-
- <p>1 000 000 requests finished in 52 minutes</p>
-
- <pre><code class="language-bash" data-lang="bash">1913.06user 1196.09system 52:06.87elapsed 99%CPU <span class="o">(</span>0avgtext+0avgdata 5194260maxresident<span class="o">)</span>k
- 265144inputs+0outputs <span class="o">(</span>18692major+2528207minor<span class="o">)</span>pagefaults 0swaps</code></pre>
-
- <p>So my client made around 19230 requests per minute. Not bad, is it? Note that
- the capabilities of my client are limited by the server responding with a delay of 0 or 1 seconds in this
- scenario. It also seems like my test server crashed silently a couple of times.</p>
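-
- <p>The throughput figure is simple arithmetic on the elapsed field printed by <code class="highlighter-rouge">time</code>. Here is a small sketch with helper names of my own choosing; it assumes the <code class="highlighter-rouge">M:SS.ss</code> format shown in the outputs above. Using the full 52:06.87 instead of a rounded 52 minutes gives a slightly lower number than the one quoted in the text.</p>
-
```python
def elapsed_minutes(elapsed):
    # Parses the "M:SS.ss" elapsed field printed by time, e.g. "52:06.87".
    minutes, seconds = elapsed.split(":")
    return int(minutes) + float(seconds) / 60


def requests_per_minute(total, elapsed):
    return total / elapsed_minutes(elapsed)


if __name__ == "__main__":
    print("1k run: {:.0f} requests/minute".format(requests_per_minute(1000, "0:07.14")))
    print("1M run: {:.0f} requests/minute".format(requests_per_minute(1000000, "52:06.87")))
```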
-
- <h2 id="epilogue">Epilogue</h2>
-
- <p>You can see that asynchronous HTTP clients can be pretty powerful. Performing
- 1 million requests from an async client is not difficult, and the client performs really well in comparison
- to synchronous code.</p>
-
- <p>I wonder how it compares to other languages and async frameworks? Perhaps in some
- future post I could compare <a href="https://github.com/twisted/treq">Twisted Treq</a> with
- aiohttp. There is also the question of how many concurrent requests can be issued by
- async libraries in other languages. E.g. what would be the results of benchmarks
- for some Java async frameworks? Or C++ frameworks? Or some Rust HTTP clients?</p>
-
- <h3 id="edits-24042016"><em>EDITS (24/04/2016)</em></h3>
-
- <ul>
- <li>improved code sample that uses Semaphore</li>
- <li>added comment about using executor when opening file</li>
- <li>added link to HN comment about EADDRNOTAVAIL exception</li>
- </ul>
|