title: Speed up your Python using Rust
url: https://developers.redhat.com/blog/2017/11/16/speed-python-using-rust/
hash_url: 6a75988501
Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.
This description is taken from rust-lang.org.
A better description of Rust is one I heard from Elias (a member of the Rust Brazil Telegram group):
Rust is a language that allows you to build high level abstractions, but without giving up low-level control – that is, control of how data is represented in memory, control of which threading model you want to use etc.
Rust is a language that can usually detect, during compilation, the worst parallelism and memory management errors (such as accessing data on different threads without synchronization, or using data after it has been deallocated), but gives you an escape hatch in case you really know what you’re doing.
Rust is a language that, because it has no runtime, can be used to integrate with any runtime; you can write a native extension in Rust that is called by a Node.js program, or by a Python program, or by a program in Ruby, Lua etc. and, conversely, you can script a program in Rust using these languages. – “Elias Gabriel Amaral da Silva”
There are a number of Rust packages out there to help you extend Python with Rust.
I can mention Milksnake, created by Armin Ronacher (the creator of Flask), and also PyO3, the Rust bindings for the Python interpreter.
Let’s see it in action
For this post, I am going to use rust-cpython. It is the only one I have tested, it is compatible with the stable version of Rust, and I found it straightforward to use.
NOTE: PyO3 is a fork of rust-cpython that comes with many improvements but works only with the nightly version of Rust, so I preferred to use stable for this post. The examples here should also work with PyO3.
Pros: It is easy to write Rust functions and import them from Python and, as you will see from the benchmarks, it is worth it in terms of performance.
Cons: Distributing your project/lib/framework requires the Rust module to be compiled on the target system because of variations in environment and architecture. This adds a compiling stage that you don’t have when installing pure Python libraries. You can make it easier using rust-setuptools, or using MilkSnake to embed binary data in Python wheels.
Yes, Python is known for being “slow” in some cases and the good news is that this doesn’t really matter depending on your project goals and priorities. For most projects, this detail will not be very important.
However, you may face the rare case where a single function or module takes too much time and is identified as the bottleneck of your project’s performance. This often happens with string parsing and image processing.
Let’s say you have a Python function that does some string processing. Take the following easy example of counting pairs of repeated chars, but keep in mind that this example can be reproduced with other string processing functions or any other generally slow process in Python.
# How many subsequent-repeated groups of chars are in the given string?
abCCdeFFghiJJklmnopqRRstuVVxyZZ… {millions of chars here}
  1   2    3        4    5   6
Python is slow at processing large strings, so you can use pytest-benchmark to compare a pure Python function (with iterator zipping) against a regexp implementation.
# Using a Python3.6 environment
$ pip3 install pytest pytest-benchmark
Then write a new Python program called doubles.py
import re
import string
import random

# Python ZIP version
def count_doubles(val):
    total = 0
    # there is an improved version later on this post
    for c1, c2 in zip(val, val[1:]):
        if c1 == c2:
            total += 1
    return total

# Python REGEXP version
double_re = re.compile(r'(?=(.)\1)')

def count_doubles_regex(val):
    return len(double_re.findall(val))

# Benchmark it
# generate 1M of random letters to test it
val = ''.join(random.choice(string.ascii_letters) for i in range(1000000))

def test_pure_python(benchmark):
    benchmark(count_doubles, val)

def test_regex(benchmark):
    benchmark(count_doubles_regex, val)
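Before benchmarking, it can help to confirm that the two implementations return the same count. The sketch below repeats both functions so that it is self-contained and checks them on the sample string from the example above. The `sample` name and the extra inputs are illustrative assumptions, not part of the original doubles.py.

```python
import re

def count_doubles(val):
    total = 0
    for c1, c2 in zip(val, val[1:]):
        if c1 == c2:
            total += 1
    return total

# The lookahead makes overlapping pairs (e.g. "aaa") count once each
double_re = re.compile(r'(?=(.)\1)')

def count_doubles_regex(val):
    return len(double_re.findall(val))

# Hypothetical inputs for a quick sanity check
sample = "abCCdeFFghiJJklmnopqRRstuVVxyZZ"
for text in (sample, "", "a", "aaa"):
    assert count_doubles(text) == count_doubles_regex(text)

print(count_doubles(sample))  # 6
```

Note that "aaa" counts as two pairs in both versions, because each adjacent pair is counted, including overlapping ones.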
Run pytest to compare:
$ pytest doubles.py
platform linux -- Python 3.6.0, pytest-3.2.3, py-1.4.34, pluggy-0.4.
benchmark: 3.1.1 (defaults: timer=time.perf_counter disable_gc=False min_roun
rootdir: /Projects/rustpy, inifile:
plugins: benchmark-3.1.1
collected 2 items
doubles.py ..
test_regex 24.6824 (1.0) 32.3960 (1.0) 27.0167 (1.0)
Let’s take the Mean for comparison:
A crate is what Rust packages are called.
You need Rust installed (the recommended way is https://www.rustup.rs/). Rust is also available in the Fedora and RHEL repositories via rust-toolset.
I used rustc 1.21.0.
In the same folder run:
cargo new pyext-myrustlib
It creates a new Rust project in that same folder called pyext-myrustlib containing the Cargo.toml (cargo is the Rust package manager) and also a src/lib.rs (where we write our library implementation).
The Cargo.toml will use the rust-cpython crate as a dependency and tell cargo to generate a dylib to be imported from Python.
[package]
name = "pyext-myrustlib"
version = "0.1.0"
authors = ["Bruno Rocha <rochacbruno@gmail.com>"]

[lib]
name = "myrustlib"
crate-type = ["dylib"]

[dependencies.cpython]
version = "0.1"
features = ["extension-module"]
What we need to do:
– Import all macros from the cpython crate.
– Bring the Python and PyResult types from CPython into our lib scope.
– Write the count_doubles function implementation in Rust. Note that this is very similar to the pure Python version except for:
  – It takes a Python as first argument, which is a reference to the Python interpreter and allows Rust to use the Python GIL.
  – It receives val typed as &str, by reference.
  – It returns a PyResult, which is a type that allows raising Python exceptions.
  – It returns the PyResult object as Ok(total) (Result is an enum type that represents either success (Ok) or failure (Err)). Since our function is declared to return a PyResult, the compiler treats our Ok as that type (note that our PyResult expects a u64 as the return value).
– With the py_module_initializer! macro we register new attributes to the lib, including the doc, and we also add the count_doubles attribute referencing our Rust implementation of the function.
  – We use the try! macro, which is the closest equivalent to Python’s try.. except: if the call fails, the error is returned to the caller.
  – We return Ok(()). The () is an empty result tuple, the equivalent of None in Python.

#[macro_use]
extern crate cpython;

use cpython::{Python, PyResult};

fn count_doubles(_py: Python, val: &str) -> PyResult<u64> {
    let mut total = 0u64;

    // There is an improved version later on this post
    for (c1, c2) in val.chars().zip(val.chars().skip(1)) {
        if c1 == c2 {
            total += 1;
        }
    }

    Ok(total)
}

py_module_initializer!(libmyrustlib, initlibmyrustlib, PyInit_myrustlib, |py, m| {
    try!(m.add(py, "__doc__", "This module is implemented in Rust"));
    try!(m.add(py, "count_doubles", py_fn!(py, count_doubles(val: &str))));
    Ok(())
});
$ cargo build --release
    Finished release [optimized] target(s) in 0.0 secs

$ ls -la target/release/libmyrustlib*
target/release/libmyrustlib.d
target/release/libmyrustlib.so  <-- Our dylib is here
Now let’s copy the generated .so lib to the same folder where our doubles.py is located.
NOTE: On Fedora you get a .so; on other systems you may get a .dylib, which you can rename by changing the extension to .so.
$ cd ..
$ ls
doubles.py pyext-myrustlib/

$ cp pyext-myrustlib/target/release/libmyrustlib.so myrustlib.so

$ ls
doubles.py myrustlib.so pyext-myrustlib/
Having myrustlib.so in the same folder, or in a folder added to your Python path, allows it to be imported directly and transparently, as if it were a Python module.
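Because the compiled module may be missing on some systems (see the cons above), one possible pattern is to fall back to the pure Python function when the import fails. This is only a sketch and not something the original example does; the `BACKEND` name is a hypothetical helper used for illustration.

```python
try:
    # Rust extension built above (myrustlib.so must be importable)
    from myrustlib import count_doubles
    BACKEND = "rust"  # hypothetical marker, for illustration only
except ImportError:
    # Fallback: the pure Python version from doubles.py
    def count_doubles(val):
        total = 0
        for c1, c2 in zip(val, val[1:]):
            if c1 == c2:
                total += 1
        return total
    BACKEND = "python"

print(BACKEND, count_doubles("abCCdeFF"))
```

Callers use `count_doubles` the same way in both cases; only the speed differs.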
Edit your doubles.py, now importing our Rust-implemented version and adding a benchmark for it.
import re
import string
import random
import myrustlib  # <-- Import the Rust implemented module (myrustlib.so)

def count_doubles(val):
    """Count repeated pairs of chars in a string"""
    total = 0
    for c1, c2 in zip(val, val[1:]):
        if c1 == c2:
            total += 1
    return total

double_re = re.compile(r'(?=(.)\1)')

def count_doubles_regex(val):
    return len(double_re.findall(val))

val = ''.join(random.choice(string.ascii_letters) for i in range(1000000))

def test_pure_python(benchmark):
    benchmark(count_doubles, val)

def test_regex(benchmark):
    benchmark(count_doubles_regex, val)

def test_rust(benchmark):  # <-- Benchmark the Rust version
    benchmark(myrustlib.count_doubles, val)
$ pytest doubles.py
platform linux -- Python 3.6.0, pytest-3.2.3, py-1.4.34, pluggy-0.4.
benchmark: 3.1.1 (defaults: timer=time.perf_counter disable_gc=False min_round
rootdir: /Projects/rustpy, inifile:
plugins: benchmark-3.1.1
collected 3 items

doubles.py ...

-----------------------------------------------------------------------------
Name (time in ms)         Min                Max                Mean
-----------------------------------------------------------------------------
test_rust              2.5555 (1.0)       2.9296 (1.0)       2.6085 (1.0)
test_regex            25.6049 (10.02)    27.2190 (9.29)     25.8876 (9.92)
test_pure_python      52.9428 (20.72)    56.3666 (19.24)    53.9732 (20.69)
-----------------------------------------------------------------------------
Let’s take the Mean for comparison:
The Rust implementation can be 10x faster than the Python regex version and 21x faster than the pure Python version.
Interestingly, the regex version is only 2x faster than pure Python 🙂
NOTE: These numbers make sense only for this particular scenario; for other cases the comparison may be different.
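The speedup factors quoted above follow directly from the mean column of the benchmark output. As a small worked example, the snippet below recomputes them from the three mean values; the `means` dictionary and the `speedup` helper are illustrative names, and the units cancel out in the ratios.

```python
# Mean times copied from the benchmark output above
means = {
    "test_rust": 2.6085,
    "test_regex": 25.8876,
    "test_pure_python": 53.9732,
}

def speedup(slow, fast):
    """How many times faster `fast` is than `slow`, by mean time."""
    return means[slow] / means[fast]

print(round(speedup("test_regex", "test_rust"), 2))         # 9.92
print(round(speedup("test_pure_python", "test_rust"), 2))   # 20.69
print(round(speedup("test_pure_python", "test_regex"), 2))  # 2.08
```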
After this article was published I got some comments on r/python and also on r/rust.
The contributions came as pull requests, and you can send a new one if you think the functions can be improved.
Thanks to Josh Stone, we got a better implementation for Rust which iterates the string only once, and also the Python equivalent.
Thanks to Purple Pixie, we got a Python implementation using itertools; however, this version does not perform any better and still needs improvements.
fn count_doubles_once(_py: Python, val: &str) -> PyResult<u64> {
    let mut total = 0u64;

    let mut chars = val.chars();
    if let Some(mut c1) = chars.next() {
        for c2 in chars {
            if c1 == c2 {
                total += 1;
            }
            c1 = c2;
        }
    }

    Ok(total)
}

def count_doubles_once(val):
    total = 0
    chars = iter(val)
    c1 = next(chars)
    for c2 in chars:
        if c1 == c2:
            total += 1
        c1 = c2
    return total

import itertools

def count_doubles_itertools(val):
    c1s, c2s = itertools.tee(val)
    next(c2s, None)
    total = 0
    for c1, c2 in zip(c1s, c2s):
        if c1 == c2:
            total += 1
    return total
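One edge case worth noting: the single-pass Python version above calls next(chars) without a default, so an empty string raises StopIteration. The benchmark never passes an empty string, so this does not affect the results. A guarded variant might look like the sketch below; the name `count_doubles_once_safe` is hypothetical, and the loop at the end simply checks it against the itertools version.

```python
import itertools

def count_doubles_once_safe(val):
    """Single-pass count that also accepts an empty string."""
    total = 0
    chars = iter(val)
    c1 = next(chars, None)  # None instead of StopIteration on empty input
    for c2 in chars:
        if c1 == c2:
            total += 1
        c1 = c2
    return total

def count_doubles_itertools(val):
    c1s, c2s = itertools.tee(val)
    next(c2s, None)
    total = 0
    for c1, c2 in zip(c1s, c2s):
        if c1 == c2:
            total += 1
    return total

# Both variants should agree, including on the empty string
for text in ("", "a", "aa", "abCCdeFF"):
    assert count_doubles_once_safe(text) == count_doubles_itertools(text)
```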
OK, that is not the purpose of this post. This post was never about comparing Rust vs. other languages; it was specifically about how to use Rust to extend and speed up Python. Doing that implies you already have a good reason to choose Rust over another language: its ecosystem, its safety and tooling, following the hype, or simply because you like Rust. Whatever the reason, this post is here to show how to use it with Python.
I (personally) may say that Rust is more future proof, as it is new and there are lots of improvements to come, and also because of its ecosystem, tooling, and community. I also feel comfortable with Rust syntax; I really like it!
So, as expected, people started complaining about the use of other languages, and it became a sort of benchmark, and I think it is cool!
As part of my request for improvements, some people on Hacker News also sent ideas; martinxyz sent an implementation using C and SWIG that performed very well.
C Code (swig boilerplate omitted)
uint64_t count_byte_doubles(char * str) {
  uint64_t count = 0;
  while (str[0] && str[1]) {
    if (str[0] == str[1]) count++;
    str++;
  }
  return count;
}
And our fellow Red Hatter Josh Stone improved the Rust implementation again by replacing chars with bytes, so it is a fair competition with C, as C is comparing bytes instead of Unicode chars.
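Comparing bytes and comparing Unicode chars give the same count for the benchmark input, which contains only ASCII letters, but they are not equivalent in general. The Python sketch below illustrates the difference, assuming the string is encoded as UTF-8; the function names are illustrative and not part of the original repository.

```python
def count_char_doubles(val):
    """Compare Unicode code points, in the spirit of Rust's chars()."""
    return sum(1 for c1, c2 in zip(val, val[1:]) if c1 == c2)

def count_byte_doubles(val):
    """Compare UTF-8 bytes, in the spirit of Rust's bytes()."""
    data = val.encode("utf-8")
    return sum(1 for b1, b2 in zip(data, data[1:]) if b1 == b2)

# ASCII only: one byte per char, so both counts match
ascii_text = "abCCdeFF"
assert count_char_doubles(ascii_text) == count_byte_doubles(ascii_text) == 2

# U+1041 encodes to the bytes E1 81 81, so a single character
# already contains a repeated byte pair.
tricky = "\u1041"
print(count_char_doubles(tricky), count_byte_doubles(tricky))  # 0 1
```

So the bytes version is a fair match for the C code, but it answers a slightly different question once non-ASCII text is involved.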
fn count_doubles_once_bytes(_py: Python, val: &str) -> PyResult<u64> {
    let mut total = 0u64;

    let mut chars = val.bytes();
    if let Some(mut c1) = chars.next() {
        for c2 in chars {
            if c1 == c2 {
                total += 1;
            }
            c1 = c2;
        }
    }

    Ok(total)
}
There were also ideas to compare Python list comprehension and numpy, so I included them here.
Numpy:
import numpy as np

def count_double_numpy(val):
    # NOTE: binary-mode np.fromstring is deprecated in newer NumPy releases;
    # np.frombuffer(val.encode(), dtype=np.byte) is the suggested replacement.
    ng = np.fromstring(val, dtype=np.byte)
    return np.sum(ng[:-1] == ng[1:])
List comprehension
def count_doubles_comprehension(val):
    return sum(1 for c1, c2 in zip(val, val[1:]) if c1 == c2)
The complete test case is in the repository’s test_all.py file.
NOTE: Keep in mind that the comparison was done in the same environment, and results may differ if run in a different environment using another compiler and/or different tags.
-------------------------------------------------------------------------------------------------
Name (time in us)                  Min                  Max                 Mean
-------------------------------------------------------------------------------------------------
test_rust_bytes_once          476.7920 (1.0)       830.5610 (1.0)       486.6116 (1.0)
test_c_swig_bytes_once        795.3460 (1.67)    1,504.3380 (1.81)      827.3898 (1.70)
test_rust_once                985.9520 (2.07)    1,483.8120 (1.79)    1,017.4251 (2.09)
test_numpy                  1,001.3880 (2.10)    2,461.1200 (2.96)    1,274.8132 (2.62)
test_rust                   2,555.0810 (5.36)    3,066.0430 (3.69)    2,609.7403 (5.36)
test_regex                 24,787.0670 (51.99)  26,513.1520 (31.92)  25,333.8143 (52.06)
test_pure_python_once      36,447.0790 (76.44)  48,596.5340 (58.51)  38,074.5863 (78.24)
test_python_comprehension  49,166.0560 (103.12) 50,832.1220 (61.20)  49,699.2122 (102.13)
test_pure_python           49,586.3750 (104.00) 50,697.3780 (61.04)  50,148.6596 (103.06)
test_itertools             56,762.8920 (119.05) 69,660.0200 (83.87)  58,402.9442 (120.02)
– The new Rust implementation comparing bytes is 2x better than the old one comparing Unicode chars.
– The Rust version is still better than the C version using SWIG.
– Rust comparing Unicode chars is still better than numpy.
– Numpy is better than the first Rust implementation, which had the problem of double iteration over the Unicode chars.
– Using a list comprehension does not make a significant difference compared with pure Python.
NOTE: If you want to propose changes or improvements send a PR here: https://github.com/rochacbruno/rust-python-example/
Back to the purpose of this post, “How to Speed Up your Python with Rust”, we started with:
– Pure Python function taking about 50 ms (mean, per the table above).
– Improved with Numpy (which is implemented in C) to take about 1.3 ms.
– Ended with Rust taking about 0.5 ms.
In this example Rust performed 100x faster than our Pure Python.
Rust will not magically save you. You must know the language to be able to implement the clever solution, and once it is implemented the right way, it is worth as much as C in terms of performance and also comes with amazing tooling, ecosystem, community and safety bonuses.
Rust may not yet be the general purpose language of choice, because of its level of complexity, and may not yet be the best choice for writing common simple applications such as web sites and test automation scripts.
However, for specific parts of the project where Python is known to be the bottleneck and your natural choice would be implementing a C/C++ extension, writing this extension in Rust seems easy and better to maintain.
There are still many improvements to come in Rust and lots of other crates offering Python <--> Rust integration. Even if you are not including the language in your tool belt right now, it is really worth keeping an eye open to the future!
The code snippets for the examples shown here are available in the GitHub repo: https://github.com/rochacbruno/rust-python-example.
The examples in this publication are inspired by the Extending Python with Rust talk by Samuel Cormier-Iijima at PyCon Canada. Video here: https://www.youtube.com/watch?v=-ylbuEzkG4M.
Also by My Python is a little Rust-y by Dan Callahan at PyCon Montreal. Video here: https://www.youtube.com/watch?v=3CwJ0MH-4MA.
Join Community:
Join the Rust community; you can find group links at https://www.rust-lang.org/en-US/community.html.
If you speak Portuguese, I recommend you join https://t.me/rustlangbr, and there is also http://bit.ly/canalrustbr on YouTube.
Bruno Rocha
More info: http://about.me/rochacbruno and http://brunorocha.org