Django Password Hashing: Speed Up Your Tests by 70%
Run tests 70% faster by not wasting time hashing.
If Django is part of your tech stack, and you create test users during your tests, you may be spending a lot of time hashing passwords. If so, you may be able to speed up your tests by switching to a cheaper hash method for test runs.
Note: Changing the hash method is specifically mentioned in the official docs on speeding up your Django tests. I wanted to post this to highlight some real-world numbers on just how much of a difference it can make.
Why Speed Up Tests? Developer Flow
It may seem silly, but some people ask: “Why speed up the tests? That’s not part of the main app we are building”.
The answer of course is: increased productivity.
Developers use tests to verify that code works as it should. They are the way you enforce the behaviour you want from your program. Unit tests confirm the basic functionality, and should always pass before you add any new code to your code base. They are especially critical if you are using Test-Driven Development to write your code.
The faster your tests run, the more frequently a programmer can ask “Does this work?”, and get an answer. The more frequently you can do this, the more you can build and iterate. The faster your loop of think-write-verify becomes, the more you can get done, and the more productive your team and company will become. Waiting a week between each change you make and finding out whether it worked would be incredibly painful, and a terrible way to write code. In my view: anything longer than 5 minutes is bad for developer speed and should be addressed. Ideally you have a test runtime shorter than 30 seconds.
Fast tests make it easier for programmers to get into a “flow state”, where they hold all of the working knowledge and context they need in their head at once, and can rapidly make changes, make decisions, iterate, test, and build. This is the most productive mode for programmers, and we should strive to help them enter and stay in flow as much as possible. Waiting for slow tests breaks you out of flow, which can be costly. Keep your tests fast, and you will be more productive.
Speeding Up Tests - The Basics
The first thing you should usually do is measure. What is making your tests slow? However, even without measuring you can improve your test speed and development speed by following basic best practices like:
Remove external dependencies, such as network calls. Don’t make actual network connections or calls to external systems during your tests. They are slow, expensive, and prone to failure. Unit tests should only run locally against the system under test.
Avoid repeating setup steps. When setting up data or conditions for your tests, work to perform setup only once, and as close to the test as possible. For Python this usually means moving code out of setUp() test case methods and into setUpClass() class methods.
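As a rough illustration of the setUp() versus setUpClass() point, here is a minimal sketch using only the standard library. ExpensiveFixture and count_builds are hypothetical names invented for this example; the fixture stands in for whatever costly setup your tests perform.

```python
import io
import unittest


class ExpensiveFixture:
    """Hypothetical stand-in for costly setup, such as creating a test user."""

    build_count = 0

    def __init__(self):
        ExpensiveFixture.build_count += 1


class PerTestSetup(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so the fixture is rebuilt each time.
        self.fixture = ExpensiveFixture()

    def test_one(self):
        self.assertIsNotNone(self.fixture)

    def test_two(self):
        self.assertIsNotNone(self.fixture)


class PerClassSetup(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # Runs once for the whole class; every test method shares the fixture.
        cls.fixture = ExpensiveFixture()

    def test_one(self):
        self.assertIsNotNone(self.fixture)

    def test_two(self):
        self.assertIsNotNone(self.fixture)


def count_builds(test_case):
    """Run one TestCase class and report how many fixtures it built."""
    ExpensiveFixture.build_count = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return ExpensiveFixture.build_count


if __name__ == "__main__":
    print(count_builds(PerTestSetup))   # 2
    print(count_builds(PerClassSetup))  # 1
```

The trade-off is that a shared fixture must not be mutated by individual tests, or the tests stop being isolated. In Django's own TestCase, the setUpTestData() class method plays a similar role for database rows.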
Understanding Django Hashing
By default, Django uses the PBKDF2 password hashing algorithm. This is a smart, modern, reasonable default that makes storing your passwords secure. This is what you want.
Django hashes the password for every user when that user is created and saved. You can check out the code in django.contrib.auth.models and django.contrib.auth.hashers:
user.password = make_password(password)
...
hasher = get_hasher(hasher)
hasher.encode(password, salt)
...
user.save(using=self._db)
Under normal conditions this is great. But what about during your tests?
Say you have a simple test function that creates temporary user accounts like this:
def create_user(self, **kwargs):
    # create_user() hashes the password; a plain create() would store it as given
    return User.objects.create_user(**kwargs)
You might call this, say, inside the unit tests for each view, to create a user with appropriate permissions as part of the test setup. This ensures your tests are atomic and isolated, so that changes to a user, permissions, or other settings in one area do not affect tests from another view.
When you create and save that test user, Django hashes their password.
If you are creating *many* test users (e.g. creating a user for each test class across dozens, hundreds, or thousands of tests), that is a lot of passwords. And a lot of hashing.
PBKDF2 is *intentionally* expensive. By design. To be more secure. This helps to prevent an attacker from throwing lots of hardware at the problem to brute-force through passwords more quickly.
When real users are logging in to a real site, that's good.
But if you are running tests, and creating a LOT of test users, that's really expensive.
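To get a feel for the cost gap, here is a small standalone sketch that times one PBKDF2 hash against one MD5 hash using only hashlib. The iteration count is an assumption on my part: Django's default changes between releases, so check the value for your version. time_call is a hypothetical helper written for this example.

```python
import hashlib
import os
import time

# Assumed iteration count; Django's actual default varies by release.
ITERATIONS = 600_000


def time_call(fn, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


password = b"test-password"
salt = os.urandom(16)

# One PBKDF2-HMAC-SHA256 derivation, roughly what happens per hashed password.
pbkdf2_seconds = time_call(
    lambda: hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)
)
# One MD5 digest of the same input.
md5_seconds = time_call(lambda: hashlib.md5(salt + password).hexdigest())

print(f"PBKDF2: {pbkdf2_seconds:.4f}s per hash")
print(f"MD5:    {md5_seconds:.6f}s per hash")
```

Multiply the PBKDF2 figure by the number of test users your suite creates to estimate how much of your runtime is going to hashing.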
Changing The Hash Algorithm
Fortunately, Django makes it easy to change the hashing algorithm: set a different list of password hashers in your settings, e.g.:
if DJANGO_TESTS_RUNNING:
    # performance: do not waste time hashing passwords during tests.
    # use a much simpler hash
    PASSWORD_HASHERS = [
        "django.contrib.auth.hashers.UnsaltedMD5PasswordHasher",
    ]
Now Django will use the new algorithm instead. In this example the hasher is set to unsalted MD5, one of the least secure but also least expensive hash methods included with the default Django installation. On modern hardware, hashing via MD5 is much, much faster than the PBKDF2 default.
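The settings snippet above assumes a DJANGO_TESTS_RUNNING flag already exists. One possible way to define it, sketched here as an assumption rather than the only approach, is to check how manage.py was invoked. is_test_run is a hypothetical helper, and this heuristic will not detect other runners such as pytest. I have also used MD5PasswordHasher here, since that is the hasher shown in the Django docs example; I believe the unsalted variant has been deprecated in recent Django releases, so check which hashers your version ships.

```python
import sys


def is_test_run(argv):
    """Heuristic: True when invoked as `manage.py test ...`."""
    return len(argv) > 1 and argv[1] == "test"


# Hypothetical flag name, matching the settings snippet in this post.
DJANGO_TESTS_RUNNING = is_test_run(sys.argv)

if DJANGO_TESTS_RUNNING:
    # Test runs only: trade password security for speed.
    PASSWORD_HASHERS = [
        "django.contrib.auth.hashers.MD5PasswordHasher",
    ]
```

An alternative is a separate settings module for tests that overrides PASSWORD_HASHERS, which avoids inspecting sys.argv at all. Either way, make sure the fast hasher can never be enabled in production.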
The Results: A 70% Speed Increase
So how much of a difference will this make in the real world? It turns out: often quite a lot! Below is one example from a modern codebase with ~2500 Python unit tests and 2200 calls to create test users.
Here is one test run to profile the full unit test suite. In this case I used cProfile and snakeviz.
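If you have not used cProfile before, here is a minimal sketch of collecting a profile and sorting it by tottime. The workload is a hypothetical stand-in (slow_hash and run_workload are made up for this example), not my real test suite.

```python
import cProfile
import hashlib
import io
import pstats


def slow_hash():
    # Stand-in workload: a deliberately expensive key derivation.
    return hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 50_000)


def run_workload():
    for _ in range(5):
        slow_hash()


# Collect profiling data only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
run_workload()
profiler.disable()

# Sort by time spent inside each function itself and keep the top entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```

For a whole Django suite, something like `python -m cProfile -o tests.prof manage.py test` followed by `snakeviz tests.prof` should produce a browsable profile; the file name tests.prof is just an example.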
Before:
Here is a list of the most expensive functions. The column we are interested in is “tottime”: the total number of seconds spent doing work inside a function itself, excluding time spent in the functions it calls.
You can see the culprit right on the first line:
447s ... built-in method _hashlib.pbkdf2
This hash function is eating up 447 seconds of our 610 second runtime.
We are spending 73% of our time hashing passwords for test users. Users that are immediately thrown away right after the test!
Django suggests using a method named force_login() during tests, to avoid unimportant logic when logging in is not an important part of the test. If your tests use force_login(), you’re not validating against the password anyway, so hashing the password during user creation only to ignore it is really useless work!
Usually testing the password system is out of scope and not interesting. The test’s job is not to validate that Django’s password and login system works. That’s Django’s responsibility. The test’s job is to test your code, to validate your own functionality.
After:
So how does this compare with MD5 hashing? Here is the new performance output, after the hash has been changed:
Can you spot hashlib.pbkdf2 in the list?
No! You cannot. The hasher is completely removed from the top list of expensive functions. We have literally removed the largest chunk of time spent in our tests. Total runtime dropped by 73%! Huzzah!
Time It More Than Once
Usually when comparing profiling and timing results you want more than a single measurement. In our case, I ran 20 iterations of the full test suite with and without the hasher change, and the 70%+ speedup continued to be glaringly obvious.
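A repeated comparison like this can be sketched in a few lines. time_runs and speedup_percent are hypothetical helpers, and the two sleep-based workloads are placeholders for "run the suite with the slow hasher" and "run the suite with the fast hasher"; they are not my real measurements.

```python
import statistics
import time


def time_runs(fn, iterations=20):
    """Call fn repeatedly and return the duration of each run in seconds."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def speedup_percent(before, after):
    """Percentage reduction in median runtime going from before to after."""
    before_median = statistics.median(before)
    after_median = statistics.median(after)
    return 100 * (before_median - after_median) / before_median


if __name__ == "__main__":
    # Placeholder workloads standing in for two full test-suite runs.
    slow = time_runs(lambda: time.sleep(0.02), iterations=5)
    fast = time_runs(lambda: time.sleep(0.005), iterations=5)
    print(f"median speedup: {speedup_percent(slow, fast):.0f}%")
```

Comparing medians rather than single runs keeps one unusually slow or fast iteration from skewing the result.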
In our particular CI system this meant an improvement from taking ~30 minutes to run our full unit test suite to less than 8 minutes. That’s great.
But on a local dev machine this dropped the runtime from three minutes (not terrible) to less than 45 seconds. Now *that’s* good for flow state.
Conclusion: Fix Your Hashing During Tests
If you create test users and hash passwords during your unit tests - try it and see on your codebase! You may be able to significantly speed up your tests.