Security with Python
Introduction
I recently implemented a Python library which acts as an abstraction layer on top of an existing security algorithm (in this case scrypt).
The motivation was for allowing teams to have a consistent experience utilising encryption (and hashing) in their applications and services without necessarily having to know the ins-and-outs of what’s important with regards to salts, key lengths etc.
Note: I always encourage people to understand what it is they’re doing, but in some cases that’s not always a practical mindset.
The library provides three functions:
generate_digest
decrypt_digest
validate_digest
{{< adverts/pythonforprogrammers >}}
Before we start looking at the three functions provided by this library/interface, let’s very briefly talk about KDF and PBKDF2.
A KDF (Key Derivation Function) accepts a message + a key, and produces a digest for its output. They are designed to be more computationally intensive than standard hashing functions, and so they make it harder to use dictionary or rainbow table style attacks (as they would require a lot of extra memory resources and become more unfeasible as an attack vector).
By default the KDF will generate a random salt (thus output is non-deterministic) and have a maximum computational time of 0.5
(although this can be overridden using a maxtime
argument, as we’ll see later).
A PBKDF2 on the other hand is able to provide deterministic output (as well as the ability to specify an explicit salt value). The internal implementation will repeat its process multiple times, thus reducing the feasibility of automated password cracking attempts (similar to a KDF).
I mention both of these (KDF and PBKDF2) because the generate_digest
function I’ve written is a multi-arity function that will switch implementation based upon the provided arguments in the method signature.
Originally I had two separate functions to distinguish them a bit more clearly but realised if this library is to make life easier for developers who don’t understand encryption or hashing concepts, then I need to provide a single function that intelligently handles things internally.
Because KDF accepts a key and is able to return the original message (given the same key) it’s acting as a form of symmetrical encryption, whereas a PBKDF2 is more like a one-way hash function. Hence I named the function in this library generate_digest
rather than something like encrypt_message
which wouldn’t have made sense when dealing with PBKDF2.
generate_digest
This is a multi-arity function that will generate a digest using either a password-based key derivation function (KDF) or a PBKDF2 depending on the input given.
If a password
argument is provided, then KDF will be used (along with a random salt) to generate a non-deterministic digest.
If a salt
is provided, then a PBKDF2 will be used to generate a deterministic digest.
Note: salts should be a minimum of 128bits (~16 characters) in length. Also, when specifying a maxtime with
generate_digest
, ensure you include that same value when decrypting withdecrypt_digest
or validating viavalidate_digest
.
decrypt_digest and validate_digest
The decrypt_digest
and validate_digest
functions only apply to digests that have been generated using a password (i.e. KDF). Given the right password decrypt_digest
will return the original message, and thus is considered more a form of symmetrical encryption than a straight one-way hash function. The validate_digest
function will return a boolean true or false if the given password was able to decrypt the message.
Dependencies
This abstraction library requires scrypt
, which itself requires the following dependencies to be installed within the context of your service: build-essential
, libssl-dev
, and python-dev
. If your service has a Dockerfile, adding these dependencies should be as simple as adding a line like the following:
RUN apt-get update && apt-get install -y build-essential libssl-dev python-dev
Usage
I suggest looking at the test suite (see below) to get an idea of how you would use the functions in this library.
Note: for a glossary of security terms, refer to this document.
Tests
Before we look at the implementation of the library, let’s take a moment to sift through its test suite.
Note: I named the library
secure
and have it running on a private PyPy instance. This code is made available via GitHub.
import pytest
from secure.interface import ArgumentError, generate_digest, validate_digest, decrypt_digest
message = "my-message"
password = "my-password"
salt = "my-salt-is-long-enough"
def test_generate_digest_with_both_a_password_and_a_salt():
"""Providing both a password and a salt should raise an exception."""
with pytest.raises(ArgumentError):
generate_digest(message, salt=salt, password=password)
def test_generate_digest_with_a_password():
"""Generating a digest with a password should be non-deterministic."""
digest1 = generate_digest(message, password=password)
digest2 = generate_digest(message, password=password)
digest3 = generate_digest(message, password=password, maxtime=1.5)
digest4 = generate_digest(message, password=password, maxtime=1.5)
digest5 = generate_digest(message, password=password, maxtime=int(1))
digest6 = generate_digest(message, password=password, maxtime=int(1))
assert digest1 != digest2
assert digest3 != digest4
assert digest5 != digest6
def test_generate_digest_without_a_password():
"""Generating a digest without a password should be deterministic."""
digest1 = generate_digest(message)
digest2 = generate_digest(message)
digest3 = generate_digest(message, salt=salt)
digest4 = generate_digest(message, salt=salt)
digest5 = generate_digest(message, length=128)
digest6 = generate_digest(message, length=128)
assert digest1 == digest2
assert digest3 == digest4
assert len(digest5) == len(digest6)
def test_generate_digest_with_different_salt_lengths():
"""Salts should be at least 128bits (~16 characters) in length."""
generate_digest(message, salt=salt)
with pytest.raises(ArgumentError):
generate_digest(message, salt="too-short")
def test_validate_digest():
"""Validation only applies to digests generated with a password."""
digest1 = generate_digest(message, password=password)
digest2 = generate_digest(message, password=password)
digest3 = generate_digest(message, password=password, maxtime=1.5)
digest4 = generate_digest(message, password=password, maxtime=1.5)
digest5 = generate_digest(message, password=password, maxtime=int(1))
digest6 = generate_digest(message, password=password, maxtime=int(1))
assert not validate_digest(digest1, 'incorrect-password')
assert validate_digest(digest1, password)
assert validate_digest(digest3, password, maxtime=1.5)
assert validate_digest(digest5, password, maxtime=int(1))
def test_decrypt_digest():
"""Decryption is possible given the right password."""
digest = generate_digest(message, password=password)
assert decrypt_digest(digest, password) == message
Implementation
OK, time to see the library code itself.
Note: I like to use MyPy for type hinting.
import scrypt
from typing import Union
class ArgumentError(Exception):
pass
def generate_digest(message: str,
password: str = None,
maxtime: Union[float, int] = 0.5,
salt: str = "",
length: int = 64) -> bytes:
"""Multi-arity function for generating a digest.
Use KDF symmetric encryption given a password.
Use deterministic hash function given a salt (or lack of password).
"""
if password and salt:
raise ArgumentError("only provide a password or a salt, not both")
if salt != "" and len(salt) < 16:
raise ArgumentError("salts need to be minimum of 128bits (~16 characters)")
if password:
return scrypt.encrypt(message, password, maxtime=maxtime)
else:
return scrypt.hash(message, salt, buflen=length)
def decrypt_digest(digest: bytes,
password: str,
maxtime: Union[float, int] = 0.5) -> bytes:
"""Decrypts digest using given password."""
return scrypt.decrypt(digest, password, maxtime)
def validate_digest(digest: bytes,
password: str,
maxtime: Union[float, int] = 0.5) -> bool:
"""Validate digest using given password."""
try:
scrypt.decrypt(digest, password, maxtime)
return True
except scrypt.error:
return False
Conclusion
Let me know what you think on twitter. Have fun.