In 2012, the social network LinkedIn lost 6.5 million password hashes in a breach. The hashes were unsalted SHA-1. Within days, the security community had cracked the majority of them. In 2016, it emerged that the breach had actually covered 117 million accounts, all unsalted. That same year, Dropbox disclosed a 2012 breach of 68 million bcrypt and SHA-1 hashes; the bcrypt ones were effectively uncracked years later. The difference between those two outcomes is the subject of this article.
Password storage is one of the most consequential decisions in backend engineering, and it is also one of the most commonly bungled. The mistakes are predictable: storing plaintext, using a fast general-purpose hash with no salt, or confusing encoding with security. This post walks through each layer of the correct approach — salt, pepper, and a deliberately slow key derivation function — and explains precisely why each layer exists.
Never Store Plaintext Passwords
This sounds obvious, but it still happens. The consequences of a plaintext database leak are permanent: every user's password is immediately readable, and because a large fraction of users reuse passwords, attackers gain access to their email, banking, and other accounts as well. There is no remediation after the fact — you cannot un-leak a password.
The correct baseline is one-way transformation: store something derived from the password, not the password itself. A cryptographic hash function produces a fixed-size digest from arbitrary input; running the function on the correct password at login time reproduces the stored digest, confirming the match without ever storing the original. The question is which hash function to use and how to use it.
Why Unsalted Hashes Still Fail: Rainbow Tables and Precomputation
A naive approach: hash each password with MD5 or SHA-256 and store the result. This breaks in two distinct ways.
The first attack is precomputation. Attackers hash billions of common passwords and dictionary words ahead of time and store the results in a lookup table keyed by hash value. A rainbow table is a space-efficient variant of the same idea: it trades a small amount of computation at lookup time for a far smaller table. When a database leaks, cracking most hashes in it becomes little more than a table lookup. Tables covering all 8-character alphanumeric strings have been publicly available for over a decade.
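The precomputation idea can be sketched in a few lines. This is a minimal illustration using a plain dictionary rather than a true rainbow table, and the wordlist and user names are made up for the example.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Unsalted fast hash, as in the naive approach."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Attacker side, done once, long before any breach: hash a wordlist.
wordlist = ["123456", "password", "qwerty", "letmein", "Password123!"]
lookup = {sha256_hex(word): word for word in wordlist}

# Victim side: a leaked column of unsalted digests (hypothetical users).
leaked = {
    "alice": sha256_hex("letmein"),
    "bob": sha256_hex("Password123!"),
    "carol": sha256_hex("a long passphrase nobody precomputed"),
}

# Cracking is now one dictionary lookup per row.
cracked = {user: lookup.get(digest) for user, digest in leaked.items()}
print(cracked)
```

Only passwords that appear in the precomputed list fall, which is why real tables are built from billions of entries.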
The second attack exploits identical hashes for identical passwords. If your database has ten thousand users with the password Password123!, they all produce the same MD5 digest (9a0e7d5..., identical across all ten thousand rows). An attacker who cracks one has cracked them all simultaneously. They can also sort by hash and immediately identify the most common passwords in your user base, prioritising the highest-value accounts.
You can observe this directly with our Hash Generator — paste the same string twice and you get the same digest every time. That determinism is by design for checksums and file integrity, but it is a liability for passwords.
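The same determinism is easy to reproduce locally. The sketch below uses a hypothetical four-row table and shows how an attacker could group rows by digest without cracking anything.

```python
import hashlib
from collections import Counter


def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical leaked rows: (user id, unsalted digest).
rows = [
    ("u1", sha256_hex("Password123!")),
    ("u2", sha256_hex("hunter2")),
    ("u3", sha256_hex("Password123!")),
    ("u4", sha256_hex("Password123!")),
]

# Count how often each digest occurs; the password is never needed.
counts = Counter(digest for _, digest in rows)
most_common_digest, n_users = counts.most_common(1)[0]
print(f"{n_users} users share digest {most_common_digest[:12]}...")
```

Cracking that one digest unlocks every account in the group.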
What a Salt Is and What It Defeats
A salt is a unique, randomly generated value created for each user at registration time. It is concatenated with the password before hashing, and stored alongside the resulting hash in the database. The salt itself is not secret — it just needs to be unique.
    -- Conceptual schema
    CREATE TABLE users (
        id    BIGINT PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        salt  TEXT NOT NULL,  -- e.g. 32 random bytes, hex-encoded
        hash  TEXT NOT NULL   -- KDF(password || salt)
    );
With per-user salts in place, two users with the password Password123! produce completely different stored hashes, because their salts differ. Rainbow tables become useless: a table precomputed without the salt cannot match any hash in your database. An attacker who cracks one hash learns nothing about any other; each hash must be attacked individually, with its own salt factored in.
The salt should come from a cryptographically secure random number generator (CSPRNG), not from predictable sources like timestamps or user IDs. In most languages the right call is crypto.randomBytes(32) in Node.js, secrets.token_bytes(32) in Python, or SecureRandom in Java. A UUID v4 (122 random bits, drawn from a CSPRNG by most standard UUID libraries) also works as a salt source. Our UUID Generator demonstrates the format, though in production code you should generate UUIDs programmatically via your language's UUID library rather than copy-pasting from a web tool.
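A minimal sketch of the effect of a per-user salt, using only the Python standard library. Here scrypt stands in for "the KDF", and the n, r, p values are illustrative rather than tuned recommendations.

```python
import hashlib
import secrets


def hash_with_salt(password: str, salt: bytes) -> bytes:
    # Illustrative scrypt parameters; tune for real deployments.
    return hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1)


# Two users pick the same password but get independent CSPRNG salts.
salt_a = secrets.token_bytes(32)
salt_b = secrets.token_bytes(32)

hash_a = hash_with_salt("Password123!", salt_a)
hash_b = hash_with_salt("Password123!", salt_b)

print(hash_a != hash_b)                                   # different salts, different hashes
print(hash_with_salt("Password123!", salt_a) == hash_a)   # same salt reproduces the hash
```

The second check is what login relies on: the stored salt lets the server recompute exactly the same digest from the submitted password.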
Salts and the resulting hash digests are typically stored as hex strings or Base64-encoded bytes. Both encodings are lossless and printable, which matters for TEXT columns and JSON serialisation. Hex is slightly more verbose (2 chars per byte vs Base64's ~1.33 chars per byte) but is visually unambiguous and universally supported. Whichever you pick, be consistent: mixing hex and Base64 in the same column is a silent corruption hazard.
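The size difference between the two encodings is easy to check. A short sketch for a 32-byte salt:

```python
import base64
import secrets

salt = secrets.token_bytes(32)

as_hex = salt.hex()
as_b64 = base64.b64encode(salt).decode("ascii")

print(len(as_hex), len(as_b64))  # 64 44

# Both encodings round-trip to the original bytes.
assert bytes.fromhex(as_hex) == salt
assert base64.b64decode(as_b64) == salt
```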
A practical schema note: column widths for hash and salt strings are not arbitrary. A bcrypt hash is always exactly 60 characters. A SHA-256 hex digest is always 64 characters. An Argon2id output string (which embeds the algorithm parameters and salt) is typically around 95 characters. Getting those column sizes right matters for schema design — this is exactly the kind of situation where understanding why character counting matters prevents silent truncation bugs in VARCHAR columns.
What a Pepper Adds
A pepper is a secret value that is the same across all users, but stored outside the database — in an environment variable, application configuration, or a dedicated secret manager such as HashiCorp Vault or AWS Secrets Manager. It is typically prepended or appended to the password before the salt is mixed in.
The threat model a pepper addresses is a database-only breach: an attacker who exfiltrates your users table but does not have access to your application server cannot reproduce the exact input to the hash function. Provided the pepper is long and random (for example, 32 bytes from a CSPRNG), guessing it alongside each password is computationally infeasible, so offline cracking stalls. Conversely, a pepper provides no protection against a full server compromise where the attacker also reads your environment variables, so it is a hardening layer, not a substitute for a strong KDF.
If you rotate a pepper (e.g., after a suspected partial compromise), you need to re-hash every password at next login, which requires users to authenticate first. Plan for that migration cost before introducing a pepper into an existing system.
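One way to make rotation tractable is to store a pepper version next to each hash and upgrade records as users log in. The sketch below is an assumption about how that could look, not a prescribed design: the PEPPERS registry, the record layout, and the function names are hypothetical, and scrypt from the standard library stands in for the KDF.

```python
import hashlib
import hmac
import secrets

# Hypothetical pepper registry; in practice these come from a secret manager.
PEPPERS = {1: b"old-pepper-value", 2: b"new-pepper-value"}
CURRENT_PEPPER_VERSION = 2


def derive(password: str, salt: bytes, pepper: bytes) -> bytes:
    # Pepper appended to the password, then salted and stretched.
    return hashlib.scrypt(password.encode("utf-8") + pepper, salt=salt, n=2**14, r=8, p=1)


def make_record(password: str, version: int = CURRENT_PEPPER_VERSION) -> dict:
    salt = secrets.token_bytes(32)
    return {
        "salt": salt,
        "pepper_version": version,
        "hash": derive(password, salt, PEPPERS[version]),
    }


def verify_and_upgrade(password: str, record: dict) -> tuple[bool, dict]:
    """Check the password; on success, re-hash under the current pepper if needed."""
    pepper = PEPPERS[record["pepper_version"]]
    candidate = derive(password, record["salt"], pepper)
    if not hmac.compare_digest(candidate, record["hash"]):
        return False, record
    if record["pepper_version"] != CURRENT_PEPPER_VERSION:
        record = make_record(password)  # fresh salt, current pepper
    return True, record
```

The old pepper has to stay available until the last record using it has been upgraded or forcibly reset, which is the migration cost described above.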
Why Fast Hashes Are the Wrong Tool for Passwords
MD5, SHA-1, SHA-256, and SHA-3 are all general-purpose cryptographic hash functions. They were designed to be fast — SHA-256 can process several gigabytes per second on a modern CPU, and GPU implementations are orders of magnitude faster. A consumer RTX 4090 can compute roughly 160 billion MD5 hashes per second. That speed is an asset for checksums and digital signatures, where you are hashing large files or many messages. For passwords, it is catastrophic: it means an attacker who steals your hashed passwords can try billions of guesses per second against each one.
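A back-of-the-envelope calculation makes the gap concrete. It assumes the 160 billion guesses per second quoted above for MD5, and a KDF that costs the attacker about 300 ms per guess on a single worker. Real attackers parallelise, so treat the second figure as an order-of-magnitude illustration only.

```python
KEYSPACE = 62 ** 8     # every 8-character alphanumeric password
FAST_RATE = 160e9      # guesses per second against unsalted MD5
SLOW_COST_S = 0.3      # assumed seconds per guess against a tuned KDF

fast_seconds = KEYSPACE / FAST_RATE
slow_seconds = KEYSPACE * SLOW_COST_S

print(f"fast hash: {fast_seconds / 60:.0f} minutes to exhaust the keyspace")
print(f"slow KDF:  {slow_seconds / (3600 * 24 * 365):,.0f} years on one worker")
```

Under these assumptions the fast hash falls in roughly 23 minutes, while the slow KDF would need on the order of two million worker-years for the same search.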
What you need instead is a key derivation function (KDF) that is deliberately slow and/or memory-intensive, so that each guess costs the attacker significant time and resources. The three mainstream choices are:
- bcrypt (1999): still widely used, battle-tested, with a configurable cost factor (work rounds double with each increment). bcrypt is CPU-bound and has a 72-byte input limit; many implementations silently ignore anything past it, which matters once a pepper is appended to a long password. The output is always 60 characters, self-describing its cost factor and salt.
- scrypt (2009): memory-hard as well as CPU-bound, parameterised by N (CPU/memory cost), r (block size), and p (parallelisation). Harder to tune correctly than bcrypt; a common misconfiguration is leaving the memory parameter too low.
- Argon2id (2015): Argon2 won the Password Hashing Competition in 2015 and is standardised in RFC 9106; the OWASP Password Storage Cheat Sheet lists Argon2id as its first choice. Parameterised by memory cost (m), time cost (t), and parallelism (p). The id variant balances GPU-resistance (from Argon2d) with side-channel resistance (from Argon2i). For new systems, Argon2id is the right default.
A rough production starting point for Argon2id is m=65536 (64 MB), t=3 iterations, p=4 parallelism. Adjust upward until a login hash takes
roughly 300–500 ms on your production hardware — slow enough to punish attackers, fast enough
that users do not notice.
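The "adjust upward until it takes long enough" step can be automated with a small benchmark loop. The sketch below uses scrypt from the Python standard library as a stand-in so that it runs without third-party packages; the same loop shape would apply to Argon2id's memory and time costs. The function names, the doubling strategy, and the limits are assumptions made for illustration.

```python
import hashlib
import time


def time_scrypt_ms(n: int, r: int = 8, p: int = 1) -> float:
    """Time one scrypt call at the given cost, in milliseconds."""
    salt = b"\x00" * 16  # a fixed salt is acceptable for benchmarking only
    start = time.perf_counter()
    hashlib.scrypt(
        b"benchmark-password", salt=salt, n=n, r=r, p=p,
        maxmem=256 * 1024 * 1024,  # raise OpenSSL's default memory ceiling
    )
    return (time.perf_counter() - start) * 1000


def tune_n(target_ms: float, start_n: int = 2**12, max_n: int = 2**17) -> int:
    """Double n until a single hash takes at least target_ms (or max_n is hit)."""
    n = start_n
    while n < max_n and time_scrypt_ms(n) < target_ms:
        n *= 2
    return n


if __name__ == "__main__":
    chosen = tune_n(target_ms=50.0)
    print(f"n={chosen} takes about {time_scrypt_ms(chosen):.0f} ms on this machine")
```

Run the measurement on the production hardware itself, since a developer laptop and a shared cloud instance can differ by a large factor.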
Putting It Together: A Pseudocode Walkthrough
Registration:
    import os

    from argon2 import PasswordHasher  # provided by the argon2-cffi package
    from argon2.exceptions import VerificationError

    PEPPER = os.environ["PASSWORD_PEPPER"]  # loaded from secret manager

    # PasswordHasher draws a fresh random salt on every hash() call and embeds
    # it, with the parameters, in the returned string, so no salt is passed in.
    ph = PasswordHasher(memory_cost=65536, time_cost=3, parallelism=4)  # 64 MiB

    def register_user(email, plaintext_password):
        peppered = plaintext_password + PEPPER
        hash_str = ph.hash(peppered)
        db.insert("users", email=email, hash=hash_str)  # db: placeholder data layer
Login (timing-safe comparison is handled by the KDF library's verify call):
    # A real, well-formed hash computed once at startup for the not-found branch.
    DUMMY_HASH = ph.hash("dummy_password")

    def login_user(email, plaintext_password):
        row = db.get("users", email=email)
        if row is None:
            # Still run a full verify against the dummy hash to prevent
            # timing-based user enumeration
            try:
                ph.verify(DUMMY_HASH, plaintext_password + PEPPER)
            except VerificationError:
                pass
            return False
        peppered = plaintext_password + PEPPER
        try:
            return ph.verify(row["hash"], peppered)  # True on match
        except VerificationError:  # raised on mismatch
            return False
Note the dummy verify call in the not-found branch. Without it, login for a non-existent user returns almost instantly (just a DB miss), while login for an existing user takes ~300 ms (the KDF). An attacker can use that timing difference to enumerate valid email addresses in your database, a subtler problem than it sounds in high-value systems. The dummy must be a real, well-formed hash: a malformed placeholder string is rejected before any KDF work is done, which brings the timing difference straight back.
Timing-Safe Comparison
Beyond the KDF itself, the final comparison of a computed hash against a stored one should
use a constant-time comparison function. A naive string equality check in most languages
short-circuits on the first mismatched byte, leaking information about how many leading
bytes matched. Vetted KDF libraries expose a verify function that handles this
correctly. If for any reason you are comparing raw digests yourself (you should not be), use hmac.compare_digest in Python, crypto.timingSafeEqual in Node.js, or the equivalent in your language.
What Our Hash Generator Is For — and What It Is Not For
Our Hash Generator lets you compute MD5, SHA-1, SHA-256, and SHA-512 digests of arbitrary text in your browser. It is useful for verifying file checksums, learning how hash functions behave, confirming that a salt changes the digest entirely (try the same word with and without a prefix — the output is completely different), and debugging data-integrity pipelines.
It is emphatically not a tool for hashing production passwords. It does not apply a salt, it does not use a KDF, and it produces raw fast-hash digests that are trivially crackable. For real user accounts, use a vetted KDF library:
- Node.js: argon2 (npm) or bcrypt / bcryptjs
- Python: argon2-cffi or passlib
- PHP: password_hash($pw, PASSWORD_ARGON2ID) and password_verify(), built in (PASSWORD_ARGON2ID is available since PHP 7.3)
- Go: golang.org/x/crypto/argon2 or the bcrypt package in the same module
- Java / Spring: Argon2PasswordEncoder or BCryptPasswordEncoder from Spring Security
- Ruby on Rails: bcrypt-ruby (the default for Devise)
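Most of these libraries generate a unique salt internally, so you do not need to manage salt generation yourself: the KDF call takes a plaintext string and returns a self-describing output string that contains the algorithm, parameters, salt, and digest all in one. The exception in the list is Go's golang.org/x/crypto/argon2, a lower-level package whose IDKey function expects you to supply the salt and encode the output yourself.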
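To see what "self-describing" means in practice, the sketch below splits an Argon2-style encoded string into its parts using ordinary string handling. The parser and its field names are illustrative, and the example string is assembled from placeholder bytes rather than being a real hash. In production code, rely on your library's own verify function instead of parsing by hand.

```python
import base64


def parse_argon2_encoded(encoded: str) -> dict:
    """Split a string shaped like $argon2id$v=19$m=65536,t=3,p=4$<salt>$<digest>."""
    _, variant, version, params, salt_b64, digest_b64 = encoded.split("$")
    costs = dict(item.split("=") for item in params.split(","))

    def b64decode_unpadded(text: str) -> bytes:
        # The encoded form omits Base64 padding; restore it before decoding.
        return base64.b64decode(text + "=" * (-len(text) % 4))

    return {
        "variant": variant,
        "version": int(version.removeprefix("v=")),
        "memory_kib": int(costs["m"]),
        "time_cost": int(costs["t"]),
        "parallelism": int(costs["p"]),
        "salt": b64decode_unpadded(salt_b64),
        "digest": b64decode_unpadded(digest_b64),
    }


# Syntactically valid example built from placeholder bytes (not a real hash).
salt_part = base64.b64encode(b"0123456789abcdef").decode("ascii").rstrip("=")
digest_part = base64.b64encode(bytes(32)).decode("ascii").rstrip("=")
example = f"$argon2id$v=19$m=65536,t=3,p=4${salt_part}${digest_part}"

info = parse_argon2_encoded(example)
print(info["variant"], info["memory_kib"], info["time_cost"], info["parallelism"])
```

Because the parameters travel with every hash, the application can tell at login time whether a stored hash was produced with older, weaker settings and should be upgraded.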
Do / Don't: The Quick Reference
- Do use Argon2id, bcrypt, or scrypt for password hashing.
- Do generate a unique CSPRNG salt per user (32+ bytes).
- Do store the salt alongside the hash — it is not secret.
- Do use a pepper for additional protection, stored outside the database.
- Do tune the KDF work factor so login takes ~300 ms on your hardware.
- Do use constant-time comparison (your KDF library's verify function).
- Do re-hash (upgrade) stored hashes on next login when you raise the work factor.
- Don't use MD5, SHA-1, SHA-256, or any fast hash for passwords.
- Don't store passwords or reversible encodings of passwords.
- Don't invent your own salting or hashing scheme — use a library.
- Don't share salts across users or generate them from predictable inputs.
- Don't use an online hash tool for production password storage.
Summary
The complete picture is three layers. A salt (unique, random, per-user, stored in the DB) eliminates rainbow tables and makes each hash independent. A pepper (secret, app-wide, stored outside the DB) makes offline cracking infeasible for anyone who holds only the database and not the application secret. A slow KDF (bcrypt, scrypt, Argon2id) makes each individual guess expensive, turning billions-per-second GPU attacks into thousands-per-second at best.
None of these layers requires you to write cryptographic primitives yourself. Every mature language ecosystem has a well-maintained Argon2id or bcrypt library that handles salt generation, parameterisation, and comparison in a single function call. The cost of using the right tool is a few minutes of setup. The cost of using the wrong one is a breach headline and your users' accounts on a credential-stuffing list.
