Is MD5 completely broken and should I never use it?

MD5 is broken for any security-sensitive purpose. Researchers can generate two different inputs that produce the same MD5 digest in seconds on commodity hardware. However, for non-security tasks — detecting accidental file corruption, generating cache keys, deduplicating data in a trusted pipeline — MD5 remains fast and perfectly adequate. The key question is: does an attacker ever get to craft the input?

What makes SHA-256 better than MD5?

SHA-256 produces a 256-bit digest (compared to MD5's 128 bits), has no known collision attacks, and is part of the NIST-approved SHA-2 family. Its wider output makes brute-force preimage and collision searches astronomically harder, and its compression function has so far withstood the differential techniques that broke MD5's. Both algorithms use the Merkle–Damgård construction; the weakness that enables MD5 collisions lies in MD5's compression function, not in the construction itself.

Can I use MD5 for password storage?

No — and this applies to SHA-256 too. Fast hashing algorithms (MD5, SHA-256, SHA-512) are designed to be quick, which makes them easy to brute-force at billions of guesses per second using a GPU. Password storage requires a slow, salted, adaptive algorithm specifically designed for the purpose: bcrypt, scrypt, Argon2id, or PBKDF2. Using plain MD5 or SHA-256 for passwords, even salted, is a serious security vulnerability.

What is a collision attack and why does it matter?

A collision attack finds two different inputs that hash to the same digest. For MD5, weaknesses in the compression function were shown in 1996, and Wang et al. published the first full collisions in 2004; later refinements cut the search to under a minute on ordinary hardware. Chosen-prefix collisions (where an attacker picks two different prefixes and computes suffixes that make the complete inputs collide) were achieved in 2007 and weaponized in the Flame malware in 2012 to forge a Microsoft code-signing certificate. If a hash is used to verify integrity or authenticity, a collision attack lets an attacker substitute malicious content that passes the check.

Are hashes the same as encryption?

No. Hashing is a one-way function — you cannot reverse a digest back to the original input (without brute force). Encryption is reversible: given a key, you can decrypt ciphertext back to plaintext. A hash digest is a fixed-size fingerprint of data; it says nothing about confidentiality. For secrets, use encryption or a key-derivation function. For integrity verification, use a hash — but choose one that is still collision-resistant.

MD5 vs SHA-256: Which Hash Should You Use (and Why MD5 Isn't Dead)

Open any codebase that handles files, tokens, or cache invalidation and you will find MD5 and SHA-256 side by side, often used interchangeably. They are not interchangeable. MD5 is cryptographically broken and should never touch anything that needs to be secure. SHA-256 remains the safe default for nearly everything. Yet MD5 is not dead: it is faster in software, produces a shorter digest, and for the large class of tasks where no attacker is involved, it is perfectly adequate.

This article explains what a cryptographic hash actually does, why MD5's collision resistance failed, what SHA-256 does differently, and how to decide which one belongs in your code. To compute and compare digests while reading, the Hash Generator on this site produces MD5, SHA-1, SHA-256, and SHA-512 digests in your browser with no server round-trip.

What a Cryptographic Hash Function Does

A hash function takes an arbitrary-length byte sequence as input and returns a fixed-length digest. Five properties define a cryptographic hash function. The first four:

Deterministic. The same input always produces the same digest, on any machine, at any time.
Fixed output size. MD5 always returns 128 bits (16 bytes, usually displayed as 32 hex characters). SHA-256 always returns 256 bits (32 bytes, 64 hex characters). Input length is irrelevant: a single character and a 4 GB ISO file each produce a digest of the same length.
One-way (preimage resistance). Given a digest, it must be computationally infeasible to find any input that produces it. You cannot "decrypt" a hash.
Avalanche effect. A single-bit change in the input flips roughly half the bits in the output. "hello" and "Hello" produce completely different digests with no predictable relationship between them.

Collision resistance is the fifth property: it must be computationally infeasible to find two different inputs that produce the same digest. This is the property MD5 has lost.
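The determinism and fixed-output-size properties are easy to check directly. The sketch below assumes a Node.js runtime and its built-in node:crypto module; the helper name hexDigest is made up for illustration.

```javascript
const { createHash } = require('node:crypto');

// Hex digest of `data` (string or Buffer) under the named algorithm.
function hexDigest(algorithm, data) {
  return createHash(algorithm).update(data).digest('hex');
}

const tiny = 'a';
const large = Buffer.alloc(1024 * 1024, 0x61); // 1 MiB filled with the byte for 'a'

// Fixed output size: input length does not change digest length.
console.log(hexDigest('md5', tiny).length, hexDigest('md5', large).length);       // 32 32
console.log(hexDigest('sha256', tiny).length, hexDigest('sha256', large).length); // 64 64

// Deterministic: hashing the same bytes twice gives the same digest.
console.log(hexDigest('sha256', tiny) === hexDigest('sha256', tiny)); // true
```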

MD5 Internals and Why It Broke

MD5 was designed by Ron Rivest in 1991, replacing his earlier MD4. It operates on 512-bit message blocks, uses the Merkle–Damgård construction (each block updates a running 128-bit internal state), and applies 64 steps of bitwise operations, grouped into four rounds of 16, across four 32-bit working variables. The final 128-bit state is the digest.

The Merkle–Damgård structure has a known theoretical weakness: once two different inputs of the same length drive the internal state to the same value, appending any identical suffix to both preserves the collision. A single collision in the compression function can therefore be extended into arbitrarily many colliding messages. Researchers exploited this against MD5 in stages:

1996: Hans Dobbertin demonstrated weaknesses in MD5's compression function, triggering serious academic concern.
2004: Wang Xiaoyun and colleagues announced the first full MD5 collisions: pairs of distinct 1024-bit inputs with the same digest, reportedly found in about an hour on an IBM p690 system. Refinements over the next two years cut the search to under a minute on an ordinary notebook. This was a watershed moment that shattered the assumption that MD5's collision resistance held in practice.
2007: Marc Stevens, Arjen Lenstra, and Benne de Weger demonstrated chosen-prefix collisions: given two arbitrary attacker-controlled prefixes, they could generate two suffixes such that the concatenated documents collide. This is far more dangerous than a simple collision because the attacker controls the meaningful part of the input.
2012: The Flame malware, a state-sponsored cyberespionage toolkit, used a chosen-prefix collision against MD5 to forge a Microsoft code-signing certificate. The rogue certificate was made to collide with one legitimately issued through Microsoft's Terminal Server Licensing Service, which still used MD5 signatures, letting Flame sign its components so that they appeared to be authentic Microsoft updates.

Today, MD5 collisions can be found in seconds on a laptop. The algorithm is structurally compromised for any purpose that requires collision resistance.
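The way a Merkle–Damgård collision survives a shared suffix can be sketched with a toy hash. Everything here is an illustrative assumption rather than MD5 itself: the 16-bit state, the 4-byte blocks, and a compression function built by truncating SHA-256 (via Node.js's node:crypto). The state is small enough that a collision can be brute-forced instantly.

```javascript
const { createHash } = require('node:crypto');

const BLOCK_BYTES = 4;

// Toy compression function: mixes a 2-byte state with one 4-byte block.
function compress(state, block) {
  return createHash('sha256').update(state).update(block).digest().subarray(0, 2);
}

// Toy Merkle–Damgård hash. Input length must be a multiple of BLOCK_BYTES;
// real designs also append length padding, which is omitted in this sketch.
function toyHash(message) {
  let state = Buffer.from([0x12, 0x34]); // fixed initial value
  for (let i = 0; i < message.length; i += BLOCK_BYTES) {
    state = compress(state, message.subarray(i, i + BLOCK_BYTES));
  }
  return state.toString('hex');
}

// Brute-force two different one-block messages with the same toy digest.
// With only 65,536 possible states, a repeat is guaranteed to appear.
function findCollision() {
  const seen = new Map();
  for (let n = 0; ; n++) {
    const block = Buffer.alloc(BLOCK_BYTES);
    block.writeUInt32BE(n);
    const digest = toyHash(block);
    if (seen.has(digest)) return [seen.get(digest), block];
    seen.set(digest, block);
  }
}

const [a, b] = findCollision();
const suffix = Buffer.from('tail'); // any identical suffix works
const extendedA = toyHash(Buffer.concat([a, suffix]));
const extendedB = toyHash(Buffer.concat([b, suffix]));

console.log(a.equals(b));                 // false: the messages differ
console.log(toyHash(a) === toyHash(b));   // true: they collide
console.log(extendedA === extendedB);     // true: the collision survives the suffix
```

Because both messages leave the hash in the same internal state, every later block is processed identically. Real attacks on MD5 find the initial collision by cryptanalysis rather than brute force, but the extension step works the same way.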

SHA-256: Still Secure

SHA-256 belongs to the SHA-2 family, first published in 2001 and standardized by NIST as FIPS 180-2. Like MD5 it uses the Merkle–Damgård construction and 512-bit blocks, but it differs in important ways: a 256-bit state held in eight 32-bit words, 64 rounds of mixing built from six logical functions, and a message schedule that expands each block across all rounds. The larger state and more aggressive mixing make the differential cryptanalysis techniques that broke MD5 computationally infeasible against SHA-256.

As of 2026, there are no known practical collision attacks against SHA-256. The best published results apply only to reduced-round variants and remain far outside practical reach against the full algorithm. NIST still recommends SHA-256 for general-purpose integrity and digital signatures, it is the dominant algorithm in TLS certificates, Bitcoin's proof-of-work, and JWT signatures, and Git supports it as an alternative object format to SHA-1.

Comparison Table

Property	MD5	SHA-256
Output size	128 bits (32 hex chars)	256 bits (64 hex chars)
Speed (software, x86)	~400–600 MB/s	~150–250 MB/s
Collision-resistant?	No — broken since 2004	Yes — no known attack
Preimage-resistant?	Yes in practice (only theoretical attacks)	Yes
Use for TLS/signatures	Never	Yes
Use for file integrity (trusted input)	Acceptable	Preferred
Use for cache keys / dedup	Fine	Fine, slightly more overhead
Use for passwords	Never	Never (use bcrypt/Argon2)
NIST-approved	No	Yes (FIPS 180-4)


The Avalanche Effect in Practice

Hashes are case- and byte-exact. A one-character difference in input produces a completely different digest with no predictable relationship to the original. This is not a quirk — it is a design requirement called the avalanche effect:

MD5("hello")  = 5d41402abc4b2a76b9719d911017c592
MD5("Hello")  = 8b1a9953c4611296a827abf8c47804d7
MD5("hello ") = f814893777bcc2295fff05f00e508da6  ← trailing space

Three nearly identical inputs — a one-bit change from "hello" to "Hello", then one extra space — each produce entirely different 128-bit digests. There is no way to tell from the digests that the inputs were similar. This is what makes hashing useful for integrity checking — any corruption, however small, changes the digest completely.
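To put a number on "entirely different", one can count how many bits differ between two digests. This is a minimal sketch assuming Node.js and its built-in node:crypto module; differingBits and md5 are helper names introduced here.

```javascript
const { createHash } = require('node:crypto');

// Count the bit positions at which two equal-length buffers differ.
function differingBits(a, b) {
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) {
      count += x & 1;
      x >>= 1;
    }
  }
  return count;
}

// Raw 16-byte MD5 digest of a string.
const md5 = (text) => createHash('md5').update(text).digest();

const inputDiff = differingBits(Buffer.from('hello'), Buffer.from('Hello'));
const digestDiff = differingBits(md5('hello'), md5('Hello'));

console.log(`input bits changed:  ${inputDiff}`);
console.log(`digest bits changed: ${digestDiff} of 128`);
```

The inputs differ in a single bit, while the digest difference should land close to half of the 128 output bits, which is what the avalanche property predicts.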

The flip side is that hashing is byte-exact, so input normalization matters before you hash. "Hello" and "hello" yield completely different digests — if you need to normalize case before hashing identifiers, a tool like Word Case Swap does it in one click. The same principle applies to naming conventions in code: if you are hashing identifiers or cache keys derived from variable names, the choice between camelCase and snake_case must be consistent — see camelCase vs snake_case for a detailed breakdown of where that consistency matters most.

Encoding the Digest: Hex vs Base64

A raw hash digest is a sequence of bytes — 16 bytes for MD5, 32 for SHA-256. To transmit or store it as text, you need an encoding. Two common choices:

Hexadecimal: each byte becomes two ASCII characters (00–ff). MD5 becomes 32 chars, SHA-256 becomes 64 chars. Case-insensitive in practice. 100% overhead over raw bytes.
Base64: every 3 bytes become 4 printable characters. MD5's 16 bytes encode to 24 characters; SHA-256's 32 bytes become 44 characters (including padding). The raw bytes of a digest can be passed straight through a Base64 Encoder for use in HTTP headers or JSON payloads where a shorter representation is preferable to a long hex string. Standard Base64 uses +, /, and =, so it is not URL-safe; for URLs and JWTs, use the base64url variant, which swaps in - and _ and drops the padding.

The encoding is purely cosmetic — it does not affect the security properties of the underlying hash. Both representations carry the same information. Hex is more readable for debugging; Base64 is more compact in structured formats.
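The size differences between encodings can be checked in a few lines. This sketch assumes Node.js, whose Buffer type supports the 'hex', 'base64', and 'base64url' encodings.

```javascript
const { createHash } = require('node:crypto');

// Raw SHA-256 digest: a 32-byte Buffer.
const raw = createHash('sha256').update('hello').digest();

const hex = raw.toString('hex');
const base64 = raw.toString('base64');
const base64url = raw.toString('base64url'); // URL-safe alphabet, no padding

console.log(raw.length, hex.length, base64.length, base64url.length); // 32 64 44 43

// Every encoding decodes back to the same bytes.
const sameBytes =
  Buffer.from(hex, 'hex').equals(raw) &&
  Buffer.from(base64, 'base64').equals(raw) &&
  Buffer.from(base64url, 'base64url').equals(raw);
console.log(sameBytes); // true
```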

Computing Digests in JavaScript

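The Web Crypto API's crypto.subtle.digest supports SHA-1, SHA-256, SHA-384, and SHA-512 natively in modern browsers (in secure contexts such as HTTPS pages) and in recent Node.js releases, where crypto is available as a global without flags from Node.js 19 onward and through require('node:crypto').webcrypto in Node.js 18. MD5 is not in the Web Crypto spec; use it only through a library, and only for non-security purposes: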

async function sha256hex(message) {
    const encoder = new TextEncoder();
    const data = encoder.encode(message);
    const hashBuffer = await crypto.subtle.digest('SHA-256', data);
    const hashArray = Array.from(new Uint8Array(hashBuffer));
    return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}

// "hello" → "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
// "Hello" → "185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969"
await sha256hex('hello');
await sha256hex('Hello');

The two digests share no visible pattern despite the inputs differing by only one bit: the avalanche effect at work. Note that crypto.subtle.digest is asynchronous: it returns a Promise that resolves to an ArrayBuffer, which you convert to a hex string via Uint8Array. There is no synchronous SHA-256 built into the browser platform.
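Outside the browser, Node.js's built-in node:crypto module offers a synchronous alternative and also includes MD5. The sketch below assumes that runtime; hashChunks is a helper name introduced here. It shows that feeding data in pieces, as you would when reading a large file, yields the same digest as hashing it whole.

```javascript
const { createHash } = require('node:crypto');

// Hash an iterable of chunks incrementally and return the hex digest.
function hashChunks(algorithm, chunks) {
  const hash = createHash(algorithm);
  for (const chunk of chunks) hash.update(chunk);
  return hash.digest('hex');
}

const whole = hashChunks('sha256', ['hello']);
const pieces = hashChunks('sha256', ['he', 'll', 'o']);
const legacy = hashChunks('md5', ['hello']); // non-security use only

console.log(whole === pieces); // true: chunk boundaries do not affect the digest
console.log(whole);            // 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
console.log(legacy);           // 5d41402abc4b2a76b9719d911017c592
```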

When to Use MD5

The critical question is: can an attacker control or craft the input? If yes, MD5 is off the table entirely. If no, MD5 is fast, widely supported, and produces a compact digest. Appropriate MD5 uses:

Checksums for accidental corruption. Verifying a file download over a trusted channel (where the checksum is published separately by the same party) detects random bit-flips in transit. A collision attack requires intentional crafting — cosmic rays do not generate chosen-prefix collisions.
Cache keys. Hashing a request URL, query parameters, or a rendered template to generate a cache key is a non-security operation. You control all the inputs and only care that accidental collisions are rare, which MD5's 128 bits provide adequately.
Content deduplication in internal pipelines. Detecting duplicate blobs in a storage system you control, with no attacker-provided content, is a non-cryptographic use.
Legacy compatibility. Some protocols (older LDAP password stores, certain RADIUS implementations, legacy API authentication schemes) require MD5 by specification. In those cases you have no choice — but consider the security context carefully.
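As an illustration of the cache-key case, the sketch below derives a key from a request path and its query parameters. It assumes Node.js and node:crypto; the function name cacheKey and the canonical form (sorted parameters serialized as JSON) are choices made for this example, not a standard.

```javascript
const { createHash } = require('node:crypto');

// Build a stable cache key from a path and a flat object of query parameters.
// Sorting the parameter names makes the key independent of insertion order.
function cacheKey(path, params) {
  const sorted = Object.keys(params)
    .sort()
    .map((name) => [name, String(params[name])]);
  const canonical = JSON.stringify([path, sorted]);
  return createHash('md5').update(canonical).digest('hex');
}

const first = cacheKey('/search', { q: 'md5', page: 2 });
const second = cacheKey('/search', { page: 2, q: 'md5' });
const other = cacheKey('/search', { q: 'md5', page: 3 });

console.log(first === second); // true: same request, same key
console.log(first === other);  // false: different page, different key
```

Canonicalizing before hashing matters because the hash is byte-exact: two requests that mean the same thing but serialize differently would otherwise miss the cache.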

When to Use SHA-256

Use SHA-256 whenever collision resistance matters:

Digital signatures and certificates. TLS certificates, code signing, and most JWT signature algorithms rely on SHA-256 or a larger SHA-2 variant. An MD5-based signature can be forged using a collision (as Flame demonstrated).
Software distribution checksums. If you publish a checksum alongside a binary for users to verify, use SHA-256. With MD5, an attacker who can influence the file being published can prepare a benign version and a malicious version with the same digest, then swap one for the other without the published checksum changing.
Content-addressable storage. Git has been adding SHA-256 as an object format (an ongoing transition away from SHA-1) precisely because content-addressed systems break if two different objects share a hash.
Any user-controlled or attacker-influenced input. File uploads, API payloads, form data — anything whose bytes the user chooses.
Default for new code. When you are writing new code with no reason to prefer MD5, use SHA-256. It is a little slower in pure software, but the difference rarely matters (and CPUs with SHA extensions can narrow or erase it), and the security margin is vastly larger.
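Checking a download against a published SHA-256 checksum might look like the following sketch. It assumes Node.js and node:crypto; matchesSha256 is a name introduced for this example, and the data is passed in as a string or Buffer rather than read from disk.

```javascript
const { createHash } = require('node:crypto');

// Return true when the SHA-256 digest of `data` equals the published hex checksum.
// The comparison ignores surrounding whitespace and letter case in the checksum.
function matchesSha256(data, expectedHex) {
  const actual = createHash('sha256').update(data).digest('hex');
  return actual === expectedHex.trim().toLowerCase();
}

const published = '2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824';

console.log(matchesSha256('hello', published));               // true
console.log(matchesSha256('hello', published.toUpperCase())); // true
console.log(matchesSha256('Hello', published));               // false
```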

What Hashing Is Not

Two common misconceptions cause real vulnerabilities:

Hashing is not encryption. Encryption is reversible — given the key, you can recover the plaintext. A hash digest is not reversible by design. You cannot "decrypt" 5d41402abc4b2a76b9719d911017c592 back to "hello" using any key. The one-wayness is the point. Do not use hashing where you need to recover the original value; use encryption.
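The difference shows up clearly in code. The sketch below assumes Node.js and node:crypto, and uses AES-256-GCM purely as one example of an encryption scheme. The ciphertext can be turned back into "hello" with the key; the digest has no corresponding operation.

```javascript
const { createCipheriv, createDecipheriv, createHash, randomBytes } = require('node:crypto');

const key = randomBytes(32); // AES-256 key
const iv = randomBytes(12);  // 96-bit nonce; must be unique per message

// Encrypt: reversible for anyone holding the key.
const cipher = createCipheriv('aes-256-gcm', key, iv);
const ciphertext = Buffer.concat([cipher.update('hello', 'utf8'), cipher.final()]);
const tag = cipher.getAuthTag();

// Decrypt: the same key and nonce recover the plaintext.
const decipher = createDecipheriv('aes-256-gcm', key, iv);
decipher.setAuthTag(tag);
const recovered = Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');

// Hash: a fixed-size fingerprint with no key and no inverse operation.
const digest = createHash('sha256').update('hello').digest('hex');

console.log(recovered); // hello
console.log(digest);    // 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```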

Hashing is not password storage, not even with SHA-256. Fast hash functions can compute billions of digests per second on a GPU. An attacker with the digest file and a wordlist runs every candidate through SHA-256 and compares — this is a brute-force attack, and it works quickly against anything remotely guessable. Password hashing requires an intentionally slow, salted, adaptive algorithm: Argon2id is the current recommendation, with bcrypt and scrypt as proven alternatives. The salt prevents precomputation attacks (rainbow tables). The cost factor means each guess takes milliseconds instead of nanoseconds. Neither MD5 nor SHA-256 provides either of those properties.
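A salted, deliberately slow password hash can be sketched with scrypt, which ships in Node.js's node:crypto module. This is an illustration under assumptions, not a recommended production setup: the function names, the salt:hash storage format, and the use of scrypt's default cost parameters are choices made here, and Argon2id would need a separate library.

```javascript
const { randomBytes, scryptSync, timingSafeEqual } = require('node:crypto');

// Derive a 32-byte key from the password with a fresh random salt.
// The result is stored as "saltHex:hashHex" (a format invented for this sketch).
function hashPassword(password) {
  const salt = randomBytes(16);
  const derived = scryptSync(password, salt, 32);
  return `${salt.toString('hex')}:${derived.toString('hex')}`;
}

// Recompute the derivation with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const derived = scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  return timingSafeEqual(derived, expected);
}

const stored = hashPassword('correct horse battery staple');

console.log(verifyPassword('correct horse battery staple', stored)); // true
console.log(verifyPassword('wrong guess', stored));                  // false
```

Because each call draws a new salt, hashing the same password twice produces different stored values, which is what defeats precomputed tables.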

Putting It Together

The decision tree is simple: if an attacker could ever craft or influence the input, use SHA-256 (or SHA-3 if you need defense-in-depth beyond SHA-2). If the input is entirely within your control and the only threat is accidental corruption or you need a short, fast fingerprint, MD5 is fine. If you are hashing passwords — stop, and reach for Argon2id.
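The same decision tree can be written down as a small function. The function name and option names are hypothetical, and the returned strings simply restate the recommendations above.

```javascript
// Suggest an algorithm based on the two questions in the decision tree.
function recommendHash({ isPassword = false, attackerCanInfluenceInput = false } = {}) {
  if (isPassword) return 'Argon2id';               // slow, salted password hashing
  if (attackerCanInfluenceInput) return 'SHA-256'; // collision resistance required
  return 'MD5';                                    // acceptable here; SHA-256 also works
}

console.log(recommendHash({ isPassword: true }));                // Argon2id
console.log(recommendHash({ attackerCanInfluenceInput: true })); // SHA-256
console.log(recommendHash());                                    // MD5
```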

To generate both digests and see the avalanche effect for yourself, paste any text into the Hash Generator. Type a single character, then change its case, and watch how completely the digest changes — that instability is exactly the property that makes cryptographic hashes useful. Try the same experiment with two inputs that differ only by a trailing space; the digests will share nothing visible, which is why checksums catch even small forms of corruption reliably.