Committing to Secrets via Hashing

TL;DR: Cryptography is often thought of in terms of encryption (hiding messages) or signatures (proving authenticity). But there’s another fundamental building block that underpins many protocols, from digital lotteries to zero-knowledge proofs: the commitment. A commitment is a way to promise something now without revealing it, yet in such a way that you cannot change your mind later. It can be efficiently implemented via hashing.

It is the digital equivalent of sealing a message in an envelope and handing it to someone: they cannot see the contents until you open it, but once it is sealed, you cannot replace what is inside.

The Problem: Promises Without Revealing

There are many situations where the “sealing away” of a message might be useful. For example, imagine these situations:

You want to prove later that you already knew the solution to a problem, but you do not want to reveal it today.
You want to place a bet on the outcome of an event but do not want your opponent to know which side you picked in advance.
You are running a digital lottery and need to show that the winning number was (random but) fixed before the draw and not manipulated after the draw.
Or more nerdy, you want to convince someone you found a valid graph coloring without revealing the coloring itself (a classic zero-knowledge proof setting).

All of these require a way to commit to a secret so that two key properties are satisfied:

Hiding. Nobody learns the secret until you reveal it.
Binding. Once you have committed, you cannot change your mind.

That is precisely what commitment schemes do.

Figure 1: A completely useless AI generated image of a hashed commitment.

A Quick Primer on Hash Functions

The simplest commitment schemes rely on cryptographic hash functions such as SHA-256. A hash function H takes an input of arbitrary length and produces a fixed-size digest. Good cryptographic hashes satisfy several key properties:

Deterministic. The same input always produces the same output.
Preimage resistant. Given a hash h, it is infeasible to find any input m such that H(m)=h.
Second preimage resistant. Given an input m, it is infeasible to find a different m' with the same hash.
Collision resistant. It is infeasible to find any two distinct inputs m_1 != m_2 with H(m_1)=H(m_2).

Hashes are one-way functions: you can go from message to hash easily, but not back. This makes them perfect for commitments. Here “infeasible” is meant in the sense of computational complexity theory, i.e., it takes a lot of computational resources and time. While hash functions are designed to be collision-resistant they might still have collisions simply because we digest arbitrary-length messages into fixed-size digests. This means in particular it is infeasible (in the real sense) to invert a hash function simply because the preimage is not unique; we may find one preimage (if we are extremely lucky) but there is no guarantee that this is the one that was used to compute the hash.

Secret Commitments: The Basic Idea

The simplest commitment is:

C = H(message)

Later, you reveal message, and anyone can verify that H(message) == C.

However, this naive approach leaks some information. If message comes from a small set (like "YES" or "NO"), anyone can brute-force all possibilities and see which one matches C. That breaks the hiding property.

Hashing with Salt

To prevent this, we add randomness. The usual fix is to prepend a random salt to the message before hashing:

C = H(salt || message)

salt is a random value (e.g., 128 bits) that makes every commitment (assuming no collision) unique, even if the message is common.
Without knowing salt, an attacker cannot precompute hash tables or brute-force easily.

Later, you reveal (salt, message). Anyone can check that H(salt || message) == C. For example here is a hash from a hash tool that uses SHA-256 with a salt and an additional domain separation string to hash effectively H(domain || salt || message); see also the appendix below for a lightweight verification script.

e708b9946bee40db71df3486d2ea663f69ffb10a9ed8cabcd01b95aac4cc8028

with full commitment record:

{
  "commitment": "e708b9946bee40db71df3486d2ea663f69ffb10a9ed8cabcd01b95aac4cc8028",
  "salt": "29ea6ad30117dfeb82928cd5b38c4c1a",
  "domain": "commit-v1:",
  "message": "Tomorrow it is going to rain.",
  "mode": "hash",
  "timestamp": "2025-09-28T17:57:29.790Z"
}

The hash alone is worthless but binds the commitment. Once the salt and message are revealed, anyone can check that H(salt || message) == C.

However, now we face an (important) design choice:

Publish the salt immediately. This guarantees stronger binding. You cannot change it later to try to force a collision but at the same time, while not being able to use precomputed tables, attackers can start brute-forcing the message; expensive but might be successful if the actual message is known to come from a small list of messages, e.g., “YES” or “NO”.
Keep the salt secret until reveal. This provides stronger hiding. Attackers cannot even start guessing until you reveal it but at the same time you may be accused of cheating with a new salt s and new message messageNew so that H(s || messageNew) == C; super unlikely but possible.

In practice, both are considered secure if the salt is long and random and it depends on the application: (a) publishing it makes accusations of “cheating with a new salt” impossible, (b) hiding it makes attacks against short message lists impossible.

Verifier-supplied randomness: stronger binding if needed

We can even go a step further and use additional verifier-supplied (or supplied via a public beacon) randomness r to compute the commitment. This provides even stronger binding than the basic scheme with salt: the committer cannot influence r and the salt still hides the message. The reasoning is one of those subtle but fundamental cryptography points which becomes clear once you think about who controls the randomness and when.

In the basic scheme you control both salt and message before committing. By contrast, if the verifier (or a randomness beacon) first provides a random challenge r, you now compute the commitment as

C = H(r || salt || message)

Why is this stronger?

You cannot “grind” over many salts anymore. With r fixed by someone else, you lose the ability to search over huge numbers of salts to pick a “nice” one with special properties.
Binding is strengthened even against powerful adversaries. Without r, binding relies on collision resistance alone. With r, the committer faces preimage resistance under a fixed prefix, which is much harder to game retroactively, even if someone could manufacture collisions offline.
It provides context and time binding. Different r values yield different commitments for the same message. If r is tied to a timestamp, block hash, or VRF output, the commitment could not have been generated before r existed.
It removes the “choose your own salt” accusation. Since you do not control all randomness, a verifier need not trust that you sampled salt honestly.

Analogy (lottery tickets):

Without r: you pick your own ticket numbers and could generate many until you like one.
With r: the machine draws a random number you must include; advance “ticket shopping” no longer helps.

Summary:

Property	`H(salt \|\| message)`	`H(r \|\| salt \|\| message)`
Who controls randomness?	Committer	Verifier & Committer
Binding (practical)	Strong (collision resistance)	Stronger (fixed-prefix preimage)
Hiding	Strong (if salt kept secret)	Same
Resistance to salt grinding	❌	✅
Time/context binding	❌	✅
Trust model	Some trust in committer	No trust needed

Such stronger guarantees that use verifier-provided nonces or public randomness sources eliminate additional theoretical avenues for manipulation and are foundational to advanced protocols like verifiable delay functions, blockchain randomness beacons, and secure multiparty computation.

HMAC Mode: Adding a Secret Key

Another variant uses an HMAC (Hash-based Message Authentication Code):

C = HMAC(key, message)

Here key is a secret shared between two parties. The result:

Only someone with the key can generate a valid commitment.
Only someone with the key can verify it.

General structure of HMAC (as a function of H). At a high level, HMAC wraps a hash H in an inner/outer composition with two fixed byte patterns (often called “pads”) and the secret key:

HMAC_H(key, msg) = H( (key_xor_opad) || H( (key_xor_ipad) || msg ) )

where key_xor_ipad and key_xor_opad are the secret key combined with two fixed constants (one for the inner hash, one for the outer hash). This gives message authentication on top of H without exposing the raw key, and it prevents simple extension attacks against Merkle–Damgård hashes.

Remark (why HMAC is different than hashing with a private salt). Using a secret salt like H(secret || msg) or H(msg || secret) is not the same as HMAC:

A private-salt hash is typically vulnerable to length-extension/structural issues with Merkle–Damgård hashes (this includes SHA-256); HMAC’s inner/outer pads provide domain separation that defeats these.
HMAC has strong, widely accepted security guarantees (MAC/PRF) under assumptions on H, whereas a naive private-salt construction does not.
Verification differs: with a private salt, revealing the salt for verification weakens reuse (this affects the reusability requirements of the key); with HMAC, verifiers share a key and tags reveal nothing about the raw key.

What is length extension? A simple example.

Suppose you try to authenticate with tag = H(secret || msg) using SHA-256. An attacker who sees msg and tag can compute a valid tag for an extended message msg || padding || extra without knowing secret:

They reconstruct SHA-256’s padding for the unknown secret || msg length.
They continue the hash state from tag to process extra and produce a forged tag for the longer message.

Result. The attacker forges tag' for msg' = msg || padding || extra that will verify under the naive scheme with secret. This is a serious problem if the secret should be reused as key. HMAC’s inner/outer construction prevents this kind of extension because the outer hash re-keys and finalizes the computation in a way that cannot be resumed by an attacker. In the context of commitments, this is not a problem if a fresh secret, i.e., a salt is used for each commitment.

Note that HMAC is not strictly a commitment scheme; it is primarily for authenticity, but it is useful when commitments must be verifiable only by authorized parties.

Applications: Where Commitments Matter

Commitment schemes may seem abstract, but they have applications in many real-world protocols.

Zero-Knowledge Proofs: Graph Coloring

One of the most elegant applications of commitments is in zero-knowledge proofs. These are interactive protocols that let a prover convince a verifier that a certain statement is true without revealing anything beyond the truth of the statement itself. A canonical example uses the NP-complete Graph 3-Coloring problem.

The Setting.

The prover (P) knows a valid 3-coloring of a graph $G=(V,E)$ but does not want to reveal it to the verifier (V).
The verifier (V) wants to be convinced that P indeed knows such a coloring.

Protocol: Iterative Zero-Knowledge Proof for 3-Coloring.

Commitment Phase:
- P randomly permutes the color labels (e.g., swaps “red,” “green,” and “blue”).
- For each vertex $v_i$, P commits to the permuted color with a salt:
  \[C_i = H(\text{salt}_i \parallel \pi(c(v_i)))\]
- Salts are freshly sampled for each round (and for each vertex); do not reuse salts across rounds.
- P sends all commitments ${C_i}$ to V.
Challenge Phase:
- V picks a random edge $e=(u,v)$ and sends it as a challenge.
Opening Phase:
- P reveals the committed colors and salts for $u$ and $v$.
- V verifies:
  - The openings match the original commitments.
  - $c’(u) \neq c’(v)$.
Repeat:
- Steps 1–3 are repeated $k$ times with fresh random permutations and fresh salts each round.

Each round convinces V a little more. If P does not have a valid coloring, the probability of cheating successfully falls exponentially with $k$.

Remark: Why Shuffle Every Round? Without the random permutation step (and fresh salts), repeated openings would gradually leak the full coloring. The shuffling ensures that even after many rounds, no information beyond “adjacent nodes differ” is revealed.

Complexity Perspective. The 3-coloring problem is NP-complete: verifying a proposed coloring is easy (polynomial time), but finding one is believed to be hard. This protocol does not make solving NP-complete problems easier, but it demonstrates a powerful result:

Every language in NP has a zero-knowledge proof system (under standard cryptographic assumptions).

This is a foundational insight in modern cryptography. The graph-coloring protocol is a canonical constructive example.

Betting and Predictions

Commitments are common in prediction markets or bets. You commit to “Team A will win” by publishing C = H(salt || prediction) now. After the match, you reveal the prediction. Because the commitment was binding, nobody can accuse you of changing your guess after the fact.

Timestamped Knowledge Proofs

Researchers or inventors can commit to a discovery before publication. Later, revealing the original text proves they knew it at the earlier time. Useful for priority claims and intellectual property.

Digital Lotteries and Random Draws

A lottery organizer commits to a random seed before ticket sales close: C = H(seed). After the sale, they reveal the seed, and everyone can verify that the winning draw was predetermined and not manipulated after the fact.

Authenticity and Integrity Checks

If you want to authenticate data without revealing it, you can commit to its hash in advance. Later, when the data is disclosed, others can verify it has not changed. This is basically what the MD5 checksums did for downloads in the past.

Hashes Are Not Encryption

One final note: a hash is not encryption.

Feature	Hash Function	Encryption
Purpose	One-way fingerprint of data	Reversible hiding of data
Direction	One-way (irreversible)	Two-way (reversible with key)
Recover plaintext?	❌ Impossible	✅ Yes, with key

You cannot “decrypt” a hash. This one-way property is precisely what makes hashes ideal for commitments.

References

[BR] Bellare, M., & Rogaway, P. (2005). Introduction to modern cryptography. Ucsd Cse, 207, 207.

[KL] Katz, J., & Lindell, Y. (2007). Introduction to modern cryptography: principles and protocols. Chapman and hall/CRC.

[RFC2104] RFC 2104: HMAC: Keyed-Hashing for Message Authentication

Appendix: Lightweight Verification Script (Python)

Below is a minimal verifier you can use to check commitments encoded as JSON. It supports both SHA-256 and HMAC-SHA256 modes; you can also use it to create commitments by using the computed hash.

#!/usr/bin/env python3
"""Lightweight commitment verification script."""

import sys
import os
import json
import hashlib
import hmac
from datetime import datetime

def hex_to_bytes(hex_str):
    """Convert hex string to bytes."""
    return bytes.fromhex(hex_str)

def bytes_to_hex(data):
    """Convert bytes to hex string."""
    return data.hex()

def verify_commitment(data, hmac_key=None):
    """Verify a commitment from JSON data."""

    # Extract components
    commitment = data['commitment']
    salt_hex = data['salt']
    domain = data['domain']
    message = data['message']
    mode = data.get('mode', 'hash')
    timestamp = data.get('timestamp')

    # Convert salt to bytes
    salt = hex_to_bytes(salt_hex)

    # Encode domain and message to UTF-8 bytes
    domain_bytes = domain.encode('utf-8')
    message_bytes = message.encode('utf-8')

    # Recreate the data that was hashed
    data_bytes = salt + domain_bytes + message_bytes

    # Hash the data
    if mode == 'hmac':
        if not hmac_key:
            print("❌ HMAC mode requires key. Use --key parameter or set HMAC_KEY environment variable")
            return False, None

        # HMAC-SHA256 verification
        computed = hmac.new(hmac_key.encode('utf-8'), data_bytes, hashlib.sha256).digest()
        computed_hex = bytes_to_hex(computed)
    else:
        # SHA-256 hash
        computed = hashlib.sha256(data_bytes).digest()
        computed_hex = bytes_to_hex(computed)

    # Verify
    is_valid = computed_hex == commitment

    return is_valid, {
        'commitment': commitment,
        'computed': computed_hex,
        'message': message,
        'domain': domain,
        'mode': mode,
        'timestamp': timestamp
    }

def main():
    """Main verification function."""
    try:
        # Parse command line arguments
        hmac_key = None
        if len(sys.argv) > 1:
            if sys.argv[1] == '--key' and len(sys.argv) > 2:
                hmac_key = sys.argv[2]
            elif sys.argv[1].startswith('--key='):
                hmac_key = sys.argv[1][6:]  # Remove '--key=' prefix

        # Also check environment variable
        if not hmac_key:
            hmac_key = os.environ.get('HMAC_KEY')

        # Read JSON from stdin
        json_input = sys.stdin.read().strip()

        if not json_input:
            print("❌ No JSON input provided")
            print("Usage: echo 'PASTE_JSON_HERE' | python3 verify_commitment.py [--key YOUR_KEY]")
            print("   or: export HMAC_KEY=your_key")
            print("   or: echo 'PASTE_JSON_HERE' | python3 verify_commitment.py --key=your_key")
            return

        # Parse JSON
        try:
            data = json.loads(json_input)
        except json.JSONDecodeError as e:
            print(f"❌ Invalid JSON: {e}")
            return

        # Verify commitment
        is_valid, details = verify_commitment(data, hmac_key)

        # Output results
        print("\n\n")
        print(f"Verification: {'✅ VERIFIED' if is_valid else '❌ NOT VERIFIED'}")

        if details:
            print(f"Mode: {details['mode']}")
            print(f"Domain: {details['domain']}")
            print(f"Message: {details['message']}")

            if details['timestamp']:
                try:
                    dt = datetime.fromisoformat(details['timestamp'].replace('Z', '+00:00'))
                    print(f"Timestamp: {dt.strftime('%Y-%m-%d %H:%M:%S UTC')}")
                except:
                    print(f"Timestamp: {details['timestamp']}")

            if not is_valid and details.get('computed'):
                print(f"Expected: {details['commitment']}")
                print(f"Computed: {details['computed']}")

    except KeyboardInterrupt:
        print("\n⚠️  Interrupted")
    except Exception as e:
        print(f"❌ Error: {e}")

if __name__ == "__main__":
    main()