thash method

The thash method is an improved way of using cryptographic hash functions for integrity checking, designed to maximize speed when processing large volumes of data.

auditor is the first forensic tool to implement this method. See more about auditor

How it Works

Normal Method

In the normal method, which is how conventional hashing tools work, the entire content of the file is read and hashed as a single block:

Fig.1 - Hashing in the normal method

thash method

In the thash method, we divide the file into individual blocks of size BlockSize, which are processed separately and in parallel. The hashes of each block are encoded in hexadecimal format and then concatenated in the same order as the blocks. Finally, the hash of this concatenated sequence of hashes is calculated.

It's similar to blockchains: hash of hashes.

However, if the entire file fits into a single block (FileSize <= BlockSize), the normal method described above is applied.

Fig.2 - Hashing with the thash method

Important: The hexadecimal format of H1, H2 ... Hn was chosen because it is the standard encoding used by most popular tools and simplifies the process of replicating thash with other tools.
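As an illustration only, the steps above can be sketched as a short shell function. It assumes SHA256, a block size given in bytes, and the commands sha256sum, tail, head and wc being available; the name thash_sha256 is hypothetical and is not part of auditor. Blocks are hashed one after another here to keep the sketch small.

```bash
#!/bin/bash
# Sketch of the thash computation with SHA256; block size is given in bytes.
# thash_sha256 is a hypothetical helper name, not part of auditor.
thash_sha256() {
    local file="$1" block_bytes="$2"
    local size offset=0 chain=""
    size=$(wc -c < "$file")

    # A file that fits in one block is hashed with the normal method.
    if [ "$size" -le "$block_bytes" ]; then
        sha256sum < "$file" | cut -d " " -f 1
        return
    fi

    # Hash each block in order and append its hex digest to the chain.
    while [ "$offset" -lt "$size" ]; do
        chain="$chain$(tail -c +"$((offset + 1))" "$file" | head -c "$block_bytes" | sha256sum | cut -d " " -f 1)"
        offset=$((offset + block_bytes))
    done

    # The final value is the hash of the concatenated hex digests, with no separators.
    printf '%s' "$chain" | sha256sum | cut -d " " -f 1
}
```

For example, `thash_sha256 ./any_file 52428800` would print the value for blocks of 52428800 bytes. Because each block is hashed independently of the others, the loop is the part that can be parallelized.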



Advantages

Speed: Since block hashes can be computed in parallel, the process is significantly faster.

Security level: The same level as the normal method, since all original data is hashed and any modification changes at least one block hash and therefore the final hash (barring a hash collision). The idea of hash of hashes was inspired by blockchain systems widely used in cryptocurrencies. For questions, see the section "Questions about security" below.
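The parallel step might look like the following sketch, which is only an assumption about one way to do it and not a description of how auditor is implemented. It uses xargs -P to hash several blocks at once, tags each digest with its block index, and sorts by index so the digests are concatenated in block order. The function name thash_sha256_parallel, the THASH_FILE and THASH_BLOCK variables, and the default of 4 jobs are hypothetical.

```bash
#!/bin/bash
# Sketch: hash blocks in parallel with xargs -P (SHA256, block size in bytes).
# All names here are hypothetical and chosen for illustration.
thash_sha256_parallel() {
    local file="$1" block_bytes="$2" jobs="${3:-4}"
    local size nblocks
    size=$(wc -c < "$file")
    nblocks=$(( (size + block_bytes - 1) / block_bytes ))

    # A file that fits in one block is hashed with the normal method.
    if [ "$nblocks" -le 1 ]; then
        sha256sum < "$file" | cut -d " " -f 1
        return
    fi

    # Each worker prints "index digest"; sorting by index restores block order.
    seq 0 "$((nblocks - 1))" |
        THASH_FILE="$file" THASH_BLOCK="$block_bytes" \
        xargs -P "$jobs" -n 1 sh -c '
            digest=$(tail -c +"$(( $1 * THASH_BLOCK + 1 ))" "$THASH_FILE" |
                     head -c "$THASH_BLOCK" | sha256sum | cut -d " " -f 1)
            printf "%s %s\n" "$1" "$digest"
        ' sh |
        sort -n | cut -d " " -f 2 | tr -d "\n" | sha256sum | cut -d " " -f 1
}
```

The result does not depend on the number of jobs, because the digests are reordered by block index before the final hash is computed.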

Considerations

Hash Algorithm: Can be any algorithm, such as SHA256, SHA512, BLAKE3, etc.

BlockSize: The size of each block, expressed as a number with a unit (KB, MB, GB, TB, etc.). The last block contains the remaining data, so it is smaller than the others unless FileSize is an exact multiple of BlockSize. When processing very large amounts of data, BlockSize can be fixed for all files, or it can be calculated automatically for each file based on its properties (such as FileSize) or on the hardware being used (storage type, SSDs, processor architecture, number of CPUs, available memory, etc.), among other factors.
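To make the block arithmetic concrete, the sketch below computes how many blocks a file yields and how large the last one is. The function name block_layout is hypothetical, and sizes are plain byte counts. The example treats 50MB as 50 * 1024 * 1024 bytes, which matches how the proof script further down passes the size to split (as 50M); whether auditor interprets the unit the same way is an assumption.

```bash
#!/bin/bash
# Sketch: number of blocks and size of the last block, both in bytes.
# block_layout is a hypothetical helper used only for illustration.
block_layout() {
    local file_size="$1" block_size="$2"
    local count last
    if [ "$file_size" -eq 0 ]; then
        echo "0 0"
        return
    fi
    # Ceiling division gives the block count; the last block holds the remainder.
    count=$(( (file_size + block_size - 1) / block_size ))
    last=$(( file_size - (count - 1) * block_size ))
    echo "$count $last"
}

# 1 GiB file with 50 MiB blocks: 20 full blocks plus one of 24 MiB.
block_layout 1073741824 52428800   # prints: 21 25165824
```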

Normal method identification:

SHA256 = normal method using hash algorithm: SHA256, without BlockSize.

thash identification: To identify that a file was hashed using the thash method, the tag <THASH-BlockSize> is appended to the algorithm name:

SHA256<THASH-50MB> = algorithm: SHA256 using thash method with BlockSize: 50MB.

BLAKE3<THASH-1GB> = algorithm: BLAKE3 using thash method with BlockSize: 1GB.

Attention! To ensure that verification succeeds, it is necessary to use the same BlockSize used during generation; otherwise, it will fail. Just as you need to store which hash algorithm was used, the BlockSize must also be stored.
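Since the identifier carries both parameters needed for verification, a verifier can recover them from the stored string. The following is a minimal sketch of such parsing; parse_hash_tag is a hypothetical helper, and the output format (algorithm, method and BlockSize separated by spaces) is chosen only for illustration.

```bash
#!/bin/bash
# Sketch: split an identifier such as SHA256<THASH-50MB> into its parts.
# parse_hash_tag is a hypothetical helper, not part of auditor.
parse_hash_tag() {
    local tag="$1" alg block
    case "$tag" in
        *"<THASH-"*">")
            alg=${tag%%"<THASH-"*}      # text before the tag
            block=${tag#*"<THASH-"}     # text after "<THASH-"
            block=${block%">"}          # drop the closing ">"
            echo "$alg thash $block"
            ;;
        *)
            # No tag: the normal method was used.
            echo "$tag normal"
            ;;
    esac
}

parse_hash_tag "SHA256<THASH-50MB>"   # prints: SHA256 thash 50MB
parse_hash_tag "BLAKE3<THASH-1GB>"    # prints: BLAKE3 thash 1GB
parse_hash_tag "SHA256"               # prints: SHA256 normal
```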

Proof

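Below is a script that reproduces what the thash method does. It can be used as a proof of concept and also to check the output of auditor. Note that the script always applies the hash-of-hashes step, so its result corresponds to the thash definition above only when the file spans more than one block.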

proof.sh
#!/bin/bash
# This script simulates thash method with selectable hash algorithms.

if [ "$#" -ne 3 ]; then
    echo "Usage: $0 file alg_hash blockSize"
    echo "Ex: $0 ./any_file sha256 10MB"
    echo "alg_hash can be: sha256, sha512, sha3-256, sha3-512, k12 or blake3."
    echo "blockSize is a number followed by KB, MB, GB or TB. Ex: 10MB, 5GB, etc."

    echo "alg_hash selects one of these commands: sha256sum, sha512sum, sha3sum, k12sum or b3sum. The command must be in the PATH."
    echo "If the file is in the current dir, use ./file"
    exit 1
fi

# Get arguments
file="$1"
alg_hash="$2"
block_str_auditor="$3"
block_str="${block_str_auditor::-1}"

last_char="${block_str_auditor: -1}"
if [[ "$last_char" != "B" ]]; then
    echo "Error: Last character must be 'B'" >&2
    exit 1
fi

# The character before the final 'B' is the unit prefix (K, M, G, ...)
unit="${block_str_auditor: -2:1}"

if [[ ! "$unit" =~ ^[KMGTPE]$ ]]; then
    echo "Error: Invalid unit '$unit'" >&2
    exit 1
fi

# Test if file exists
if [ ! -f "$file" ]; then
    echo "File '$file' not found."
    exit 1
fi

# Select hashing utility based on alg_hash
option_cmd=""
case "$alg_hash" in
    sha256)
        hash_cmd="sha256sum"
    ;;
    sha512)
        hash_cmd="sha512sum"
    ;;
    sha3-256)
        hash_cmd="sha3sum"
        option_cmd=" -a 256"
    ;;
    sha3-512)
        hash_cmd="sha3sum"
        option_cmd=" -a 512"
    ;;
    k12)
        hash_cmd="k12sum"
    ;;
    blake3)
        hash_cmd="b3sum"
    ;;
    *)
        echo "Invalid hash algorithm. Choose: sha256, sha512, sha3-256, sha3-512, k12, blake3."
        exit 1
    ;;
esac

if ! command -v "$hash_cmd" &> /dev/null; then
    echo "Error: $hash_cmd is not installed. Install it and try again."
    exit 1
fi
hash_cmd="$hash_cmd$option_cmd"

auditor_cmd="auditor"

if ! command -v "$auditor_cmd" &> /dev/null; then
    echo "Error: $auditor_cmd is not installed. Install it and try again."
    exit 1
fi

# Create dir, name files, clean eventual files
dir_name="${file}_dir_${block_str}_${alg_hash}"
file_chain_hash="${file}_${block_str}_chain.${alg_hash}.txt"
file_thash="${file}_${block_str}.${alg_hash}.thash.txt"
file_thash_auditor="${file}_${block_str}.${alg_hash}.thash.auditor.txt"
mkdir -p "$dir_name"
rm -f "$dir_name"/part_*

# Split file in blocks of blockSize
split -b "$block_str" "$file" "$dir_name"/part_

# Hash each split file and save the hash
for part_file in "$dir_name"/part_*; do
    
    $hash_cmd "$part_file" > "$part_file"."${alg_hash}".hash.txt
    
    # sha3sum prints a non-breaking space (c2 a0 in hex) after the hash value. Remove it, along with newlines and spaces.
    cat "$part_file"."${alg_hash}".hash.txt | awk '{gsub(/ *\*.*/, ""); print $1}' | cut -d " " -f 1 | tr -d '\n' | sed $'s/\xc2\xa0//g' > "$part_file"."${alg_hash}".txt
    
done

# Chain all hashes
cat "$dir_name"/*."${alg_hash}".txt > "$file_chain_hash"

# Hash the chain of hashes, cutting spaces and newline char
$hash_cmd "$file_chain_hash" | cut -d " " -f 1 | tr -d '\n' > "$file_thash"

# Print result
echo "${alg_hash}<THASH-${block_str_auditor}> with this script and ${hash_cmd}:"
cat "$file_thash"
echo ""

echo ""
echo "${alg_hash}<THASH-${block_str_auditor}> with auditor:"
auditor hash "$file" -a "$alg_hash" -b "$block_str_auditor" -l -q > "$file_thash_auditor" 2> /dev/null
cat "$file_thash_auditor"

echo ""
echo "normal hash with ${hash_cmd}"
$hash_cmd "$file"

echo ""
echo "normal hash with auditor "
auditor hash "$file" -d -a "$alg_hash" -q -l 2> /dev/null
echo ""
        

Here are the results of proof.sh applied to a random file:

proof.sh results: Proof

Questions about security

If you have comments or concerns about the security of this method, a public discussion can be found here: crypto.stackexchange.com

Have suggestions or found a bug? Contact us at: [email protected]