The thash method is an improved way of using cryptographic hash functions for integrity purposes, designed to maximize speed when processing large volumes of data. auditor is the first forensic tool to implement this method.
In the normal method, which is how most tools work, the content of the file is read and hashed as a whole, as a single block.
In the thash method, we divide the file into individual blocks of size BlockSize, which are processed separately and in parallel. The hashes of each block are encoded in hexadecimal format and then concatenated in the same order as the blocks. Finally, the hash of this concatenated sequence of hashes is calculated.
It's similar to blockchains: hash of hashes.
However, if the entire file fits into a single block (fileSize <= blockSize), the normal method described above is applied.
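The two cases above can be written compactly. The notation is introduced here for illustration only: H is the chosen hash algorithm, hex is hexadecimal encoding as printed by tools such as sha256sum, B_1 ... B_n are the blocks of file F in order, and \| denotes concatenation. H_1 ... H_n correspond to the H1, H2 ... Hn mentioned in the next note.

```latex
\[
H_i = \mathrm{hex}\bigl(H(B_i)\bigr), \qquad
\mathrm{thash}(F) =
\begin{cases}
H(F) & \text{if } \mathrm{fileSize} \le \mathrm{blockSize},\\
H\bigl(H_1 \,\|\, H_2 \,\|\, \dots \,\|\, H_n\bigr) & \text{otherwise.}
\end{cases}
\]
```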
Important: The hexadecimal format of H1, H2 ... Hn was chosen because it is the standard encoding used by most popular tools and simplifies the process of replicating thash with other tools.
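Speed: Since block hashes are independent of one another, they can be computed in parallel on multiple CPU cores, which makes the process significantly faster for large files, provided the storage can deliver data quickly enough.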
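As a rough illustration of the parallel step, here is a minimal sketch for SHA256 only. It is not how auditor is implemented. The function name thash_sha256 is hypothetical, the block size is given in bytes, and the sketch assumes sha256sum, nproc and an xargs that supports -P are available.

```shell
#!/bin/bash
# Sketch: thash with SHA256, hashing blocks in parallel without temporary files.
# thash_sha256 is a hypothetical helper; block size is given in bytes.

thash_sha256() {
    local file="$1" block_bytes="$2"
    local size blocks
    size=$(wc -c < "$file")

    # A file that fits in a single block is hashed with the normal method.
    if [ "$size" -le "$block_bytes" ]; then
        sha256sum < "$file" | cut -d " " -f 1
        return
    fi

    # Number of blocks, rounded up so the last block holds the remainder.
    blocks=$(( (size + block_bytes - 1) / block_bytes ))

    # Each job reads one block by offset and prints "index hash".
    # Jobs finish in any order, so sort by index before concatenating.
    seq 0 $(( blocks - 1 )) |
        xargs -P "$(nproc)" -I{} sh -c '
            off=$(( $3 * $2 + 1 ))
            h=$(tail -c +"$off" "$1" | head -c "$2" | sha256sum | cut -d " " -f 1)
            echo "$3 $h"
        ' sh "$file" "$block_bytes" {} |
        sort -n | cut -d " " -f 2 | tr -d "\n" |
        sha256sum | cut -d " " -f 1
}
```

For example, `thash_sha256 ./any_file $((50 * 1024 * 1024))` hashes the file in blocks of 50 * 1024 * 1024 bytes. The index printed by each job is what keeps the concatenation in block order even though the jobs run concurrently.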
Security level: The same level as the normal method, since all original data is hashed: any modification changes at least one block hash, which in turn changes the final hash. The idea of a hash of hashes was inspired by blockchain systems widely used in cryptocurrencies.
Hash Algorithm: Can be any algorithm, such as SHA256, SHA512, BLAKE3, etc.
BlockSize: This defines the size of each block with a unit of measurement (KB, MB, GB, TB, etc.). The last block will contain the remaining data (unless FileSize is an exact multiple of BlockSize). When processing very large amounts of data, BlockSize can be fixed for all files, or it can be automatically calculated for each file based on its properties (such as fileSize), or based on the hardware being used (storage type, SSDs, processor architecture, number of CPUs, available memory, etc.), among other factors.
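To make the block arithmetic concrete, the sketch below computes how many blocks a non-empty file produces and how large the last one is. The helper name block_layout is hypothetical, and the sketch assumes units are powers of 1024 (so 1MB is 1024 * 1024 bytes), which is how proof.sh passes sizes to split.

```shell
#!/bin/bash
# Sketch: number of blocks and size of the last block for a non-empty file.
# block_layout is a hypothetical helper; both arguments are sizes in bytes.

block_layout() {
    local file_size="$1" block_size="$2"
    local blocks last
    # Round up: a partial block at the end still counts as a block.
    blocks=$(( (file_size + block_size - 1) / block_size ))
    # The last block holds whatever is left after the full blocks.
    last=$(( file_size - (blocks - 1) * block_size ))
    echo "blocks=$blocks last_block_bytes=$last"
}

MB=$(( 1024 * 1024 ))   # assumption: 1MB = 1024 * 1024 bytes
block_layout $(( 120 * MB )) $(( 50 * MB ))   # blocks=3 last_block_bytes=20971520
block_layout $(( 100 * MB )) $(( 50 * MB ))   # blocks=2 last_block_bytes=52428800
```

In the first call the last block is 20MB; in the second, FileSize is an exact multiple of BlockSize, so the last block is a full 50MB.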
Normal method identification:
SHA256 = normal method using hash algorithm: SHA256, without BlockSize.
thash identification: To correctly identify when a file was hashed using the thash method, we use the tag <THASH-BlockSize>, as in:
SHA256<THASH-50MB> = algorithm: SHA256 using thash method with BlockSize: 50MB.
BLAKE3<THASH-1GB> = algorithm: BLAKE3 using thash method with BlockSize: 1GB.
Attention! To ensure that verification succeeds, it is necessary to use the same BlockSize used during generation; otherwise, it will fail. Just as you need to store which hash algorithm was used, the BlockSize must also be stored.
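Since the algorithm and the BlockSize travel together in the identifier, a verifier can recover both from it. The sketch below shows one possible way to do that. parse_hash_id is a hypothetical helper, not part of auditor, and the accepted unit letters (K, M, G, T, P, E) are taken from the check in proof.sh.

```shell
#!/bin/bash
# Sketch: split an identifier such as "SHA256<THASH-50MB>" into its parts.
# parse_hash_id is a hypothetical helper, not part of auditor.

parse_hash_id() {
    local id="$1"
    # Algorithm name, then the tag with a number and a unit ending in B.
    local re='^([A-Za-z0-9-]+)<THASH-([0-9]+[KMGTPE]B)>$'
    if [[ "$id" =~ $re ]]; then
        echo "algorithm=${BASH_REMATCH[1]} method=thash block_size=${BASH_REMATCH[2]}"
    elif [[ "$id" =~ ^[A-Za-z0-9-]+$ ]]; then
        # No tag at all: the normal method, without BlockSize.
        echo "algorithm=$id method=normal"
    else
        echo "unrecognized identifier: $id" >&2
        return 1
    fi
}

parse_hash_id 'SHA256<THASH-50MB>'   # algorithm=SHA256 method=thash block_size=50MB
parse_hash_id 'BLAKE3<THASH-1GB>'    # algorithm=BLAKE3 method=thash block_size=1GB
parse_hash_id 'SHA256'               # algorithm=SHA256 method=normal
```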
Below is a script that reproduces what the thash method does. It can be used as a proof of concept and also to verify the correctness of auditor.
proof.sh
#!/bin/bash
# This script simulates thash method with selectable hash algorithms.
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 file alg_hash blockSize"
    echo "Ex: $0 ./any_file sha256 10MB"
    echo "alg_hash can be: sha256, sha512, sha3-256, sha3-512, k12 or blake3."
    echo "blockSize is a number with KB, MB, GB or TB. Ex: 10MB, 5GB, etc."
    echo "alg_hash uses one of these commands to hash: sha256sum, sha512sum, sha3sum, k12sum or b3sum. The command must be in the PATH!"
    echo "If the file is in the current dir, use ./file"
    exit 1
fi
# Get arguments
file="$1"
alg_hash="$2"
block_str_auditor="$3"
# Drop the trailing B for split: GNU split reads 10M as 10*1024*1024 bytes (10MB would mean 10*1000*1000)
block_str="${block_str_auditor::-1}"
last_char="${block_str_auditor: -1}"
if [[ "$last_char" != "B" ]]; then
    echo "Error: Last character must be 'B'" >&2
    exit 1
fi
# The character before the final B is the unit
unity="${block_str_auditor: -2:1}"
if [[ ! "$unity" =~ ^[KMGTPE]$ ]]; then
    echo "Error: Invalid unit '$unity'" >&2
    exit 1
fi
# Test if file exists
if [ ! -f "$file" ]; then
    echo "File '$file' not found."
    exit 1
fi
# Select hashing utility based on alg_hash
option_cmd=""
case "$alg_hash" in
    sha256)
        hash_cmd="sha256sum"
        ;;
    sha512)
        hash_cmd="sha512sum"
        ;;
    sha3-256)
        hash_cmd="sha3sum"
        option_cmd=" -a 256"
        ;;
    sha3-512)
        hash_cmd="sha3sum"
        option_cmd=" -a 512"
        ;;
    k12)
        hash_cmd="k12sum"
        ;;
    blake3)
        hash_cmd="b3sum"
        ;;
    *)
        echo "Invalid hash algorithm. Choose: sha256, sha512, sha3-256, sha3-512, k12, blake3."
        exit 1
        ;;
esac
if ! command -v "$hash_cmd" &> /dev/null; then
    echo "Error: $hash_cmd is not installed. Install it and try again."
    exit 1
fi
hash_cmd="$hash_cmd$option_cmd"
auditor_cmd="auditor"
if ! command -v "$auditor_cmd" &> /dev/null; then
    echo "Error: $auditor_cmd is not installed. Install it and try again."
    exit 1
fi
# Create dir, name files, clean up any leftover files
dir_name="${file}_dir_${block_str}_${alg_hash}"
file_chain_hash="${file}_${block_str}_chain.${alg_hash}.txt"
file_thash="${file}_${block_str}.${alg_hash}.thash.txt"
file_thash_auditor="${file}_${block_str}.${alg_hash}.thash.auditor.txt"
mkdir -p "$dir_name"
rm -f "$dir_name"/part_*
# Split file in blocks of blockSize
split -b "$block_str" "$file" "$dir_name"/part_
# Hash each split file and save the hash
for part_file in "$dir_name"/part_*; do
    $hash_cmd "$part_file" > "$part_file"."${alg_hash}".hash.txt
    # sha3sum prints c2a0 (in hex) after the hash value. Remove it, as well as \n and spaces
    awk '{gsub(/ *\*.*/, ""); print $1}' "$part_file"."${alg_hash}".hash.txt | cut -d " " -f 1 | tr -d '\n' | sed $'s/\xc2\xa0//g' > "$part_file"."${alg_hash}".txt
done
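# Chain all hashes
cat "$dir_name"/*."${alg_hash}".txt > "$file_chain_hash"
# Hash the chain of hashes, cutting spaces and newline char.
# A file that fits in a single block uses the normal method, so its only block hash is already the result.
block_hashes=("$dir_name"/part_*."${alg_hash}".txt)
if [ "${#block_hashes[@]}" -gt 1 ]; then
    $hash_cmd "$file_chain_hash" | cut -d " " -f 1 | tr -d '\n' > "$file_thash"
else
    cp "$file_chain_hash" "$file_thash"
fi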
# Print result
echo "${alg_hash}<THASH-${block_str_auditor}> with this script and ${hash_cmd}:"
cat "$file_thash"
echo ""
echo ""
echo "${alg_hash}<THASH-${block_str_auditor}> with auditor:"
auditor hash "$file" -a "$alg_hash" -b "$block_str_auditor" -l -q > "$file_thash_auditor" 2> /dev/null
cat "$file_thash_auditor"
echo ""
echo "normal hash with ${hash_cmd}"
$hash_cmd "$file"
echo ""
echo "normal hash with auditor "
auditor hash "$file" -d -a "$alg_hash" -q -l 2> /dev/null
echo ""
Here are the results of proof.sh applied to a random file:
proof.sh results:
If you have comments or concerns about the security of this method, a public discussion can be found here: crypto.stackexchange.com
Have suggestions or found a bug? Contact us at: [email protected]