Have you ever wanted to share a list of emails without exposing sensitive data for the emails that are unknown to a partner?
Enter one-way cryptographic hash functions
Cryptographic hash functions are not just the magic sauce behind Bitcoin; they also underpin most website security.
A one-way hash function is an algorithm that takes an input of any length and outputs a fixed-length hash. Unlike encryption, there is no key that reverses the process. A strong algorithm is one whose hash is irreversible with modern computing power: there is no known practical way to go from the hashed output back to the input.
One of the most widely adopted standards for one-way hash functions is the SHA-256 algorithm. Its output is a fixed-length 256-bit value, usually written as a 64-character hexadecimal string. The input is case-sensitive, so changing a single letter produces a completely different hash:
- “Tevunah” becomes ‘a5dd7414c077318ba6c21a9620aa78aecfb86c4d65cc362366e5222d0867a9b1’
- “tevunah” becomes ‘66514d92bd1c19543a663ccff8d31522d1d9c4a853102231154cb67dd3f33b43’
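The two properties described above, fixed-length output and sensitivity to small changes in the input, can be checked with Python's standard hashlib module. This is only a minimal sketch; the helper name sha256_hex is made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    # hashlib operates on bytes, so the string is encoded as UTF-8 first.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


upper = sha256_hex("Tevunah")
lower = sha256_hex("tevunah")
long_input = sha256_hex("a much longer input " * 100)

# Every digest is 64 hex characters, regardless of input length.
print(len(upper), len(lower), len(long_input))

# Changing the case of one letter gives a different digest.
print(upper == lower)

# The same input always gives the same digest, which is what makes matching possible.
print(sha256_hex("tevunah") == lower)
```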
Salting
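A salt is an extra string added to the input before it is hashed. In the examples here, the email is trimmed and lowercased first, and the salt “table” is then appended.
- Example: “ Matt@Tevunah.com” becomes “matt@tevunah.comtable”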
The primary function of salts is to defend against dictionary attacks and their precomputed equivalent, rainbow table attacks.
By adding a random salt, you substantially decrease the chance that an email can be found in a lookup table of hashes for common names.
For example, the hash of ‘helloworld’ (936a185caaa266bb9cbe981e9e05cb78cd732b0b3280eb944412bb6f8f8f07af) is found in a lookup table, but the hash of ‘helloworldtable’ is not.
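To make the lookup-table idea concrete, here is a small sketch using only the standard library. The list of "common" inputs and the names lookup_table and sha256_hex are hypothetical stand-ins for a real precomputed table, which would hold far more entries.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical attacker's table: maps precomputed hashes back to their inputs.
common_inputs = ["password", "helloworld", "john@example.com"]
lookup_table = {sha256_hex(word): word for word in common_inputs}

salt = "table"  # the example salt used in this post

unsalted = sha256_hex("helloworld")
salted = sha256_hex("helloworld" + salt)

# The unsalted hash is in the table, so the original input is recovered.
print(lookup_table.get(unsalted))

# The salted hash has no entry, so the lookup returns None.
print(lookup_table.get(salted))
```

Both parties have to apply the same salt in the same way; otherwise identical emails hash to different values and never match.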
In Python:
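import hashlib
import pandas as pd

# Load the emails and normalize them so that formatting differences do not change the hash.
email_df = pd.read_csv('~/Downloads/Emailtest.csv')
email_df['email'] = email_df['email'].str.strip().str.lower()

def hash_email(email):
    salt = 'table'
    # sha256 requires bytes in Python 3, so encode the salted string first.
    return hashlib.sha256((str(email) + salt).encode('utf-8')).hexdigest()

email_df['emailhash'] = email_df['email'].apply(hash_email)
email_df.to_csv('~/Downloads/Emailtesthashed.csv')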
In Excel: ‘=CONCAT(TRIM(LOWER(email_
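# Load our own list, which is assumed to already contain an 'emailhash' column.
client_df = pd.read_csv('~/Downloads/our_list.csv')

# Keep only the rows whose hash appears in both lists.
joined_df = email_df.join(client_df.set_index('emailhash'), on='emailhash', how='inner')
joined_df.to_csv('~/Downloads/joined.csv')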
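The same matching step can be sketched without pandas, which may help show what the inner join is doing. The sample addresses and the variable names below are made up for illustration, and the salt is assumed to be the same “table” value on both sides.

```python
import hashlib

SALT = "table"  # assumed to be agreed on by both parties


def hash_email(email: str) -> str:
    """Trim, lowercase, salt, and hash an email address."""
    normalized = email.strip().lower()
    return hashlib.sha256((normalized + SALT).encode("utf-8")).hexdigest()


# Made-up sample data for illustration only.
partner_emails = [" Matt@Tevunah.com", "someone@example.com"]
our_emails = ["matt@tevunah.com", "other@example.org"]

# The partner sends only hashes, never the raw addresses.
partner_hashes = {hash_email(e) for e in partner_emails}

# We hash our own list and keep a map back to the addresses we already hold.
our_hash_to_email = {hash_email(e): e for e in our_emails}

# Hashes present on both sides identify the shared emails.
shared_hashes = partner_hashes.intersection(our_hash_to_email)
matched = sorted(our_hash_to_email[h] for h in shared_hashes)
print(matched)
```

Only the addresses we already know are recovered; the partner's other hashes stay opaque to us.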
Hashing eliminates the need to share the full customer list: clients share only the hashed emails, and we can still determine which emails match.