Inbox Analysis
Just another day staring at my inbox, attempting to get to inbox zero. I noticed that Tim Ferriss and Ramit Sethi sent out email blasts within minutes of each other, which got me thinking: “When do I get the most emails?” Those two are pretty big on A/B testing and quantitative marketing, so I wanted to see if there were any patterns in my inbox.
I started out messing around with the Gmail API, but after a few misfires I did some research and found I did not have to reinvent the wheel. You can export your Google data using Google Takeout. The process took about 24 hours, and I ended up with a 2.8 GB MBOX file of all the information associated with my Gmail account.
I wanted to look at three fields: “To,” “From,” and “Date.”
Using Python, I could parse the MBOX file into a more malleable CSV:
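    import csv
    import mailbox
    from email.utils import parsedate_tz

    mbox = mailbox.mbox('Matt_mail.mbox')

    # Write one row per message: To, From, the raw Date header, and the parsed date tuple.
    # newline='' lets the csv module control line endings when the file is opened in text mode.
    with open('rawcsv_412.csv', 'w', newline='', encoding='utf-8') as outputFile:
        outputWriter = csv.writer(outputFile)
        for message in mbox:
            outputWriter.writerow([message['to'], message['from'], message['date'],
                                   parsedate_tz(message['date'])])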
The Date header in an MBOX file is hard to work with, so I used “parsedate_tz” from email.utils. This is the point where I used some Excel magic to clean up the CSV a bit. (Yes, this could have been 100% Python, but I know I can do this in Excel much faster.) I removed the parentheses around the parsed date and used the Text to Columns function to separate out the datetime information.
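For anyone who would rather skip the Excel step, here is a minimal sketch of splitting a Date header into its parts in Python. The helper name `split_date_header` and the sample header are made up for illustration; `parsedate_to_datetime` is the standard-library companion to `parsedate_tz`. Note that the hour it returns is in whatever UTC offset the header itself carries, so it is not normalized to a single time zone.

```python
from email.utils import parsedate_to_datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def split_date_header(raw_date):
    """Return (year, month, day, hour, weekday name) for an email Date header.

    Returns None when the header is missing or cannot be parsed.
    """
    if not raw_date:
        return None
    try:
        parsed = parsedate_to_datetime(raw_date)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError on bad input, newer ones ValueError.
        return None
    return (parsed.year, parsed.month, parsed.day, parsed.hour,
            DAY_NAMES[parsed.weekday()])


# Hypothetical header value, not taken from the real inbox.
print(split_date_header("Mon, 13 Apr 2015 11:02:45 -0400"))
```

Each tuple could be written straight into the CSV as separate columns, which would replace the Text to Columns step.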
Since string manipulation is easier in pandas, I ran the following script for further cleaning:
    import numpy as np
    import pandas as pd

    data = pd.ExcelFile('Email_Draft.xlsx')
    df = data.parse('rawcsv_412')
    print(len(df.index))

    # Clean out nulls in the To column; .copy() gives an independent frame to add columns to
    df_1 = df[df.To.notnull()].copy()
    print(len(df_1.index))

    # Create flag columns for selected senders (1 if the From field contains the address)
    senders = {
        'Me': "MyEmail@gmail.com",
        'Ramit': "ramit@iwillteachyoutoberich.com",
        'Tim': "tim@fourhourbody.com",
        'Insider': "newsletter@businessinsider.com",
    }
    for column, address in senders.items():
        # regex=False matches the address literally; na=False treats a missing From as no match
        df_1[column] = df_1['From'].str.contains(address, regex=False, na=False).astype(int)

    # Concatenate in Python; in the future sticking with Excel, but hey, why not
    def concat(*args):
        strs = [str(arg) for arg in args if not pd.isnull(arg)]
        return '/'.join(strs) if strs else np.nan

    np_concat = np.vectorize(concat)
    df_1['ShortDate'] = np_concat(df_1['Month'], df_1['Date'], df_1['Year'])

    # Write to a new Excel file; the context manager saves the workbook on exit
    with pd.ExcelWriter('PythonExport.xlsx') as writer:
        df_1.to_excel(writer, sheet_name='Sheet1')
Then I added a day-of-week column in Excel with =TEXT(Cell, "DDDD").
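The same day-of-week lookup can be done without Excel. This is only a sketch using the standard library; the function name `day_of_week` is hypothetical, and it assumes the year, month, and day are already available as integers.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(year, month, day):
    """Return the full weekday name, similar to Excel's TEXT(cell, "DDDD")."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return DAY_NAMES[date(year, month, day).weekday()]


print(day_of_week(2015, 4, 10))  # Friday
```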
Now I had a spreadsheet that looked like the following:
After some sanity checks, it’s time for some exploratory analysis.
Looks like getting to Inbox Zero is getting harder every year!
We see a clear trend: Ramit sends 41% of his emails at 11 AM EST, while the average across all emails is ~7%. Does this mean we should all send emails at 11? Not necessarily. Look at Business Insider, which primarily sends emails at 4 PM EST. Different email categories are better suited to different times; emails that have action steps may work better earlier in the day, while easy-to-digest articles might work better near the end of the workday. Also, without email click-through rate data from multiple people, I can’t give an accurate answer on the best time to send emails.
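A percentage like the 41% figure is just each hour's count divided by the sender's total. Here is a minimal sketch of that arithmetic, assuming the send hours have already been extracted as integers from 0 to 23. The function name `hourly_share` and the sample hours are made up for illustration and are not the real inbox data.

```python
from collections import Counter


def hourly_share(hours):
    """Map each hour (0-23) to the percentage of emails sent in that hour."""
    counts = Counter(hours)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hour: 100.0 * n / total for hour, n in sorted(counts.items())}


# Hypothetical send hours for one sender.
sample_hours = [11, 11, 16, 9]
print(hourly_share(sample_hours))  # {9: 25.0, 11: 50.0, 16: 25.0}
```

Running it once per sender and once over every row gives the per-sender and all-email distributions to compare.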
Then I got curious about who is sending the most emails over time:
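A tally like that could be sketched as follows, assuming each row is a (From header, year) pair. The function name `top_senders_by_year` and the example.com addresses are hypothetical; `parseaddr` from the standard library strips the display name so that “Alice <alice@example.com>” and “alice@example.com” count as the same sender.

```python
from collections import Counter, defaultdict
from email.utils import parseaddr


def top_senders_by_year(rows, n=3):
    """Return {year: [(address, count), ...]} with the n busiest senders per year."""
    per_year = defaultdict(Counter)
    for from_header, year in rows:
        if not from_header:
            continue  # skip messages with no From header
        address = parseaddr(from_header)[1].lower()
        if address:
            per_year[year][address] += 1
    return {year: counts.most_common(n) for year, counts in sorted(per_year.items())}


# Hypothetical rows, not taken from the real inbox.
sample_rows = [
    ("Alice <alice@example.com>", 2010),
    ("alice@example.com", 2010),
    ("Bob <bob@example.com>", 2010),
    ("Bob <bob@example.com>", 2011),
]
print(top_senders_by_year(sample_rows))
```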
Thankfully, I turned off Facebook notifications by email in 2010, and if anyone is familiar with ChapterSpot, you probably have memories of long frat email chains over nothing. As expected, emails are more concentrated during typical work hours and fall off during nights and weekends. From the heatmap below, I see the weekend starts at 9 PM on Friday.
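The numbers behind a day-by-hour heatmap are a 7 x 24 grid of counts. Here is a minimal sketch of building that grid, assuming the message dates are already `datetime` objects. The function names and sample timestamps are made up for illustration, and plotting is left to whatever charting tool is preferred.

```python
from collections import Counter
from datetime import datetime


def weekday_hour_counts(timestamps):
    """Count emails per (weekday index, hour) cell; Monday is 0, Sunday is 6."""
    return Counter((ts.weekday(), ts.hour) for ts in timestamps)


def as_grid(counts):
    """Expand the counts into 7 rows (Monday to Sunday) of 24 hourly columns."""
    return [[counts.get((day, hour), 0) for hour in range(24)] for day in range(7)]


# Hypothetical timestamps: two on a Friday at 9 PM, one on a Monday at 11 AM.
sample = [
    datetime(2015, 4, 10, 21, 5),
    datetime(2015, 4, 10, 21, 40),
    datetime(2015, 4, 13, 11, 0),
]
grid = as_grid(weekday_hour_counts(sample))
print(grid[4][21])  # Friday, 9 PM cell
```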
I use Unroll.me to clean up my inbox; the total email count stays the same, but the amount of time spent in Gmail goes way down. Also, Immersion by MIT has a pretty cool visualization based on your email metadata: it graphs your email network and shows whom you message most.