Inbox Analysis

Just another day staring at my inbox, attempting to get to inbox zero. I noticed that Tim Ferriss and Ramit Sethi sent out email blasts within minutes of each other, which got me thinking: “When do I get the most emails?” Those two are pretty big on A/B testing and being quantitative marketers, so I wanted to see if there were any patterns in my inbox.

I first started out messing around with the Gmail API, but after a few misfires I did some research and found I did not have to reinvent the wheel. You can export your Google data using Google Takeout. The process took about 24 hours, and I ended up with a 2.8 GB MBOX file of all the information associated with my Gmail account.

I wanted to look at the following fields: “To,” “From,” and “Date.”

Using Python, I could parse the MBOX file into a more malleable CSV:

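import mailbox
import csv
from email.utils import parsedate_tz

mbox = mailbox.mbox('Matt_mail.mbox')

# Python 3: open in text mode with newline='' so the csv module controls line endings
with open('rawcsv_412.csv', 'w', newline='', encoding='utf-8') as outputFile:
    outputWriter = csv.writer(outputFile)
    for message in mbox:
        # One row per message: To, From, raw Date header, parsed date tuple
        outputWriter.writerow([message['to'], message['from'], message['date'], parsedate_tz(message['date'])])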

The Date field in an mbox file is hard to work with, so I used “parsedate_tz” from email.utils. At this point I used some Excel magic to clean up the CSV a bit (yes, this could have been 100% Python, but I know I can do this in Excel much faster). I removed the parentheses around the parsed date and used the Text to Columns function to separate out the datetime information.
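
For anyone who would rather skip the Excel step, here is a minimal sketch of splitting the date in Python instead. It is not what I actually ran: the helper name split_date is made up for illustration, and the Year/Month/Date/Hour keys are just assumed to match the column names used later in the spreadsheet.

```python
from email.utils import parsedate_to_datetime


def split_date(raw_date):
    """Return year, month, day and hour for an email Date header, or None."""
    try:
        parsed = parsedate_to_datetime(raw_date)
    except (TypeError, ValueError):
        # Missing or malformed Date header
        return None
    return {
        'Year': parsed.year,
        'Month': parsed.month,
        'Date': parsed.day,
        'Hour': parsed.hour,
    }


print(split_date('Mon, 14 Apr 2014 11:02:33 -0400'))
# {'Year': 2014, 'Month': 4, 'Date': 14, 'Hour': 11}
```

Note that the hour here is in whatever UTC offset the sender's header carries, so comparing hours across senders would still need a conversion to a single time zone.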

Since string manipulation is easier in pandas, I ran the following script for further cleaning:

import pandas as pd
import numpy as np

data = pd.ExcelFile('Email_Draft.xlsx')
df = data.parse('rawcsv_412')

print(len(df.index))


# Clean out nulls in the To column; .copy() avoids writing to a view of df
df_1 = df[df.To.notnull()].copy()

print(len(df_1.index))

# Create new columns for selected emails
df_1['Me'] = 0
df_1['Ramit'] = 0
df_1['Tim'] = 0
df_1['Insider'] = 0

# Flag rows by sender; regex=False matches the address literally and
# na=False treats a missing From value as no match
df_1.loc[df_1['From'].str.contains("MyEmail@gmail.com", regex=False, na=False), 'Me'] = 1
df_1.loc[df_1['From'].str.contains("ramit@iwillteachyoutoberich.com", regex=False, na=False), 'Ramit'] = 1
df_1.loc[df_1['From'].str.contains("tim@fourhourbody.com", regex=False, na=False), 'Tim'] = 1
df_1.loc[df_1['From'].str.contains("newsletter@businessinsider.com", regex=False, na=False), 'Insider'] = 1


# Concatenate in Python; in the future I'm sticking with Excel, but hey, why not

def concat(*args):
    strs = [str(arg) for arg in args if not pd.isnull(arg)]
    return '/'.join(strs) if strs else np.nan
np_concat = np.vectorize(concat)


df_1['ShortDate'] = np_concat(df_1['Month'], df_1['Date'], df_1['Year'])

# Write to a new Excel file
df_1.to_excel('PythonExport.xlsx', sheet_name='Sheet1')

Then I added a day-of-week column in Excel with the formula “=TEXT(Cell, "DDDD")”.
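
The same day-of-week column could also be built in pandas. This is only a sketch under assumptions: the sample frame stands in for the real data, the Year/Month/Date column names are assumed to hold numbers, and add_day_of_week and DayOfWeek are names I made up for illustration.

```python
import pandas as pd

# Tiny stand-in for the real data; the column names are assumptions
sample = pd.DataFrame({
    'Year': [2014, 2014],
    'Month': [4, 4],
    'Date': [12, 14],
})


def add_day_of_week(frame):
    """Return a copy of frame with a DayOfWeek column such as 'Monday'."""
    # pd.to_datetime can assemble dates from columns named year/month/day
    parts = frame[['Year', 'Month', 'Date']].rename(
        columns={'Year': 'year', 'Month': 'month', 'Date': 'day'})
    result = frame.copy()
    result['DayOfWeek'] = pd.to_datetime(parts).dt.day_name()
    return result


print(add_day_of_week(sample)['DayOfWeek'].tolist())
# ['Saturday', 'Monday']
```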

Now I had a spreadsheet that looked like the following:

[Image: Email]

After some sanity checks, it’s time for some exploratory analysis.

[Image: Trend email]

Looks like getting to Inbox Zero is getting harder every year!
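
A yearly count like this can come straight from a group-by. The snippet below is a rough sketch with made-up rows, not my real totals; it assumes a numeric Year column, and emails_per_year is a hypothetical helper name.

```python
import pandas as pd

# Made-up rows for illustration only; one row per email
sample = pd.DataFrame({'Year': [2012, 2013, 2013, 2014, 2014, 2014]})


def emails_per_year(frame):
    """Count emails in each year, sorted by year."""
    return frame.groupby('Year').size()


counts = emails_per_year(sample)
print(counts)
```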

[Image: By Hour]

We see a clear trend: Ramit sends 41% of his emails at 11 AM EST, while the all-email average is ~7%. Does this mean we should all send emails at 11? Not necessarily. Look at Business Insider, which primarily sends emails at 4 PM EST. Different email categories are better suited to different times: emails that have action steps may work better earlier in the day, while easy-to-digest articles might do better near the end of the work day. Also, without click-through rate data from multiple people, I can’t give an accurate answer on the best time to send emails.
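
For reference, a per-hour share like the percentages above can be computed with a normalized value count. This is a minimal sketch on made-up rows (so the numbers are not my real results); it assumes an Hour column and the 0/1 sender flag columns from the cleaning script, and hourly_share is a hypothetical helper name.

```python
import pandas as pd

# Made-up rows for illustration; Ramit is a 0/1 sender flag
sample = pd.DataFrame({
    'Hour':  [11, 11, 16, 11, 9, 16, 16, 11],
    'Ramit': [1, 1, 1, 0, 0, 0, 0, 0],
})


def hourly_share(frame, flag_column=None):
    """Fraction of emails in each hour, optionally for one flagged sender."""
    if flag_column is not None:
        frame = frame[frame[flag_column] == 1]
    return frame['Hour'].value_counts(normalize=True).sort_index()


overall = hourly_share(sample)         # share of all emails per hour
ramit = hourly_share(sample, 'Ramit')  # share of Ramit's emails per hour
print(overall)
print(ramit)
```

Comparing the two series hour by hour shows where one sender is concentrated relative to the inbox as a whole.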

Then I got curious about who is sending the most emails over time:

[Image: Top10]

Thankfully I turned off Facebook notifications by email in 2010, and if anyone is familiar with ChapterSpot, you probably have memories of long frat email chains over nothing. As expected, emails are more concentrated during typical work hours and fall off during nights and weekends. From the heatmap below, I see the weekend starts at 9 PM on Friday.

[Image: heatmap]
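
The table behind a heatmap like this is a cross-tabulation of weekday against hour. Here is a rough sketch with made-up rows; the DayOfWeek and Hour column names are assumptions, and weekday_hour_counts is a hypothetical helper name.

```python
import pandas as pd

# Made-up rows for illustration; one row per email
sample = pd.DataFrame({
    'DayOfWeek': ['Monday', 'Monday', 'Friday', 'Saturday'],
    'Hour': [9, 9, 21, 10],
})

DAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']


def weekday_hour_counts(frame):
    """Count emails per (weekday, hour), with rows in calendar order."""
    table = pd.crosstab(frame['DayOfWeek'], frame['Hour'])
    # Add any weekday with no emails as a row of zeros
    return table.reindex(DAYS, fill_value=0)


table = weekday_hour_counts(sample)
print(table)
```

The resulting table can be pasted into Excel with conditional formatting or passed to a plotting library to color the cells.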

I use Unroll.me to clean up my inbox. The total email count stays the same, but the amount of time I spend in Gmail goes way down. Immersion by MIT also has a pretty cool visualization based on your email metadata: it graphs your email network and shows whom you message most.