WSJ visualizes Foursquare user activity in “A week on Foursquare”
Immediately after seeing the Wall Street Journal’s recent interactive visualization “A week on Foursquare,” I was eager to learn how it was developed. I sought out insights from Albert Sun, a graphics producer at WSJ and one of three producers on this package, alongside Jennifer Valentino-DeVries and Zach Seward. Read on to learn how this project came together and the hiccups they experienced along the way.
“We started working on it last summer, when Jen Valentino, who covers tech here, got us access to the Foursquare check-in firehose. Along with our social media editor, Zach Seward, we started brainstorming a project to do something with this data. This was a side project for all of us; we were squeezing it in between things like our What They Know series, the midterm elections, Japan, Libya, etc.
First, I started working on a script to read and save the firehose data, which would turn out to be tricky to do in a consistent manner. I had the script just running on the iMac on my desk and would often come in on Monday morning to find that it had stopped running for some reason.
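The WSJ script itself is not shown, so the following is only a rough Python sketch of one way a collector like this can survive dropped connections: it appends each line to a file and reconnects with a growing delay when the stream fails. The names `record_stream` and `open_stream` are hypothetical, and `open_stream` stands in for whatever actually connects to the firehose.

```python
import time


def record_stream(open_stream, out_path, max_restarts=5, base_delay=1.0,
                  sleep=time.sleep):
    """Append lines from a stream to out_path, reconnecting on failure.

    open_stream is a hypothetical callable that returns an iterable of
    text lines; it is an assumption, not a real Foursquare API.
    Returns the number of lines written.
    """
    written = 0
    restarts = 0
    while True:
        try:
            # Append mode keeps whatever was saved before a crash.
            with open(out_path, "a", encoding="utf-8") as out:
                for line in open_stream():
                    out.write(line.rstrip("\n") + "\n")
                    out.flush()  # do not hold data in memory if we die
                    written += 1
            return written  # the stream ended cleanly
        except OSError:  # includes ConnectionError and its subclasses
            restarts += 1
            if restarts > max_restarts:
                raise
            # Exponential backoff: 1x, 2x, 4x, ... the base delay.
            sleep(base_delay * 2 ** (restarts - 1))
```

This sketch does not deduplicate, so a stream that replays data after reconnecting would produce repeated lines. Running it under a process supervisor would also cover crashes that happen outside the loop.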
Once we had some data we just started running some simple descriptive statistics and counting things in it. We put a small sample of the data into Fusion Tables, and another smaller sample just in Excel, just to play with it.
We knew we’d want a heatmap of the data, because we had seen Steven Lehrburger’s project and kind of envisioned this as a “Where does everyone go?” I used a variant of the heatmap rendering program he used, called Gheat. (I used Gheat for Django; his was Gheat for App Engine.) After fiddling with the program and modifying it a bit, we used EC2 servers to render the tiles. Originally I had planned to use the Google Maps API to render a heatmap of the entire world for the whole week, but there were just too many tiles and it never looked good when zoomed away from any particular urban area.
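To give a sense of why a whole-world render runs into "too many tiles," here is a small Python sketch of the arithmetic. It assumes the standard Web Mercator tiling used by Google Maps style map viewers, where zoom level z covers the world with 2^z by 2^z tiles. The function names and the bounding box in the usage note are illustrative and do not come from the WSJ project.

```python
import math


def tiles_at_zoom(zoom):
    """Number of tiles covering the whole world at one zoom level."""
    return 4 ** zoom  # a 2**zoom by 2**zoom grid


def total_tiles(max_zoom):
    """Tiles covering the whole world for every zoom from 0 to max_zoom."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))


def latlon_to_tile(lat, lon, zoom):
    """Tile column and row containing a point (Web Mercator tiling)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_in_bbox(south, west, north, east, zoom):
    """Tiles needed to cover a bounding box at one zoom level."""
    x0, y1 = latlon_to_tile(south, west, zoom)  # row numbers grow southward
    x1, y0 = latlon_to_tile(north, east, zoom)
    return (x1 - x0 + 1) * (y1 - y0 + 1)
```

For example, `tiles_at_zoom(16)` is 4,294,967,296 for the whole world, while `tiles_in_bbox(40.5, -74.3, 40.9, -73.7, 16)` for a rough, made-up box around New York City is many orders of magnitude smaller. The world figure is an upper bound, since a heatmap only needs tiles where there is data.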
Eventually we decided to restrict the scope of the project to NYC and SF, cities that we personally knew about, and where we thought there would be the highest interest in this kind of thing.
The line charts comparing men and women, and NYC to SF were inspired by a study called Demographic Diversity on the Web done by Yahoo Research.
As far as things done differently, I wish I had been able to get a scale or key or some kind of detail on the heatmaps (to say, for example, that there were 100 check-ins within 100 meters of this street corner in an hour). I think I could’ve done this with R if I had committed to using it from the beginning, but I only started learning and using R for analysis later on when making the line charts and other plots.”
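The kind of figure Sun describes, a count of check-ins within 100 meters of a corner in a given hour, could be computed along these lines. This is a hedged Python sketch rather than anything from the WSJ project (Sun mentions R as the tool he would have used). The record fields `lat`, `lon`, and `time` are assumed for illustration, and distances use the haversine formula on a spherical Earth.

```python
import math
from datetime import timedelta

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_nearby(checkins, lat, lon, start, radius_m=100.0,
                 window=timedelta(hours=1)):
    """Count check-ins within radius_m of a point during [start, start + window).

    Each check-in is assumed to be a dict with 'lat', 'lon' (degrees)
    and 'time' (a datetime); these field names are hypothetical.
    """
    end = start + window
    return sum(
        1
        for c in checkins
        if start <= c["time"] < end
        and haversine_m(lat, lon, c["lat"], c["lon"]) <= radius_m
    )
```

A legend for a heatmap could then be built by evaluating `count_nearby` at a few representative points and mapping the resulting counts to the colors used in the tiles.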
Tags: Albert Sun, data visualization, Foursquare, Gheat, heatmap, Jennifer Valentino-DeVries, Steven Lehrburger, Wall Street Journal, Zach Seward