
Facebook versus Foursquare: The New York City War of 2012

by SAS Employee cakramer on ‎10-30-2015 10:38 AM - edited on ‎10-30-2015 01:03 PM by Community Manager (1,323 Views)



What’s this data? 


In 2012, when it enacted the Open Data Law, New York City became one of the first municipal governments to embrace transparency through open data.  According to a city report, the law requires each borough to identify and publish all of its digital data by 2018 and give an account each July 15 of its progress.


Today, we dip into this rich data source to gain insight into how New Yorkers use social media to interact with city departments and agencies. We examine which social platforms citizens used most between August 2011 and November 2012. Fortunately, these data already include a variable we can use: Likes_Followers_Visits_Downloads.


How to download


If you don’t already have University Edition, get it here and carefully follow the pdf instructions. If you need help with almost any aspect of using University Edition, check out these video tutorials. Additional resources are available in this article.


Go here to download the data. Look for this icon in the top right corner...





...and hit Export. In the drop-down menu, select CSV. Save the file to your hard drive and your data is ready to load into University Edition.


How to get the data and prep it for analysis 


First, use the upcase function on the platform variable to make the responses uniform. For example, "Youtube" and "YouTube" both become "YOUTUBE." Next, apply datepart to date_sampled to strip the time component. Date_sampled appears to be a datetime value with the time set to midnight for every observation, so no information is lost, and a plain date is easier to work with.
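As a minimal sketch of those two cleaning steps, the following uses two made-up rows rather than the real file. The dataset names demo and demo_clean are hypothetical; only platform and date_sampled come from the article.

```sas
/* Two made-up rows that mimic the raw columns (illustration only) */
data demo;
  length platform $20;
  format date_sampled datetime20.;
  platform = "Youtube"; date_sampled = '01AUG2012:00:00:00'dt; output;
  platform = "YouTube"; date_sampled = '01SEP2012:00:00:00'dt; output;
run;

/* Uppercase the platform and keep only the date part of the datetime */
data demo_clean;
  set demo;
  platform = upcase(platform);
  date = datepart(date_sampled);
  format date mmddyys10.;
  drop date_sampled;
run;

proc print data=demo_clean;
run;
```

Both rows should print with platform equal to YOUTUBE and a date with no time attached.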


Use the datepart function and then apply a date format to the new variable. To eliminate all observations after November 1, 2012, compare the date against a numeric cutoff, since SAS date values are the number of days since January 1, 1960. For the sake of juxtaposition, we create two datasets, one for each platform we're comparing: Facebook and Foursquare. In both cases, our target variable is Likes_Followers_Visits_Downloads.
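Because a cutoff built from day counts is easy to get slightly wrong, it can help to print the number next to a date literal. This sketch compares the expression used in the WHERE clause of this article's code with the literal for November 1, 2012. The variable names cutoff_literal and cutoff_formula are made up for illustration.

```sas
data _null_;
  cutoff_literal = '01NOV2012'd;     /* date literal for November 1, 2012 */
  cutoff_formula = 365.25*52 + 303;  /* expression from the article's WHERE clause */
  /* Write both values to the log, first as raw day counts, then as dates */
  put cutoff_literal= cutoff_formula=;
  put cutoff_literal= date9. cutoff_formula= date9.;
run;
```

By my arithmetic, the log shows 19298 (01NOV2012) for the literal and 19296 (30OCT2012) for the formula, because 2012 is a leap year. If you want the cutoff to fall exactly on November 1, the date literal is the safer choice.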


The SGPLOT step is very similar in syntax to the bubble plot we made a few weeks ago. The major difference is the group= option on the series statement, which draws a separate line for each value of the variable you assign to it. We use URL so we can see exactly which pages are doing well.
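To see what group= does before running it on the city data, here is a small sketch using sashelp.stocks, a sample dataset that ships with SAS. It is a stand-in, not part of this analysis, and stocks_sorted is a hypothetical name.

```sas
/* Sort so each line is drawn in date order */
proc sort data=sashelp.stocks out=stocks_sorted;
  by stock date;
run;

/* One line per value of the group variable (here, one per company) */
proc sgplot data=stocks_sorted;
  series x=date y=close / group=stock;
run;
```

Swapping in the Facebook or Foursquare dataset, with group=url, gives one line per page in the same way.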




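/* Import the CSV file exported from the open data site */
proc import datafile="/folders/myfolders/my_data/NY Social Media.csv"
    out=NY_Social_Media dbms=csv replace;
run;

/* Make platform names uniform and reduce the datetime to a date */
data NY_Social_Media;
    set NY_Social_Media;
    platform = upcase(platform);
    date = datepart(date_sampled);
    format date mmddyys10.;
    drop date_sampled;
run;

/* Keep observations before the cutoff. SAS dates count days since 01JAN1960. */
/* Note: 365.25*52 + 303 = 19296, which is 30OCT2012; '01NOV2012'd is 19298. */
data ny_soc;
    set NY_Social_Media;
    where date < (365.25*52 + 303);
run;

proc sort data=ny_soc;
    by platform agency url;
run;

/* Summary statistics for the target variable, by platform */
proc means data=ny_soc;
    class platform;
    var Likes_Followers_Visits_Downloads;
run;

/* One dataset per platform, each sorted by date for plotting */
data foursquare;
    set ny_soc;
    where platform="FOURSQUARE";
run;

proc sort data=foursquare;
    by date;
run;

data facebook;
    set ny_soc;
    where platform="FACEBOOK";
run;

proc sort data=facebook;
    by date;
run;

/* One line per page (URL) over time */
proc sgplot data=facebook;
    series x=date y=likes_followers_visits_downloads / group=url;
run;

proc sgplot data=foursquare;
    series x=date y=likes_followers_visits_downloads / group=url;
run;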




Below is an analysis of Likes_Followers_Visits_Downloads by platform. The difference between N Obs and N is the number of missing values for each platform. Here we can see that Facebook has (1428-1251) = 177 missing values for Likes_Followers_Visits_Downloads and Foursquare has (170-123) = 47 missing values.
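Rather than subtracting N from N Obs by hand, you can ask PROC MEANS for the missing count directly with the NMISS statistic. The sketch below uses a tiny made-up dataset (toy, with a hypothetical variable likes) so it runs on its own. The numbers are not from the city data.

```sas
/* Made-up rows; a period marks a missing value */
data toy;
  input platform :$12. likes;
  datalines;
FACEBOOK 120
FACEBOOK .
FACEBOOK 45
FOURSQUARE 7
FOURSQUARE .
;
run;

/* N Obs counts rows per class level, N counts nonmissing values, */
/* and NMISS counts missing values                               */
proc means data=toy n nmiss min max;
  class platform;
  var likes;
run;
```

To apply the same idea to this analysis, point the step at ny_soc and use Likes_Followers_Visits_Downloads as the analysis variable.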


Foursquare's minimum value for Likes_Followers_Visits_Downloads is 0. Facebook's minimum is 3. The maximum for Facebook is over three times larger than the maximum for Foursquare (110,570 to 34,096). Given that Facebook has so many more pages overall, its higher maximum may not be surprising.


post 7 anova.JPG


The graph below shows the Facebook pages and their Likes_Followers_Visits_Downloads values over time. Each line represents a different Facebook page.


post 7 facebook.jpg

The graph below shows the Foursquare pages and their Likes_Followers_Visits_Downloads values over time. Each line represents a different Foursquare page.


post 7 foursquare.JPG



What does this output mean?


We can use the eyeball test to see that the use of these two social platforms differs greatly. However, because of the amount of missing data and the large difference in sample size (N), we can't draw any grand conclusions from how different they look.


What we do know is that Foursquare sees less traffic, measured in its own units, than Facebook does. The MEANS procedure showed that Facebook has the higher mean. The graphs show that while both platforms have a few pages that stand out, the remaining pages cluster at much lower values. On Foursquare, most pages have very low traffic and sit mostly, if not entirely, below the mean (3746.70). The Facebook pages tend to sit around the mean (4361.17). The few standout Foursquare pages pull its mean far above the value of a typical page.
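One way to check how strongly a few standout pages affect the mean is to request the median alongside it. This is a sketch on made-up values, not the article's results. The dataset toy_pages and its variables are hypothetical.

```sas
/* Made-up values: four ordinary pages and one standout page */
data toy_pages;
  input url :$10. likes;
  datalines;
page_a 10
page_b 12
page_c 15
page_d 20
page_e 5000
;
run;

/* Compare the mean with the median, which is less sensitive to outliers */
proc means data=toy_pages n mean median max;
  var likes;
run;
```

Here the mean is 1011.4 while the median is 15, so a single large page dominates the average. Running the same statistics on ny_soc with a class platform statement would show whether the same pattern holds for Foursquare.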


Unfortunately, this is where our data ends. This article was released about a month after the data ends and offers a little more insight into this comparison and into what happened next between these two competitors. Of course, you probably have a good idea of the trend: when was the last time you updated your Foursquare page?


Now it’s your turn!


Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.


Need data for learning?


The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:



We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:




Click Analytics U, then select "Subscribe" from the Options menu.


Happy Learning!

by Super Contributor
on ‎10-30-2015 06:24 PM

I have been wanting to play around with Twitter data for a while.  This has given me some ideas on what i can do, and once again very well written and well-researched!!  


Happy Halloween :-)

