We’re smarter together. Learn from this collection of community knowledge and add your expertise.

Using SAS to read the White House Visitors Log and other Socrata-hosted data sets

by Frequent Contributor on ‎01-04-2016 08:48 AM - edited on ‎01-04-2016 08:56 AM by Community Manager (1,008 Views)

In Analyzing Open Data with SAS, Socrata's Mark Silverberg explains how to use SAS code and the Socrata API to consume a Socrata-hosted open data dataset (the CDC's Behavioral Risk Factor Surveillance System health survey data).

 

There's a wealth of Socrata-hosted datasets available on the web, including the White House Visitors Log. Below is a little SAS code that retrieves data from the WH Visitors Log. Plug in the name of your favorite public figure - the example uses Oprah - and see not only when they visited, but also with whom they met. Sure beats having to download 100+ MB of zipped .csv files!

 

*==> 1. Run request to retrieve White House Visitor Log meeting IDs (UINs) from Socrata;

filename whquery url 'http://open.whitehouse.gov/resource/p86s-ychb.csv?namelast=winfrey&namefirst=oprah&$limit=30000';

data whmeetings;
infile whquery dsd dlm=',' lrecl=2000 truncover firstobs=2; 
input NAMELAST : $30. NAMEFIRST : $30. NAMEMID : $30. UIN : $30. BDGNBR : $30. 
      Type_of_Access : $30. TOA : ANYDTDTM40. POA : $30. TOD : ANYDTDTM40. 
      POD : $30. APPT_MADE_DATE : ANYDTDTM40. APPT_START_DATE : ANYDTDTM40. 
      APPT_END_DATE : ANYDTDTM40. APPT_CANCEL_DATE : ANYDTDTM40. Total_People 
      LAST_UPDATEDBY : $30. POST : $30. LastEntryDate : ANYDTDTM40. 
      TERMINAL_SUFFIX : $30. visitee_namelast : $30. visitee_namefirst : $30. 
      MEETING_LOC : $30. MEETING_ROOM : $30. CALLER_NAME_LAST : $30. 
      CALLER_NAME_FIRST : $30. CALLER_ROOM : $30. Description : $255.
      RELEASE_DATE : MMDDYY10.;
FORMAT TOA TOD APPT_MADE_DATE APPT_START_DATE APPT_END_DATE APPT_CANCEL_DATE
       LastEntryDate DATETIME16. RELEASE_DATE MMDDYY10.;

proc sort data=whmeetings nodupkey; by uin;  * De-dupe;

*==> 2. Run request to retrieve meeting attendees from Socrata;

filename whquery2 url 'http://www.google.com';  * Placeholder URL initially;

data whattendees;
length whquery2 $255.;                     * Drop observations with no appointment ID;
set whmeetings(keep=UIN rename=(uin=inuin) where=(inuin^='')); 
whquery2="http://open.whitehouse.gov/resource/p86s-ychb.csv?UIN="||trim(inuin);
do while(^eof);
  infile whquery2 url filevar=whquery2  dsd dlm=',' lrecl=2000 truncover firstobs=2 end=eof;
  input NAMELAST : $30. NAMEFIRST : $30. NAMEMID : $30. UIN : $30. BDGNBR : $30. 
        Type_of_Access : $30. TOA : ANYDTDTM40. POA : $30. TOD : ANYDTDTM40. 
        POD : $30. APPT_MADE_DATE : ANYDTDTM40. APPT_START_DATE : ANYDTDTM40. 
        APPT_END_DATE : ANYDTDTM40. APPT_CANCEL_DATE : ANYDTDTM40. Total_People 
        LAST_UPDATEDBY : $30. POST : $30. LastEntryDate : ANYDTDTM40. 
        TERMINAL_SUFFIX : $30. visitee_namelast : $30. visitee_namefirst : $30. 
        MEETING_LOC : $30. MEETING_ROOM : $30. CALLER_NAME_LAST : $30. 
        CALLER_NAME_FIRST : $30. CALLER_ROOM : $30. Description : $255.
        RELEASE_DATE : MMDDYY10.;
  output;
end;
eof=0; 
FORMAT TOA TOD APPT_MADE_DATE APPT_START_DATE APPT_END_DATE APPT_CANCEL_DATE
       LastEntryDate DATETIME16. RELEASE_DATE MMDDYY10.;

*==> 3. UINs are reused, so only select matches on UIN & appointment start time;

proc sql;
create table whmeetingattendees as select a.* from whmeetings m 
left join whattendees a on m.uin=a.uin and m.APPT_START_DATE=a.APPT_START_DATE
order by a.APPT_START_DATE, uin, namelast, namefirst;

*==> 4. Export results to Excel;

proc export data=whmeetingattendees
            outfile='/folders/myfolders/whmeetingattendees.xlsx' dbms=xlsx replace;
sheet="whmeetingattendees";

proc sort data=whmeetings; by APPT_START_DATE uin;
proc export data=whmeetings
            outfile='/folders/myfolders/whmeetingattendees.xlsx' dbms=xlsx replace;
sheet="whmeetings";
run;

 

Here's a screen shot of the sample output in SAS University Edition:

 

WhVisitorsLog.jpg

Comments
by Super User
on ‎01-04-2016 01:24 PM

Since it was mentioned in the introduction, I would wonder what advantage this approach has for use with the CDC BRFSS data for SAS users. CDC provides the data in ASCII format with SAS programs to read the data or SAS transport format files as well as a well of documentation related to collection, weighting and calculating variables.

 

I sort of understand use for a topic like the White House visitors log as it changes frequently but the BRFSS data sets, barring major corrections of some sort, are pretty static once they are released.

Contributors
Your turn
Sign In!

Want to write an article? Sign in with your profile.