BookmarkSubscribeRSS Feed

Using SAS to read the White House Visitors Log and other Socrata-hosted data sets

Started ‎01-04-2016 by
Modified ‎08-04-2021 by
Views 2,482

In Analyzing Open Data with SAS, Socrata's Mark Silverberg explains how to use SAS code and the Socrata API to consume a Socrata-hosted open data dataset (the CDC's Behavioral Risk Factor Surveillance System health survey data).

 

There's a wealth of Socrata-hosted datasets available on the web, including the White House Visitors Log. Below is a little SAS code that retrieves data from the WH Visitors Log. Plug in the name of your favorite public figure - the example uses Oprah - and see not only when they visited, but also with whom they met. Sure beats having to download 100+ MB of zipped .csv files!

 

*==> 1. Run request to retrieve White House Visitor Log meeting IDs (UINs) from Socrata;

filename whquery url 'http://open.whitehouse.gov/resource/p86s-ychb.csv?namelast=winfrey&namefirst=oprah&$limit=30000';

data whmeetings;
infile whquery dsd dlm=',' lrecl=2000 truncover firstobs=2; 
input NAMELAST : $30. NAMEFIRST : $30. NAMEMID : $30. UIN : $30. BDGNBR : $30. 
      Type_of_Access : $30. TOA : ANYDTDTM40. POA : $30. TOD : ANYDTDTM40. 
      POD : $30. APPT_MADE_DATE : ANYDTDTM40. APPT_START_DATE : ANYDTDTM40. 
      APPT_END_DATE : ANYDTDTM40. APPT_CANCEL_DATE : ANYDTDTM40. Total_People 
      LAST_UPDATEDBY : $30. POST : $30. LastEntryDate : ANYDTDTM40. 
      TERMINAL_SUFFIX : $30. visitee_namelast : $30. visitee_namefirst : $30. 
      MEETING_LOC : $30. MEETING_ROOM : $30. CALLER_NAME_LAST : $30. 
      CALLER_NAME_FIRST : $30. CALLER_ROOM : $30. Description : $255.
      RELEASE_DATE : MMDDYY10.;
FORMAT TOA TOD APPT_MADE_DATE APPT_START_DATE APPT_END_DATE APPT_CANCEL_DATE
       LastEntryDate DATETIME16. RELEASE_DATE MMDDYY10.;

proc sort data=whmeetings nodupkey; by uin;  * De-dupe;

*==> 2. Run request to retrieve meeting attendees from Socrata;

filename whquery2 url 'http://www.google.com';  * Placeholder URL initially;

data whattendees;
length whquery2 $255.;                     * Drop observations with no appointment ID;
set whmeetings(keep=UIN rename=(uin=inuin) where=(inuin^='')); 
whquery2="http://open.whitehouse.gov/resource/p86s-ychb.csv?UIN="||trim(inuin);
do while(^eof);
  infile whquery2 url filevar=whquery2  dsd dlm=',' lrecl=2000 truncover firstobs=2 end=eof;
  input NAMELAST : $30. NAMEFIRST : $30. NAMEMID : $30. UIN : $30. BDGNBR : $30. 
        Type_of_Access : $30. TOA : ANYDTDTM40. POA : $30. TOD : ANYDTDTM40. 
        POD : $30. APPT_MADE_DATE : ANYDTDTM40. APPT_START_DATE : ANYDTDTM40. 
        APPT_END_DATE : ANYDTDTM40. APPT_CANCEL_DATE : ANYDTDTM40. Total_People 
        LAST_UPDATEDBY : $30. POST : $30. LastEntryDate : ANYDTDTM40. 
        TERMINAL_SUFFIX : $30. visitee_namelast : $30. visitee_namefirst : $30. 
        MEETING_LOC : $30. MEETING_ROOM : $30. CALLER_NAME_LAST : $30. 
        CALLER_NAME_FIRST : $30. CALLER_ROOM : $30. Description : $255.
        RELEASE_DATE : MMDDYY10.;
  output;
end;
eof=0; 
FORMAT TOA TOD APPT_MADE_DATE APPT_START_DATE APPT_END_DATE APPT_CANCEL_DATE
       LastEntryDate DATETIME16. RELEASE_DATE MMDDYY10.;

*==> 3. UINs are reused, so only select matches on UIN & appointment start time;

proc sql;
create table whmeetingattendees as select a.* from whmeetings m 
left join whattendees a on m.uin=a.uin and m.APPT_START_DATE=a.APPT_START_DATE
order by a.APPT_START_DATE, uin, namelast, namefirst;

*==> 4. Export results to Excel;

proc export data=whmeetingattendees
            outfile='/folders/myfolders/whmeetingattendees.xlsx' dbms=xlsx replace;
sheet="whmeetingattendees";

proc sort data=whmeetings; by APPT_START_DATE uin;
proc export data=whmeetings
            outfile='/folders/myfolders/whmeetingattendees.xlsx' dbms=xlsx replace;
sheet="whmeetings";
run;

 

Here's a screen shot of the sample output in SAS University Edition:

 

WhVisitorsLog.jpg

Comments

Since it was mentioned in the introduction, I would wonder what advantage this approach has for use with the CDC BRFSS data for SAS users. CDC provides the data in ASCII format with SAS programs to read the data or SAS transport format files as well as a well of documentation related to collection, weighting and calculating variables.

 

I sort of understand use for a topic like the White House visitors log as it changes frequently but the BRFSS data sets, barring major corrections of some sort, are pretty static once they are released.

Version history
Last update:
‎08-04-2021 12:57 PM
Updated by:

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags