In Analyzing Open Data with SAS, Socrata's Mark Silverberg explains how to use SAS code and the Socrata API to consume a Socrata-hosted open data dataset (the CDC's Behavioral Risk Factor Surveillance System health survey data).
There's a wealth of Socrata-hosted datasets available on the web, including the White House Visitors Log. Below is a little SAS code that retrieves data from the WH Visitors Log. Plug in the name of your favorite public figure - the example uses Oprah - and see not only when they visited, but also with whom they met. Sure beats having to download 100+ MB of zipped .csv files!
*==> 1. Run request to retrieve White House Visitor Log meeting IDs (UINs) from Socrata;
filename whquery url 'http://open.whitehouse.gov/resource/p86s-ychb.csv?namelast=winfrey&namefirst=oprah&$limit=30000';
data whmeetings;
infile whquery dsd dlm=',' lrecl=2000 truncover firstobs=2;
input NAMELAST : $30. NAMEFIRST : $30. NAMEMID : $30. UIN : $30. BDGNBR : $30.
Type_of_Access : $30. TOA : ANYDTDTM40. POA : $30. TOD : ANYDTDTM40.
POD : $30. APPT_MADE_DATE : ANYDTDTM40. APPT_START_DATE : ANYDTDTM40.
APPT_END_DATE : ANYDTDTM40. APPT_CANCEL_DATE : ANYDTDTM40. Total_People
LAST_UPDATEDBY : $30. POST : $30. LastEntryDate : ANYDTDTM40.
TERMINAL_SUFFIX : $30. visitee_namelast : $30. visitee_namefirst : $30.
MEETING_LOC : $30. MEETING_ROOM : $30. CALLER_NAME_LAST : $30.
CALLER_NAME_FIRST : $30. CALLER_ROOM : $30. Description : $255.
RELEASE_DATE : MMDDYY10.;
FORMAT TOA TOD APPT_MADE_DATE APPT_START_DATE APPT_END_DATE APPT_CANCEL_DATE
LastEntryDate DATETIME16. RELEASE_DATE MMDDYY10.;
proc sort data=whmeetings nodupkey; by uin; * De-dupe;
*==> 2. Run request to retrieve meeting attendees from Socrata;
filename whquery2 url 'http://www.google.com'; * Placeholder URL initially;
data whattendees;
length whquery2 $255.; * Drop observations with no appointment ID;
set whmeetings(keep=UIN rename=(uin=inuin) where=(inuin^=''));
whquery2="http://open.whitehouse.gov/resource/p86s-ychb.csv?UIN="||trim(inuin);
do while(^eof);
infile whquery2 url filevar=whquery2 dsd dlm=',' lrecl=2000 truncover firstobs=2 end=eof;
input NAMELAST : $30. NAMEFIRST : $30. NAMEMID : $30. UIN : $30. BDGNBR : $30.
Type_of_Access : $30. TOA : ANYDTDTM40. POA : $30. TOD : ANYDTDTM40.
POD : $30. APPT_MADE_DATE : ANYDTDTM40. APPT_START_DATE : ANYDTDTM40.
APPT_END_DATE : ANYDTDTM40. APPT_CANCEL_DATE : ANYDTDTM40. Total_People
LAST_UPDATEDBY : $30. POST : $30. LastEntryDate : ANYDTDTM40.
TERMINAL_SUFFIX : $30. visitee_namelast : $30. visitee_namefirst : $30.
MEETING_LOC : $30. MEETING_ROOM : $30. CALLER_NAME_LAST : $30.
CALLER_NAME_FIRST : $30. CALLER_ROOM : $30. Description : $255.
RELEASE_DATE : MMDDYY10.;
output;
end;
eof=0;
FORMAT TOA TOD APPT_MADE_DATE APPT_START_DATE APPT_END_DATE APPT_CANCEL_DATE
LastEntryDate DATETIME16. RELEASE_DATE MMDDYY10.;
*==> 3. UINs are reused, so only select matches on UIN & appointment start time;
proc sql;
create table whmeetingattendees as select a.* from whmeetings m
left join whattendees a on m.uin=a.uin and m.APPT_START_DATE=a.APPT_START_DATE
order by a.APPT_START_DATE, uin, namelast, namefirst;
*==> 4. Export results to Excel;
proc export data=whmeetingattendees
outfile='/folders/myfolders/whmeetingattendees.xlsx' dbms=xlsx replace;
sheet="whmeetingattendees";
proc sort data=whmeetings; by APPT_START_DATE uin;
proc export data=whmeetings
outfile='/folders/myfolders/whmeetingattendees.xlsx' dbms=xlsx replace;
sheet="whmeetings";
run;
Here's a screen shot of the sample output in SAS University Edition:
Since it was mentioned in the introduction, I would wonder what advantage this approach has for use with the CDC BRFSS data for SAS users. CDC provides the data in ASCII format with SAS programs to read the data or SAS transport format files as well as a well of documentation related to collection, weighting and calculating variables.
I sort of understand use for a topic like the White House visitors log as it changes frequently but the BRFSS data sets, barring major corrections of some sort, are pretty static once they are released.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.