About cyrribrae

SuzanneDorinski · ‎09-22-2018

Finding super-utilizers in a data set of patient records seems similar to looking for housing units with lots of people in survey data. I wrote the code below for the 2015 American Community Survey public use microdata files, available on the Census Bureau web site. The data set has 3,147,005 observations and 284 variables. In the data set, SERIALNO is the identifier for the housing unit. The first PROC SQL counts the number of person records for each SERIALNO. The PROC FREQ shows the number of housing units that have 1 person record, 2 person records, etc. I can look at the output from the PROC FREQ to find housing units containing relatively large numbers of people. The second PROC SQL creates a macro variable that has the list of values of SERIALNO which have 20 person records. The third PROC SQL creates a data set of the person records for people in housing units that have 20 person records. libname acs '/folders/myfolders/ACS PUMS'; proc sql; create table record_counts as select serialno, count(serialno) as number_of_records from acs.population group by serialno order by serialno; quit; proc freq data=record_counts; tables number_of_records; title 'Number of person records by SERIALNO in 2015 American Community Survey Public Use Microdata'; run; proc sql noprint; select quote(serialno) into :list_of_largest_housing_units separated by ', ' from record_counts where number_of_records=20; quit; proc sql; create table units_with_20_people as select serialno, sporder, agep, sex, rac1p, mar, hisp, relp from acs.population where serialno in (&list_of_largest_housing_units) order by serialno, sporder; quit; PROC FREQ output showing number of person records per housing unit

Online Status	Offline
Date Last Visited	‎09-21-2018 09:36 AM

Re: Filtering Dataset by Variable Count

Filtering Dataset by Variable Count

Re: Filtering Dataset by Variable Count