I am working with CMS data and trying to convert it so that each facility has only one row, but I'm having a difficult time with one specific aspect.
The original data in formatted like (I removed some unnecessary columns for ease of reading):
Obs | Facility_ID | Facility_Name | Measure_ID | Measure_Name | Compared_to_National | Score |
1 | 100001 | SHANDS JACKSONVILLE | HAI_1_CILOWER | Central Line Associated Bloodstream Infection (ICU + select Wards): Lower Confidence Limit | Worse than the National Benchmark | 1.081 |
2 | 100001 | SHANDS JACKSONVILLE | HAI_5_CILOWER | MRSA Bacteremia: Lower Confidence Limit | Worse than the National Benchmark | 1.595 |
3 | 100001 | SHANDS JACKSONVILLE | HAI_1_SIR | Central Line Associated Bloodstream Infection (ICU + select Wards) | Worse than the National Benchmark | 1.62 |
Each hospital has about 40 rows with a separate measure in each.
I am trying to get data that looks like below, for all HAI_x_SIR measures.
Obs | Facility_ID | State | HAI_1_SIR | Compared_1 | HAI_2_SIR | Compared_2 | HAI_3_SIR | Compared_3 |
1 | 100001 | FL | 1.62 | Better than average | 0.832 | Better than average | 0.904 | No different than average |
2 | 100002 | FL | 1.685 | Worse than average | 1.106 | No different than average | 1.197 | Worse than average |
3 | 100006 | FL | 0.495 | No different than average | 0.286 | Better than average | 0.966 | Better than average |
data hai_test;
set hai;
if facility_id = . then delete;
if score="Not Available" then delete;
run;
proc sort data=hai;
by Facility_ID;
run;
proc print data=work.hai_test (obs=100);
run;
proc transpose data=hai_test out=hai_test_row (keep=facility_id
state HAI_1_SIR HAI_2_SIR HAI_3_SIR
HAI_4_SIR HAI_5_SIR);
by facility_ID state;
var score;
id measure_id;
run;
With this code, I have been able to get data that looks like this:
Obs | Facility_ID | State | HAI_1_SIR | HAI_2_SIR | HAI_3_SIR | HAI_4_SIR | HAI_5_SIR |
1 | 100001 | FL | 1.62 | 0.832 | 0.904 | 0.386 | 2.322 |
2 | 100002 | FL | 1.685 | 1.106 | 1.197 | 2 | 1.433 |
3 | 100006 | FL | 0.495 | 0.286 | 0.966 | 1.514 | 1.247 |
proc transpose data=hai_test out=hai_test_row (keep=facility_id
state HAI_1_SIR HAI_2_SIR HAI_3_SIR
HAI_4_SIR HAI_5_SIR);
by facility_ID state;
var score compared_to_national;
id measure_id;
run;
Using this code instead, including compared_to_national, gives the following result:
Obs | Facility_ID | State | HAI_1_SIR | HAI_5_SIR | HAI_4_SIR | HAI_2_SIR | HAI_3_SIR |
1 | 100001 | FL | 1.62 | 2.322 | 0.386 | 0.832 | 0.904 |
2 | 100001 | FL | Worse than the National Benchmark | Worse than the National Benchmark | No Different than National Benchmark | No Different than National Benchmark | No Different than National Benchmark |
3 | 100002 | FL | 1.685 | 1.433 | 1.106 | 1.197 | |
4 | 100002 | FL | Worse than the National Benchmark | No Different than National Benchmark | No Different than National Benchmark | No Different than National Benchmark | |
5 | 100006 | FL | 0.495 | 1.247 | 1.514 | 0.286 | 0.966 |
6 | 100006 | FL | Better than the National Benchmark | No Different than National Benchmark | No Different than National Benchmark | Better than the National Benchmark | No Different than National Benchmark |
Which is obviously still not correct, but I think may be an intermediary step? Or I may be completely off base. I have no idea how to proceed with what I've been able to get so far. I'm thinking an array may be necessary or maybe retain, but I don't even know what to search for to find advice on that for this scenario.
Thank you so much!
So both of your PROC TRANSPOSE steps will create variables with the same names because you are only using the ID statement to name them. If you want the numeric and the character variables to have different names you could add the PREFIX= or SUFFIX= option to the PROC statement.
What is the purpose of the KEEP= dataset option in your code? Are you just trying to get rid of the _NAME_ variable? It might be easier to use a DROP= option instead.
proc transpose data=hai_test out=hai_test_numbers (drop=_name_);
by facility_ID state;
id measure_id;
var score;
run;
proc transpose data=hai_test suffix=_change out=hai_test_strings(drop=_name_);
by facility_ID state;
id measure_id;
var compared_to_national;
run;
data want;
merge hai_test_numbers hai_test_strings;
by facility_ID state;
run;
So both of your PROC TRANSPOSE steps will create variables with the same names because you are only using the ID statement to name them. If you want the numeric and the character variables to have different names you could add the PREFIX= or SUFFIX= option to the PROC statement.
What is the purpose of the KEEP= dataset option in your code? Are you just trying to get rid of the _NAME_ variable? It might be easier to use a DROP= option instead.
proc transpose data=hai_test out=hai_test_numbers (drop=_name_);
by facility_ID state;
id measure_id;
var score;
run;
proc transpose data=hai_test suffix=_change out=hai_test_strings(drop=_name_);
by facility_ID state;
id measure_id;
var compared_to_national;
run;
data want;
merge hai_test_numbers hai_test_strings;
by facility_ID state;
run;
The purpose of KEEP= was to drop the many other measure variables that aren't relevant to my final project (HAI_5_ELIGCASES for example).
Your solution worked though! Thank you so much, you're my hero. 🙂
Since the MEASURE_ID variable is used to NAME the resulting variables you can use a WHERE on the INPUT to remove them rather than a DROP or KEEP on the OUTPUT. Should make it run faster too.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.