Solved: Re: Transposing CMS Data

bonnie27 · Posted 03-28-2023 06:09 PM

I am working with CMS data and trying to convert it so that each facility has only one row, but I'm having a difficult time with one specific aspect.

The original data in formatted like (I removed some unnecessary columns for ease of reading):

Obs	Facility_ID	Facility_Name	Measure_ID	Measure_Name	Compared_to_National	Score
1	100001	SHANDS JACKSONVILLE	HAI_1_CILOWER	Central Line Associated Bloodstream Infection (ICU + select Wards): Lower Confidence Limit	Worse than the National Benchmark	1.081
2	100001	SHANDS JACKSONVILLE	HAI_5_CILOWER	MRSA Bacteremia: Lower Confidence Limit	Worse than the National Benchmark	1.595
3	100001	SHANDS JACKSONVILLE	HAI_1_SIR	Central Line Associated Bloodstream Infection (ICU + select Wards)	Worse than the National Benchmark	1.62

Each hospital has about 40 rows with a separate measure in each.

I am trying to get data that looks like below, for all HAI_x_SIR measures.

Obs	Facility_ID	State	HAI_1_SIR	Compared_1	HAI_2_SIR	Compared_2	HAI_3_SIR	Compared_3
1	100001	FL	1.62	Better than average	0.832	Better than average	0.904	No different than average
2	100002	FL	1.685	Worse than average	1.106	No different than average	1.197	Worse than average
3	100006	FL	0.495	No different than average	0.286	Better than average	0.966	Better than average

data hai_test;
	set hai;
	if facility_id = . then delete;
	if score="Not Available" then delete;
run;

proc sort data=hai;
    by Facility_ID;
run;

proc print data=work.hai_test (obs=100);
run;

proc transpose data=hai_test out=hai_test_row (keep=facility_id 
		state HAI_1_SIR HAI_2_SIR HAI_3_SIR 
		HAI_4_SIR HAI_5_SIR);
	by facility_ID state;
	var score;
	id measure_id;
run;

With this code, I have been able to get data that looks like this:

Obs	Facility_ID	State	HAI_1_SIR	HAI_2_SIR	HAI_3_SIR	HAI_4_SIR	HAI_5_SIR
1	100001	FL	1.62	0.832	0.904	0.386	2.322
2	100002	FL	1.685	1.106	1.197	2	1.433
3	100006	FL	0.495	0.286	0.966	1.514	1.247

proc transpose data=hai_test out=hai_test_row (keep=facility_id 
		state HAI_1_SIR HAI_2_SIR HAI_3_SIR 
		HAI_4_SIR HAI_5_SIR);
	by facility_ID state;
	var score compared_to_national;
	id measure_id;
run;

Using this code instead, including compared_to_national, gives the following result:

Obs	Facility_ID	State	HAI_1_SIR	HAI_5_SIR	HAI_4_SIR	HAI_2_SIR	HAI_3_SIR
1	100001	FL	1.62	2.322	0.386	0.832	0.904
2	100001	FL	Worse than the National Benchmark	Worse than the National Benchmark	No Different than National Benchmark	No Different than National Benchmark	No Different than National Benchmark
3	100002	FL	1.685	1.433		1.106	1.197
4	100002	FL	Worse than the National Benchmark	No Different than National Benchmark		No Different than National Benchmark	No Different than National Benchmark
5	100006	FL	0.495	1.247	1.514	0.286	0.966
6	100006	FL	Better than the National Benchmark	No Different than National Benchmark	No Different than National Benchmark	Better than the National Benchmark	No Different than National Benchmark

Which is obviously still not correct, but I think may be an intermediary step? Or I may be completely off base. I have no idea how to proceed with what I've been able to get so far. I'm thinking an array may be necessary or maybe retain, but I don't even know what to search for to find advice on that for this scenario.

Thank you so much!

Tom · Posted 03-28-2023 06:38 PM

So both of your PROC TRANSPOSE steps will create variables with the same names because you are only using the ID statement to name them. If you want the numeric and the character variables to have different names you could add the PREFIX= or SUFFIX= option to the PROC statement.

What is the purpose of the KEEP= dataset option in your code? Are you just trying to get rid of the _NAME_ variable? It might be easier to use a DROP= option instead.

proc transpose data=hai_test out=hai_test_numbers (drop=_name_);
  by facility_ID state;
  id measure_id;
  var score;
run;
proc transpose data=hai_test suffix=_change out=hai_test_strings(drop=_name_);
  by  facility_ID state;
  id measure_id;
  var compared_to_national;
run;

data want;
  merge hai_test_numbers hai_test_strings;
  by facility_ID state;
run;

View solution in original post

Tom · Posted 03-28-2023 06:38 PM

So both of your PROC TRANSPOSE steps will create variables with the same names because you are only using the ID statement to name them. If you want the numeric and the character variables to have different names you could add the PREFIX= or SUFFIX= option to the PROC statement.

What is the purpose of the KEEP= dataset option in your code? Are you just trying to get rid of the _NAME_ variable? It might be easier to use a DROP= option instead.

proc transpose data=hai_test out=hai_test_numbers (drop=_name_);
  by facility_ID state;
  id measure_id;
  var score;
run;
proc transpose data=hai_test suffix=_change out=hai_test_strings(drop=_name_);
  by  facility_ID state;
  id measure_id;
  var compared_to_national;
run;

data want;
  merge hai_test_numbers hai_test_strings;
  by facility_ID state;
run;

bonnie27 · Posted 03-28-2023 06:50 PM

The purpose of KEEP= was to drop the many other measure variables that aren't relevant to my final project (HAI_5_ELIGCASES for example).

Your solution worked though! Thank you so much, you're my hero. 🙂

Tom · Posted 03-28-2023 07:02 PM

Since the MEASURE_ID variable is used to NAME the resulting variables you can use a WHERE on the INPUT to remove them rather than a DROP or KEEP on the OUTPUT. Should make it run faster too.

Transposing CMS Data

Re: Transposing CMS Data

Re: Transposing CMS Data

Re: Transposing CMS Data

Re: Transposing CMS Data

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away