I have a data set of ~10K hospital visits. Each observation has about 60 variables describing the patient. Below is a mock sample of a small part of the data set.
Hosp | id | AI_Pace | PSR | cPWR | RTG | BOARD |
A | 1 | 1 | 3 | |||
A | 2 | 1 | 2 | |||
B | 1 | 1 | 2 | 2 | 2 | |
B | 2 | 2 | 1 | 1 | 1 | 3 |
B | 3 | 3 | ||||
B | 4 | 2 | 1 | 1 | ||
B | 5 | |||||
B | 6 | 2 | ||||
C | 1 | 2 | 3 | 2 | ||
C | 2 | 2 | ||||
C | 3 | 3 | 1 | |||
C | 4 | 3 | 3 | |||
C | 5 | |||||
C | 6 | 1 | 1 | 1 | 3 |
I'd like to transform this data into a set of transactions, almost as if every patient were a consumer at a store. Below is the data set above transformed into what I'd like. Please note that if a patient doesn't have any columns marked, then they are deleted. Thanks!!!!! Andrew
Hosp | id | |||||
A | 1 | AI_PACE = 1 | PSR=3 | |||
A | 2 | AI_PACE = 1 | BOARD = 2 | |||
B | 1 | AI_PACE = 1 | PSR = 2 | cPWR = 2 | RTG = 2 | |
B | 2 | AI_PACE = 2 | PSR = 1 | cPWR = 1 | RTG = 1 | BOARD = 3 |
B | 3 | AI_PACE = 3 | ||||
B | 4 | AI_PACE = 2 | cPWR = 1 | BOARD = 1 | ||
B | 6 | AI_PACE = 2 | ||||
C | 1 | PSR = 2 | cPWR = 3 | RTG = 2 | ||
C | 2 | BOARD =2 | ||||
C | 3 | AI_PACE = 3 | BOARD = 1 | |||
C | 4 | AI_PACE = 3 | RTG = 3 | |||
C | 6 | AI_PACE = 1 | PSR = 1 | cPWR = 1 | BOARD = 3 |
Do you want a DATASET or a FILE?
A FILE is probably simpler (and more useful??).
First let's convert your LISTING back into an actual SAS dataset.
data have ;
infile cards dsd dlm='|' truncover;
input Hosp $ id AI_Pace PSR cPWR RTG BOARD;
cards;
A|1|1|3| | |
A|2|1| | | |2
B|1|1|2|2|2|
B|2|2|1|1|1|3
B|3|3| | | |
B|4|2| |1| |1
B|5| | | | |
B|6|2| | | |
C|1| |2|3|2|
C|2| | | | |2
C|3|3| | | |1
C|4|3| | |3|
C|5| | | | |
C|6|1|1|1| |3
;
Since all of those variables are numeric you can use and ARRAY to handle them.
Let's write to a temporary file for testing. Just change the FILENAME statement to write to a permanent file.
filename trans temp;
data _null_;
file trans;
set have ;
array measures AI_Pace PSR cPWR RTG BOARD;
if n(of measures[*]);
put hosp id @;
do over measures;
if not missing(measures) then put measures= @;
end;
put;
run;
This produces a file like this for the example input:
A 1 AI_Pace=1 PSR=3 A 2 AI_Pace=1 BOARD=2 B 1 AI_Pace=1 PSR=2 cPWR=2 RTG=2 B 2 AI_Pace=2 PSR=1 cPWR=1 RTG=1 BOARD=3 B 3 AI_Pace=3 B 4 AI_Pace=2 cPWR=1 BOARD=1 B 6 AI_Pace=2 C 1 PSR=2 cPWR=3 RTG=2 C 2 BOARD=2 C 3 AI_Pace=3 BOARD=1 C 4 AI_Pace=3 RTG=3 C 6 AI_Pace=1 PSR=1 cPWR=1 BOARD=3
Which is something you could read back into SAS using NAMED input statement.
data test;
length Hosp $8 id AI_Pace PSR cPWR RTG BOARD 8;
infile trans ;
input Hosp $ id (AI_Pace -- BOARD) (=) ;
run;
Which produces the same data (only missing the observations with no transactions).
proc compare data=have compare=test;
id hosp id;
run;
Observation Summary Observation Base Compare ID First Obs 1 1 Hosp=A id=1 Last Obs 14 12 Hosp=C id=6 Number of Observations in Common: 12. Number of Observations in WORK.HAVE but not in WORK.TEST: 2. Total Number of Observations Read from WORK.HAVE: 14. Total Number of Observations Read from WORK.TEST: 12. Number of Observations with Some Compared Variables Unequal: 0. Number of Observations with All Compared Variables Equal: 12. NOTE: No unequal values were found.
Do you want a DATASET or a FILE?
A FILE is probably simpler (and more useful??).
First let's convert your LISTING back into an actual SAS dataset.
data have ;
infile cards dsd dlm='|' truncover;
input Hosp $ id AI_Pace PSR cPWR RTG BOARD;
cards;
A|1|1|3| | |
A|2|1| | | |2
B|1|1|2|2|2|
B|2|2|1|1|1|3
B|3|3| | | |
B|4|2| |1| |1
B|5| | | | |
B|6|2| | | |
C|1| |2|3|2|
C|2| | | | |2
C|3|3| | | |1
C|4|3| | |3|
C|5| | | | |
C|6|1|1|1| |3
;
Since all of those variables are numeric you can use and ARRAY to handle them.
Let's write to a temporary file for testing. Just change the FILENAME statement to write to a permanent file.
filename trans temp;
data _null_;
file trans;
set have ;
array measures AI_Pace PSR cPWR RTG BOARD;
if n(of measures[*]);
put hosp id @;
do over measures;
if not missing(measures) then put measures= @;
end;
put;
run;
This produces a file like this for the example input:
A 1 AI_Pace=1 PSR=3 A 2 AI_Pace=1 BOARD=2 B 1 AI_Pace=1 PSR=2 cPWR=2 RTG=2 B 2 AI_Pace=2 PSR=1 cPWR=1 RTG=1 BOARD=3 B 3 AI_Pace=3 B 4 AI_Pace=2 cPWR=1 BOARD=1 B 6 AI_Pace=2 C 1 PSR=2 cPWR=3 RTG=2 C 2 BOARD=2 C 3 AI_Pace=3 BOARD=1 C 4 AI_Pace=3 RTG=3 C 6 AI_Pace=1 PSR=1 cPWR=1 BOARD=3
Which is something you could read back into SAS using NAMED input statement.
data test;
length Hosp $8 id AI_Pace PSR cPWR RTG BOARD 8;
infile trans ;
input Hosp $ id (AI_Pace -- BOARD) (=) ;
run;
Which produces the same data (only missing the observations with no transactions).
proc compare data=have compare=test;
id hosp id;
run;
Observation Summary Observation Base Compare ID First Obs 1 1 Hosp=A id=1 Last Obs 14 12 Hosp=C id=6 Number of Observations in Common: 12. Number of Observations in WORK.HAVE but not in WORK.TEST: 2. Total Number of Observations Read from WORK.HAVE: 14. Total Number of Observations Read from WORK.TEST: 12. Number of Observations with Some Compared Variables Unequal: 0. Number of Observations with All Compared Variables Equal: 12. NOTE: No unequal values were found.
I should have said that I wanted to write to a CSV file .
Here's what I ran, with a few lines of output shown:
filename mba;
data _null_;
file mba;
set mba1 ;
array measures AI_Pace PSR cPWR RTG itm;
if n(of measures[*]);
put race prg @;
do over measures;
if not missing(measures) then put measures= @;
end;
put;
run;
1 1 AI_Pace=1
1 3 AI_Pace=1 ITM=2
1 4 AI_Pace=1
1 6 AI_Pace=2 ITM=3
1 7 AI_Pace=2
1 8 AI_Pace=2 ITM=1
2 1 AI_Pace=9
2 2 AI_Pace=2
2 3 AI_Pace=9
2 4 AI_Pace=9
2 6 AI_Pace=9
2 7 AI_Pace=9
2 8 AI_Pace=9
2 9 AI_Pace=9 ITM=2
2 10 AI_Pace=3 ITM=1
2 11 AI_Pace=3
2 12 AI_Pace=9
2 13 AI_Pace=1 ITM=3
3 1 AI_Pace=1 ITM=1
3 2 AI_Pace=2
3 3 AI_Pace=2 ITM=3
3 4 AI_Pace=1
3 5 AI_Pace=2
3 6 AI_Pace=9
3 7 AI_Pace=3 ITM=2
4 1 AI_Pace=9 ITM=2
4 2 AI_Pace=3 ITM=3
4 3 AI_Pace=1 ITM=1
4 5 AI_Pace=3
4 6 AI_Pace=1
For some reason, all I'm getting are AI_Line and ITM. Do you know what might be occurring?
Thank you.
Andrew
Most likely you used the wrong variable names in your ARRAY statement. That would cause SAS to make NEW variables that will always be empty, and so not appear in the output.
If you know the order they appear in the dataset (use the VARNUM option on PROC CONTENTS) you can use a positional variable list. Then you only need to get two of the variable names right, the first and last.
array measures First_var_name -- Name_of_last_var ;
Yep, I forgot to put he letter "r" in front of some of the variables. 😒
Do you want a CSV file like this?
A,1,AI_Pace=1 PSR=3 A,2,AI_Pace=1 BOARD=2 B,1,AI_Pace=1 PSR=2 cPWR=2 RTG=2
Then just add the DSD option to the FILE statament.
Or like this?
A,1,AI_Pace=1,PSR=3 A,2,AI_Pace=1,BOARD=2 B,1,AI_Pace=1,PSR=2,cPWR=2,RTG=2
Then don't add DSD option but instead write the commas before writing ID and the measures.
put hosp ',' id @;
do over measures;
if not missing(measures) then put +(-1) ',' measures= @;
end;
put;
Or worse like this?
A,1,AI_Pace=1,PSR=3,, A,2,AI_Pace=1,BOARD=2,, B,1,AI_Pace=1,PSR=2,cPWR=2,RTG=2
For that you might want to keep track of how many non missing you found and then write the correct number of commas at the end.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.