BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
DocMartin
Quartz | Level 8

I have a data set of ~10K hospital visits. Each observation has about 60 variables describing the patient. Below is a mock sample of a small part of the data set.

HospidAI_PacePSRcPWRRTGBOARD
A113   
A21   2
B11222 
B221113
B33    
B42 1 1
B5     
B62    
C1 232 
C2    2
C33   1
C43  3 
C5     
C6111 3

 

I'd like to transform this data into a set of transactions, almost as if every patient were a consumer at a store. Below is the data set above transformed into what I'd like. Please note that if a patient doesn't have any columns marked, then they are deleted. Thanks!!!!! Andrew

Hospid     
A1AI_PACE = 1PSR=3   
A2AI_PACE = 1BOARD = 2   
B1AI_PACE = 1PSR = 2cPWR = 2RTG = 2 
B2AI_PACE = 2PSR = 1cPWR = 1RTG = 1BOARD = 3
B3AI_PACE = 3    
B4AI_PACE = 2cPWR = 1BOARD = 1  
B6AI_PACE = 2    
C1PSR = 2cPWR = 3RTG = 2  
C2BOARD =2    
C3AI_PACE = 3BOARD = 1   
C4AI_PACE = 3RTG = 3   
C6AI_PACE = 1PSR = 1cPWR = 1BOARD = 3 
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Do you want a DATASET or a FILE?

A FILE is probably simpler (and more useful??).

First let's convert your LISTING back into an actual SAS dataset.

Spoiler
data have ;
  infile cards dsd dlm='|' truncover;
  input Hosp $ id AI_Pace PSR cPWR RTG BOARD;
cards;
A|1|1|3| | | 
A|2|1| | | |2
B|1|1|2|2|2| 
B|2|2|1|1|1|3
B|3|3| | | | 
B|4|2| |1| |1
B|5| | | | | 
B|6|2| | | | 
C|1| |2|3|2| 
C|2| | | | |2
C|3|3| | | |1
C|4|3| | |3| 
C|5| | | | | 
C|6|1|1|1| |3
;

Since all of those variables are numeric you can use and ARRAY to handle them.

Let's write to a temporary file for testing.  Just change the FILENAME statement to write to a permanent file.

filename trans temp;
data _null_;
  file trans;
  set have ;
  array measures AI_Pace PSR cPWR RTG BOARD;
  if n(of measures[*]);
  put hosp id @;
  do over measures;
    if not missing(measures) then put measures= @;
  end;
  put;
run;

This produces a file like this for the example input:

A 1 AI_Pace=1 PSR=3
A 2 AI_Pace=1 BOARD=2
B 1 AI_Pace=1 PSR=2 cPWR=2 RTG=2
B 2 AI_Pace=2 PSR=1 cPWR=1 RTG=1 BOARD=3
B 3 AI_Pace=3
B 4 AI_Pace=2 cPWR=1 BOARD=1
B 6 AI_Pace=2
C 1 PSR=2 cPWR=3 RTG=2
C 2 BOARD=2
C 3 AI_Pace=3 BOARD=1
C 4 AI_Pace=3 RTG=3
C 6 AI_Pace=1 PSR=1 cPWR=1 BOARD=3

Which is something you could read back into SAS using NAMED input statement.

data test;
  length Hosp $8 id AI_Pace PSR cPWR RTG BOARD 8;
  infile trans ;
  input Hosp $ id (AI_Pace -- BOARD) (=) ;
run;

Which produces the same data (only missing the observations with no transactions).

proc compare data=have compare=test;
 id hosp id;
run;
Observation Summary                                                           
                                                                              
Observation      Base  Compare  ID                                            
                                                                              
First Obs           1        1  Hosp=A id=1                                   
Last  Obs          14       12  Hosp=C id=6                                   
                                                                              
Number of Observations in Common: 12.                                         
Number of Observations in WORK.HAVE but not in WORK.TEST: 2.                  
Total Number of Observations Read from WORK.HAVE: 14.                         
Total Number of Observations Read from WORK.TEST: 12.                         
                                                                              
Number of Observations with Some Compared Variables Unequal: 0.               
Number of Observations with All Compared Variables Equal: 12.                 
                                                                              
NOTE: No unequal values were found.  

 

 

 

View solution in original post

5 REPLIES 5
Tom
Super User Tom
Super User

Do you want a DATASET or a FILE?

A FILE is probably simpler (and more useful??).

First let's convert your LISTING back into an actual SAS dataset.

Spoiler
data have ;
  infile cards dsd dlm='|' truncover;
  input Hosp $ id AI_Pace PSR cPWR RTG BOARD;
cards;
A|1|1|3| | | 
A|2|1| | | |2
B|1|1|2|2|2| 
B|2|2|1|1|1|3
B|3|3| | | | 
B|4|2| |1| |1
B|5| | | | | 
B|6|2| | | | 
C|1| |2|3|2| 
C|2| | | | |2
C|3|3| | | |1
C|4|3| | |3| 
C|5| | | | | 
C|6|1|1|1| |3
;

Since all of those variables are numeric you can use and ARRAY to handle them.

Let's write to a temporary file for testing.  Just change the FILENAME statement to write to a permanent file.

filename trans temp;
data _null_;
  file trans;
  set have ;
  array measures AI_Pace PSR cPWR RTG BOARD;
  if n(of measures[*]);
  put hosp id @;
  do over measures;
    if not missing(measures) then put measures= @;
  end;
  put;
run;

This produces a file like this for the example input:

A 1 AI_Pace=1 PSR=3
A 2 AI_Pace=1 BOARD=2
B 1 AI_Pace=1 PSR=2 cPWR=2 RTG=2
B 2 AI_Pace=2 PSR=1 cPWR=1 RTG=1 BOARD=3
B 3 AI_Pace=3
B 4 AI_Pace=2 cPWR=1 BOARD=1
B 6 AI_Pace=2
C 1 PSR=2 cPWR=3 RTG=2
C 2 BOARD=2
C 3 AI_Pace=3 BOARD=1
C 4 AI_Pace=3 RTG=3
C 6 AI_Pace=1 PSR=1 cPWR=1 BOARD=3

Which is something you could read back into SAS using NAMED input statement.

data test;
  length Hosp $8 id AI_Pace PSR cPWR RTG BOARD 8;
  infile trans ;
  input Hosp $ id (AI_Pace -- BOARD) (=) ;
run;

Which produces the same data (only missing the observations with no transactions).

proc compare data=have compare=test;
 id hosp id;
run;
Observation Summary                                                           
                                                                              
Observation      Base  Compare  ID                                            
                                                                              
First Obs           1        1  Hosp=A id=1                                   
Last  Obs          14       12  Hosp=C id=6                                   
                                                                              
Number of Observations in Common: 12.                                         
Number of Observations in WORK.HAVE but not in WORK.TEST: 2.                  
Total Number of Observations Read from WORK.HAVE: 14.                         
Total Number of Observations Read from WORK.TEST: 12.                         
                                                                              
Number of Observations with Some Compared Variables Unequal: 0.               
Number of Observations with All Compared Variables Equal: 12.                 
                                                                              
NOTE: No unequal values were found.  

 

 

 

DocMartin
Quartz | Level 8

I should have said that I wanted to write to a CSV file .

Here's what I ran, with a few lines of output shown:

filename mba;

data _null_;
  file mba;
  set mba1 ;
  array measures AI_Pace PSR cPWR RTG itm;
  if n(of measures[*]);
  put race prg @;
  do over measures;
    if not missing(measures) then put measures= @;
  end;
  put;
run;

 

1 1 AI_Pace=1
1 3 AI_Pace=1 ITM=2
1 4 AI_Pace=1
1 6 AI_Pace=2 ITM=3
1 7 AI_Pace=2
1 8 AI_Pace=2 ITM=1
2 1 AI_Pace=9
2 2 AI_Pace=2
2 3 AI_Pace=9
2 4 AI_Pace=9
2 6 AI_Pace=9
2 7 AI_Pace=9
2 8 AI_Pace=9
2 9 AI_Pace=9 ITM=2
2 10 AI_Pace=3 ITM=1
2 11 AI_Pace=3
2 12 AI_Pace=9
2 13 AI_Pace=1 ITM=3
3 1 AI_Pace=1 ITM=1
3 2 AI_Pace=2
3 3 AI_Pace=2 ITM=3
3 4 AI_Pace=1
3 5 AI_Pace=2
3 6 AI_Pace=9
3 7 AI_Pace=3 ITM=2
4 1 AI_Pace=9 ITM=2
4 2 AI_Pace=3 ITM=3
4 3 AI_Pace=1 ITM=1
4 5 AI_Pace=3
4 6 AI_Pace=1

 

For some reason, all I'm getting are AI_Line and ITM. Do you know what might be occurring?

 

Thank you.

Andrew

Tom
Super User Tom
Super User

Most likely you used the wrong variable names in your ARRAY statement.  That would cause SAS to make NEW variables that will always be empty, and so not appear in the output.

 

If you know the order they appear in the dataset (use the VARNUM option on PROC CONTENTS) you can use a positional variable list.  Then you only need to get two of the variable names right, the first and last.

array measures First_var_name -- Name_of_last_var ;
DocMartin
Quartz | Level 8

Yep, I forgot to put he letter "r" in front of some of the variables. 😒

Tom
Super User Tom
Super User

Do you want a CSV file like this?

A,1,AI_Pace=1 PSR=3
A,2,AI_Pace=1 BOARD=2
B,1,AI_Pace=1 PSR=2 cPWR=2 RTG=2

Then just add the DSD option to the FILE statament.

 

Or like this?

A,1,AI_Pace=1,PSR=3
A,2,AI_Pace=1,BOARD=2
B,1,AI_Pace=1,PSR=2,cPWR=2,RTG=2

Then don't add DSD option but instead write the commas before writing ID and the measures.

  put hosp ',' id @;
  do over measures;
    if not missing(measures) then put +(-1) ',' measures= @;
  end;
  put;

Or worse like this?

A,1,AI_Pace=1,PSR=3,,
A,2,AI_Pace=1,BOARD=2,,
B,1,AI_Pace=1,PSR=2,cPWR=2,RTG=2

 For that you might want to keep track of how many non missing you found and then write the correct number of commas at the end.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 674 views
  • 3 likes
  • 2 in conversation