DATA Step, Macro, Functions and more

m rows * 24 columns into 30 categories(variables)

Posts: 66

m rows * 24 columns into 30 categories(variables)


I have a table, containing patient_id, claim_id for each patient. There are more than one row for claims of each patient. Some patients even have more than 99 rows. There are 24 columns for diagnosis DGNS_2,……, DGNS_25 for each claim of each patient. I need to categorize these diagnosis into 30 defined groups to create a new table, in which there will be only one row to show these 30 defined groups for each patient, instead of many rows for different claims of same patient.

To categorize diagnosis into 30 defined groups, some code looks like below. These codes are to check each row and then create 30 variables for each row. However, my task is I need to categorize m claim rows by 24 diagnosis columns  for each patient into 30 defined groups and record these 30 groups(variables) for each patient in only one row in a new table. I am not sure how to do. I don’t think nesting do loops could be used with proc sql to create new table. I am thinking transform long data to wide data for each patient and then do categorization, but it is not efficient because there are just so many rows, like 99 rows for some patient. Thanks so much in advance for your help!

data new;

set old;

ELX_GRP_1=0; ELX_GRP_2=0; ELX_GRP_3=0; ELX_GRP_4=0; ELX_GRP_5=0; ELX_GRP_6=0;ELX_GRP_7=0; ELX_GRP_8=0;

ELX_GRP_9=0; ELX_GRP_10=0; ELX_GRP_11=0; ELX_GRP_12=0;ELX_GRP_13=0; ELX_GRP_14=0; ELX_GRP_15=0; ELX_GRP_16=0;

ELX_GRP_17=0; ELX_GRP_18=0;ELX_GRP_19=0; ELX_GRP_20=0; ELX_GRP_21=0; ELX_GRP_22=0; ELX_GRP_23=0; ELX_GRP_24=0;

ELX_GRP_25=0; ELX_GRP_26=0; ELX_GRP_27=0; ELX_GRP_28=0; ELX_GRP_29=0; ELX_GRP_30=0;

array dx[5] dx1-dx5;

do j=1 to 5;

  if substr(Dx(j),1,4) INSmiley Sad"0932","7463","7464","7465","7466","V422","V433") | substr(Dx(j),1,3) INSmiley Sad"394","395","396","397","424" )
then ELX_GRP_2 = 1;
LABEL ELX_GRP_2='Valvular Disease'

Super User
Posts: 10,501

Re: m rows * 24 columns into 30 categories(variables)

After you have the 30 variables created for a row what rules ACROSS rows do you have for assigning?

I am going to guess that you have 30 variables that are 0 or 1 valued. IF you want "are any of the variables valued 1 for any of the records for the patient" then this may work:

Proc summary data = new nway;

class PatientID;

var ELX_GRP_ : ;

output out=want (drop=type) max= ;


This will have the maximum value for the 30 variables you created for each patient.

There will also be a variable _freq_ that will tell you how many records were involved.

Ask a Question
Discussion stats
  • 1 reply
  • 1 like
  • 2 in conversation