Hi All - I am trying to create dummy variables for values of proc_cd (without typing them out since there are many in a huge dataset I'm working with) by the claim_id variable - let me use the data to show what I am starting with and my want data.
Start data:
data procedures;
input claim_id $ proc_cd $;
cards;
J3 G23
J3 G123
J2 G23
J2 G123
J3 G23
J4 G23
J5 54
J6 54
J22 54
J28 54
;
run;
Then I want to create dummies for each prod_cd by the claim_id variable so that if at any point that claim_id had, say for example, proc_cd = 23, then the value for the dummy for proc_cd = 23 would be 1 for every time that claim_id comes up in the data. But again because my dataset has so many proc_cd, I would like to avoid writing them all out.
So I would end up with:
data procedures_dummy;
input claim_id $ proc_cd $ dum_G23 $ dum_G123 $ dum_54 $;
cards;
J3 G23 1 1 1
J3 G123 1 1 1
J2 G23 1 1 0
J2 G123 1 1 0
J3 54 1 1 1
J4 G23 1 0 0
J5 54 0 0 1
J6 54 0 0 1
J22 54 0 0 1
J28 54 0 0 1
;
run;
If I had that, it would be great!
Just to give some more context - my vision with this is to then perform the following code to get a dataset with only one observation row for each unique claim_id:
proc sort data = procedures_dummy;
by claim_id;
run;
DATA procedures_dummy_collapse;
SET procedures_dummy;
BY claim_id;
IF first.claim_id;
RUN;
Which would yield:
data procedures_dummy_collapse;
input claim_id $ proc_cd $ dum_G23 $ dum_G123 $ dum_54 $;
cards;
J2 G23 1 1 0
J22 54 0 0 1
J28 54 0 0 1
J3 G23 1 1 1
J4 G23 1 0 0
J5 54 0 0 1
J6 54 0 0 1
;
run;
And then at this point I would have another dataset with one observation row for each unique claim_id waiting to merge it with:
data charges;
input claim_id $ total_chrg;
cards;
J3 3324
J2 234
J4 234
J5 3456
J6 34576
J22 5789
J28 345
;
run;
proc sort data = charges;
by claim_id;
run;
DATA charges_proc_cd;
MERGE charges(IN = fromchargex) procedures_dummy_collapse(IN = fromprocx);
BY claim_id;
fromcharge = fromchargex;
fromproc = fromprocx;
RUN;
So basically - this is all to take a dataset that has multiple copies of a unique claim_id, collapse it into a dataset that has one row of observations for each unique claim_id, and then merge it into a dataset that only has one row of observations for each unique claim_id. Then I am going to perform regression models on that. If somebody has a better way to do this - I'd be happy to hear your thoughts too!
Cheers,
Peter
A fully automated way to get your yes/no columns:
data procedures;
input claim_id $ proc_cd $;
n = 1; /* create dummy value */
cards;
J3 G23
J3 G123
J2 G23
J2 G123
J3 G23
J4 G23
J5 54
J6 54
J22 54
J28 54
;
proc sort
data=procedures
out=sum
nodupkey
;
by claim_id proc_cd;
run;
proc transpose
data=sum
out=trans (drop=_name_)
prefix=dum_
;
by claim_id;
id proc_cd;
var n;
run;
proc stdize
data=trans
out=want
missing=0
reponly
;
run;
The last step is only there to replace missing values with 0.
There are several PROCs that can create dummy variables in SAS, so you don't have to write your own code. One is PROC GLMMOD, but other PROCs are available to do this as well. Here is a thread that discusses this issue: https://communities.sas.com/t5/SAS-Procedures/How-can-i-create-dummy-variables-How-can-i-change-my-c...
Furthermore, depending on what you are going to do with the dummy variables, you may not have to create them at all, as many SAS PROCs allow a CLASS statement, so the dummy variables are created internally in the PROC, and you never have to bother creating them. Let SAS do the work.
So, really, don't create dummy variables yourself. It's not worth the effort.
But I also don't see why dummy variables are needed to do a merge or to "collapse" multiple records into one. Really, I'm just not seeing any reason to create dummy variables here.
@PeterBr wrote:
In terms of the collapse method I was explaining - basically I was just trying to get at reshaping my data. I came across PROC TRANSPOSE which is useful but I noticed it is only useful if one is trying to transpose 1 variable. I will continue to read up on how to reshape data for multiple variables in SAS but if you have any recommendations of great resources for that topic that would be great!
Transpose multiple variables using the %TRANSPOSE macro
https://support.sas.com/resources/papers/proceedings13/538-2013.pdf
A fully automated way to get your yes/no columns:
data procedures;
input claim_id $ proc_cd $;
n = 1; /* create dummy value */
cards;
J3 G23
J3 G123
J2 G23
J2 G123
J3 G23
J4 G23
J5 54
J6 54
J22 54
J28 54
;
proc sort
data=procedures
out=sum
nodupkey
;
by claim_id proc_cd;
run;
proc transpose
data=sum
out=trans (drop=_name_)
prefix=dum_
;
by claim_id;
id proc_cd;
var n;
run;
proc stdize
data=trans
out=want
missing=0
reponly
;
run;
The last step is only there to replace missing values with 0.
data procedures;
input claim_id $ proc_cd $;
cards;
J3 G23
J3 G123
J2 G23
J2 G123
J3 G23
J4 G23
J5 54
J6 54
J22 54
J28 54
;
run;
data have;
set procedures;
dummy=1;
run;
proc logistic data=have outdesign=temp outdesignonly;
class proc_cd/param=glm;
model dummy= proc_cd/nofit noint ;
run;
data want;
merge procedures temp(drop=dummy);
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.