Solved: Creating Dummy Variables By Group to Merge

PeterBr · Posted 04-07-2020 06:25 PM

Hi All - I am trying to create dummy variables for values of proc_cd (without typing them out since there are many in a huge dataset I'm working with) by the claim_id variable - let me use the data to show what I am starting with and my want data.

Start data:

data procedures;
input claim_id $ proc_cd $;
cards;
J3 G23
J3 G123
J2 G23
J2 G123
J3 G23
J4 G23
J5 54
J6 54
J22 54
J28 54
;
run;

Then I want to create dummies for each prod_cd by the claim_id variable so that if at any point that claim_id had, say for example, proc_cd = 23, then the value for the dummy for proc_cd = 23 would be 1 for every time that claim_id comes up in the data. But again because my dataset has so many proc_cd, I would like to avoid writing them all out.

So I would end up with:

data procedures_dummy;
input claim_id $ proc_cd $ dum_G23 $ dum_G123 $ dum_54 $;
cards;
J3 G23 1 1 1
J3 G123 1 1 1
J2 G23 1 1 0
J2 G123 1 1 0
J3 54 1 1 1
J4 G23 1 0 0
J5 54 0 0 1
J6 54 0 0 1
J22 54 0 0 1
J28 54 0 0 1
;
run;

If I had that, it would be great!

Just to give some more context - my vision with this is to then perform the following code to get a dataset with only one observation row for each unique claim_id:

proc sort data = procedures_dummy;
by claim_id;
run;

DATA procedures_dummy_collapse;
SET procedures_dummy;
BY claim_id;
IF first.claim_id;
RUN;

Which would yield:

data procedures_dummy_collapse;
input claim_id $ proc_cd $ dum_G23 $ dum_G123 $ dum_54 $;
cards;
J2 G23 1 1 0
J22 54 0 0 1
J28 54 0 0 1
J3 G23 1 1 1
J4 G23 1 0 0
J5 54 0 0 1
J6 54 0 0 1
;
run;

And then at this point I would have another dataset with one observation row for each unique claim_id waiting to merge it with:

data charges;
input claim_id $ total_chrg;
cards;
J3 3324
J2 234
J4 234
J5 3456
J6 34576
J22 5789
J28 345
;
run;

proc sort data = charges;
by claim_id;
run;
DATA charges_proc_cd;
MERGE charges(IN = fromchargex) procedures_dummy_collapse(IN = fromprocx);
BY claim_id;
fromcharge = fromchargex;
fromproc = fromprocx;
RUN;

So basically - this is all to take a dataset that has multiple copies of a unique claim_id, collapse it into a dataset that has one row of observations for each unique claim_id, and then merge it into a dataset that only has one row of observations for each unique claim_id. Then I am going to perform regression models on that. If somebody has a better way to do this - I'd be happy to hear your thoughts too!

Cheers,

Peter

Kurt_Bremser · Posted 04-08-2020 04:45 AM

A fully automated way to get your yes/no columns:

data procedures;
input claim_id $ proc_cd $;
n = 1; /* create dummy value */
cards;
J3 G23
J3 G123
J2 G23
J2 G123
J3 G23
J4 G23
J5 54
J6 54
J22 54
J28 54
;

proc sort
  data=procedures
  out=sum
  nodupkey
;
by claim_id proc_cd;
run;

proc transpose
  data=sum
  out=trans (drop=_name_)
  prefix=dum_
;
by claim_id;
id proc_cd;
var n;
run;

proc stdize
  data=trans
  out=want
  missing=0
  reponly
;
run;

The last step is only there to replace missing values with 0.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

PaigeMiller · Posted 04-07-2020 06:31 PM

There are several PROCs that can create dummy variables in SAS, so you don't have to write your own code. One is PROC GLMMOD, but other PROCs are available to do this as well. Here is a thread that discusses this issue: https://communities.sas.com/t5/SAS-Procedures/How-can-i-create-dummy-variables-How-can-i-change-my-c...

Furthermore, depending on what you are going to do with the dummy variables, you may not have to create them at all, as many SAS PROCs allow a CLASS statement, so the dummy variables are created internally in the PROC, and you never have to bother creating them. Let SAS do the work.

So, really, don't create dummy variables yourself. It's not worth the effort.

But I also don't see why dummy variables are needed to do a merge or to "collapse" multiple records into one. Really, I'm just not seeing any reason to create dummy variables here.

--
Paige Miller

PeterBr · Posted 04-08-2020 12:02 AM

Hi Paige - thanks for referring me to the PROC thread, it was very useful. I'm coming over to SAS from STATA where I had to create my own dummies so that was the default in my mind - very cool that SAS lets you declare a CLASS.

In terms of the collapse method I was explaining - basically I was just trying to get at reshaping my data. I came across PROC TRANSPOSE which is useful but I noticed it is only useful if one is trying to transpose 1 variable. I will continue to read up on how to reshape data for multiple variables in SAS but if you have any recommendations of great resources for that topic that would be great!

Thanks again for your help!
Peter

PaigeMiller · Posted 04-08-2020 07:27 AM

@PeterBr wrote:

In terms of the collapse method I was explaining - basically I was just trying to get at reshaping my data. I came across PROC TRANSPOSE which is useful but I noticed it is only useful if one is trying to transpose 1 variable. I will continue to read up on how to reshape data for multiple variables in SAS but if you have any recommendations of great resources for that topic that would be great!

Transpose multiple variables using the %TRANSPOSE macro

https://support.sas.com/resources/papers/proceedings13/538-2013.pdf

--
Paige Miller

PeterBr · Posted 04-09-2020 10:44 AM

Hi Paige - I ran a proc glmselect with the lasso selection and I specified a few variables in the CLASS option as you suggested. While SAS accepted my code (below), I don't see any of the dummies in the analysis in the results viewer. I'm wondering if I am doing something wrong or missing something because I would like to see them explicitly in the model analysis. The variables in my model 'proc_cd26055--proc_cd93971' are dummies I created myself and I do see those included in the results viewer as part of the analysis. I really appreciate your input!

CODE:
Title '2018 OP Lasso1';
proc glmselect data=Sparcs_2018_op plots(stepaxis=normb)=all;
CLASS gender_cd fac_id hsa prov_npi idP typeP;
MODEL chrg_tot_amt = age proc_cd26055--proc_cd93971 race_ethnic_cdE1--race_ethnic_cdR2_17
/selection = lasso(stop=none choose=cvex);
run;

Kurt_Bremser · Posted 04-08-2020 04:45 AM

A fully automated way to get your yes/no columns:

data procedures;
input claim_id $ proc_cd $;
n = 1; /* create dummy value */
cards;
J3 G23
J3 G123
J2 G23
J2 G123
J3 G23
J4 G23
J5 54
J6 54
J22 54
J28 54
;

proc sort
  data=procedures
  out=sum
  nodupkey
;
by claim_id proc_cd;
run;

proc transpose
  data=sum
  out=trans (drop=_name_)
  prefix=dum_
;
by claim_id;
id proc_cd;
var n;
run;

proc stdize
  data=trans
  out=want
  missing=0
  reponly
;
run;

The last step is only there to replace missing values with 0.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Ksharp · Posted 04-08-2020 07:04 AM


data procedures;
input claim_id $ proc_cd $;
cards;
J3 G23
J3 G123
J2 G23
J2 G123
J3 G23
J4 G23
J5 54
J6 54
J22 54
J28 54
;
run;

data have;
 set procedures;
 dummy=1;
run;
proc logistic data=have outdesign=temp outdesignonly;
class  proc_cd/param=glm;
model dummy= proc_cd/nofit noint ;
run;

data want;
 merge procedures temp(drop=dummy);
run;

Creating Dummy Variables By Group to Merge

Re: Creating Dummy Variables By Group to Merge

Re: Creating Dummy Variables By Group to Merge

Re: Creating Dummy Variables By Group to Merge

Re: Creating Dummy Variables By Group to Merge

Re: Creating Dummy Variables By Group to Merge

Re: Creating Dummy Variables By Group to Merge

Re: Creating Dummy Variables By Group to Merge

Register Today!

SAS Training: Just a Click Away