Hi All, I am working with medical claims data and have recreated a small sample below, called have. I'd like to run regressions to predict the value variable while taking into account, via dummy variables, whether a claim_id had a certain procedure (proc_cd) or not. My data is in wide format, and if I used the CLASS statement, it would treat proc_cd1, proc_cd2, etc. separately, which I don't care about. What I do care about is, for a given proc_cd, whether the claim_id had that proc_cd associated with it or not. So basically I want to create dummies across all of the proc_cd values. I have a lot of proc_cd values in my real dataset, so I would like to avoid writing them all out to create the dummies manually. Any suggestions on how to resolve this?
data have;
input claim_id $ proc_cd1 $ proc_cd2 $ proc_cd3 $ proc_cd4 $ value age hr;
cards;
2 234 443 J21 J21 234 23 60
7 J1 J302 J232 J454 45645 43 69
3 J204 543 678 . 3456 45 78
5 J21 J22 . . 234 67 89
;
run;
Cheers,
Peter
In general, you should use the CLASS statement rather than create your own dummy variables.
But I get the feeling I am missing the point of your question.
I am going to guess that in this case you need to create indicator variables because the value of interest could occur in multiple variables.
Here is an example of one way to search a list of proc_cd variables for a given value and create a variable coded 1 if the value is found and 0 if it is not.
data have;
   input claim_id $ proc_cd1 $ proc_cd2 $ proc_cd3 $ proc_cd4 $ value age hr;
   /* p refers to every variable whose name starts with proc_cd */
   array p proc_cd: ;
   /* 1 if 'J21' appears in any proc_cd variable, 0 otherwise */
   VJ21 = whichc('J21',of p(*)) > 0;
cards;
2 234 443 J21 J21 234 23 60
7 J1 J302 J232 J454 45645 43 69
3 J204 543 678 . 3456 45 78
5 J21 J22 . . 234 67 89
;
run;
The WHICHC function, or its numeric counterpart WHICHN, compares its first argument with each value in the list that follows and returns the position of the first match, or 0 if there is no match. SAS returns 1 for true and 0 for false in logical comparisons, so the > 0 above yields 1 when 'J21' is found in any of the proc_cd variables and 0 otherwise.
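To make those return values concrete, here is a small sketch that calls WHICHC on the codes from the first sample claim (234, 443, J21, J21), written out as literals. The data set name whichc_demo and the variable names pos, none, flag_found and flag_missing are hypothetical. Because WHICHC stops at the first match, the position for 'J21' is 3 even though the code appears twice.

```sas
data whichc_demo;
   /* 'J21' is the 3rd value in the list; WHICHC reports the first match only */
   pos = whichc('J21', '234', '443', 'J21', 'J21');
   /* A code that is not in the list returns 0 */
   none = whichc('J99', '234', '443', 'J21', 'J21');
   /* Logical comparisons evaluate to 1 (true) or 0 (false) */
   flag_found = (pos > 0);
   flag_missing = (none > 0);
   put pos= none= flag_found= flag_missing=;
run;
```

The log should show pos=3 none=0 flag_found=1 flag_missing=0.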
To search for several codes at once, you could use a temporary array to hold the values to look for and another array to hold the results:
data have;
   input claim_id $ proc_cd1 $ proc_cd2 $ proc_cd3 $ proc_cd4 $ value age hr;
   array p proc_cd: ;
   /* Codes to search for (not written to the output data set) */
   array v (3) $ 4 _temporary_ ('J21' 'J232' '678');
   /* Result indicators r1-r3, one per code in v */
   array r (3);
   do i= 1 to dim(v);
      r[i] = whichc(v[i],of p(*)) > 0;
   end;
   drop i;
cards;
2 234 443 J21 J21 234 23 60
7 J1 J302 J232 J454 45645 43 69
3 J204 543 678 . 3456 45 78
5 J21 J22 . . 234 67 89
;
run;
It would be up to you to keep track of which result goes with which code: r1 corresponds to 'J21', r2 to 'J232', and r3 to '678'.
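Since the real data has many codes, one alternative that avoids typing the list at all is to reshape to one row per claim and code, then let PROC TRANSPOSE create one indicator column per distinct code. This is a swapped-in technique (reshape and transpose rather than WHICHC), sketched under assumptions: the data set names long, dummies, have_sorted and want and the prefix has_ are hypothetical, and every code is assumed to form a valid, distinct SAS variable name once the prefix is added. With many distinct codes, the number of columns can become large.

```sas
/* Sample data from the question */
data have;
   input claim_id $ proc_cd1 $ proc_cd2 $ proc_cd3 $ proc_cd4 $ value age hr;
cards;
2 234 443 J21 J21 234 23 60
7 J1 J302 J232 J454 45645 43 69
3 J204 543 678 . 3456 45 78
5 J21 J22 . . 234 67 89
;
run;

/* Step 1: one row per claim_id and non-missing procedure code */
data long;
   set have;
   array p proc_cd: ;
   length code $ 8;
   flag = 1;
   do i = 1 to dim(p);
      if not missing(p[i]) then do;
         code = p[i];
         output;
      end;
   end;
   keep claim_id code flag;
run;

/* Step 2: remove repeated codes within a claim (e.g. J21 twice on claim 2) */
proc sort data=long nodupkey;
   by claim_id code;
run;

/* Step 3: one indicator column per distinct code, named has_<code>.
   Assumes has_<code> is a valid, distinct SAS name for every code. */
proc transpose data=long out=dummies(drop=_name_) prefix=has_;
   by claim_id;
   id code;
   var flag;
run;

/* Step 4: attach the indicators to the original rows; absent codes become 0 */
proc sort data=have out=have_sorted;
   by claim_id;
run;

data want;
   merge have_sorted dummies;
   by claim_id;
   array d has_: ;
   do j = 1 to dim(d);
      if missing(d[j]) then d[j] = 0;
   end;
   drop j;
run;
```

With the sample data, want gets one has_ column per distinct code (11 here); for example, has_J21 is 1 for claims 2 and 5 and 0 for claims 7 and 3.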