SAS Programming

DATA Step, Macro, Functions and more
PeterBr
Obsidian | Level 7

Hi All, I am working with medical claims data and have recreated a small sample below called have. I'd like to run regressions to predict the value variable, taking into account via dummy variables whether a claim_id had a certain procedure code (proc_cd). My data is in wide format, and if I used the CLASS statement, proc_cd1, proc_cd2, etc. would each be treated as a separate variable, which I don't care about. What I do care about is whether, for a given proc_cd value, the claim_id had that code in any of the proc_cd columns. So basically I want dummies across all of the proc_cd values. I have a lot of proc_cd values in my real dataset, so I would like to avoid writing them all out to create the dummies manually. Any suggestions on how to do this?

 


data have;
input claim_id $ proc_cd1 $ proc_cd2 $ proc_cd3 $ proc_cd4 $ value age hr;
cards;
2 234 443 J21 J21 234 23 60
7 J1 J302 J232 J454 45645 43 69
3 J204 543 678 . 3456 45 78
5 J21 J22 . . 234 67 89
;
run;

 

Cheers,

Peter

2 REPLIES
PaigeMiller
Diamond | Level 26

Yes, you should use the CLASS statement rather than create your own dummy variables.

 

But, I get the feeling I am missing the point of your question.

--
Paige Miller
ballardw
Super User

I am going to guess that in this case you need to create indicator variables because the value of interest could occur in multiple variables.

 

Here is an example of one way to search a list of proc_cd variables for a given value and create a 1/0 coded variable indicating whether it was found.

data have;
input claim_id $ proc_cd1 $ proc_cd2 $ proc_cd3 $ proc_cd4 $ value age hr;
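   /* p holds every variable whose name starts with proc_cd */
   array p proc_cd: ;
   /* 1 if 'J21' appears in any proc_cd variable, otherwise 0 */
   VJ21 = whichc('J21',of p(*)) > 0;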
cards;
2 234 443 J21 J21 234 23 60
7 J1 J302 J232 J454 45645 43 69
3 J204 543 678 . 3456 45 78
5 J21 J22 . . 234 67 89
;
run;

The WHICHC function (or its numeric counterpart, WHICHN) takes a search value as its first argument and looks for it in the list of variables or values that follows. It returns the position in that list of the first match, or 0 if there is no match. SAS evaluates a logical comparison to 1 for true and 0 for false, so the > 0 above yields 1 when 'J21' is found in any of the proc_cd variables and 0 otherwise.
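As a small illustration of those return values, here is a sketch that calls WHICHC on literal strings taken from the first two sample rows. The variable names pos_found, pos_missing, flag_found and flag_missing are made up for this example only.

```sas
data _null_;
   /* 'J21' is the 3rd item in the list after the search value */
   pos_found   = whichc('J21', '234', '443', 'J21', 'J21');
   /* no item equals 'J21', so the result is 0 */
   pos_missing = whichc('J21', 'J1', 'J302', 'J232', 'J454');
   /* a comparison evaluates to 1 (true) or 0 (false) */
   flag_found   = pos_found > 0;
   flag_missing = pos_missing > 0;
   put pos_found= pos_missing= flag_found= flag_missing=;
run;
```

The log should show pos_found=3 pos_missing=0 flag_found=1 flag_missing=0. Note that only the first match position is returned, even though 'J21' appears twice in the first list.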

 

To search for several values at once, this could be done with a temporary array to hold the values and another array to hold the results.

data have;
input claim_id $ proc_cd1 $ proc_cd2 $ proc_cd3 $ proc_cd4 $ value age hr;
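   /* p: the proc_cd variables to search */
   array p proc_cd: ;
   /* v: the values to look for (not written to the output data set) */
   array v (3) $ 4 _temporary_ ('J21' 'J232' '678');
   /* r: result flags r1-r3, one per value in v */
   array r (3);
   do i= 1 to dim(v);
      r[i] = whichc(v[i],of p(*)) > 0;
   end;
   drop i;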
cards;
2 234 443 J21 J21 234 23 60
7 J1 J302 J232 J454 45645 43 69
3 J204 543 678 . 3456 45 78
5 J21 J22 . . 234 67 89
;
run;

It would be up to you to keep track that r1 is related to the value 'J21', r2 to 'J232', and r3 to '678'.
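Since the original question was about avoiding a hand-typed list of codes, here is one possible sketch of a different route that builds a flag for every code found in the data. It is not the WHICHC approach above: it reshapes the data to long format and uses PROC TRANSPOSE to create the indicator columns. The data set names long, dummies and want, the variable names code and flag, and the has_ prefix are all assumptions chosen for this example.

```sas
/* Sample data from the question */
data have;
   input claim_id $ proc_cd1 $ proc_cd2 $ proc_cd3 $ proc_cd4 $ value age hr;
datalines;
2 234 443 J21 J21 234 23 60
7 J1 J302 J232 J454 45645 43 69
3 J204 543 678 . 3456 45 78
5 J21 J22 . . 234 67 89
;

/* One row per claim_id and non-missing procedure code */
data long;
   set have;
   array p proc_cd: ;
   length code $ 8;
   do i = 1 to dim(p);
      if not missing(p[i]) then do;
         code = p[i];
         flag = 1;
         output;
      end;
   end;
   keep claim_id code flag;
run;

/* Sort for BY processing and drop repeated codes within a claim */
proc sort data=long nodupkey;
   by claim_id code;
run;

/* One column per distinct code, named has_<code> */
proc transpose data=long out=dummies(drop=_name_) prefix=has_;
   by claim_id;
   id code;
   var flag;
run;

proc sort data=have;
   by claim_id;
run;

/* Attach the flags and turn "code not present" (missing) into 0 */
data want;
   merge have dummies;
   by claim_id;
   array d has_: ;
   do i = 1 to dim(d);
      if missing(d[i]) then d[i] = 0;
   end;
   drop i;
run;
```

With this naming, the flag for 'J21' is has_J21, so there is no need to remember which array position belongs to which code, and the variable list has_: can refer to all of the flags at once in later steps. The prefix is there because codes such as 234 are not valid SAS variable names on their own. It was chosen so that has_: does not pick up any other variable in have.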

