DATA Step, Macro, Functions and more

Create a new variable based on multiple conditions across multiple variables

Reply
Occasional Contributor
Posts: 6

Create a new variable based on multiple conditions across multiple variables

Hello,

 

 

I have a data set with ~10 million records/rows. I want to check for the presence of any one of multiple strings across 9 of the variables in the data set. I then want to create a new variable (dis2) in my data set that will take the value of either 0 or 1. Dis2 should have a value of 1 if at least one of the 9 variables has any one of the strings and 0 if none of the 9 variables have any of the strings.

 

I created a macro statement to hold the strings I want to seach for:

%let dc2 = %str('28730', '28739', '2875','280', '281', '282', '283');

 

I then want to check each of these 9 varibales to see if any one of the strings are in any one of the variables:

c_dx1, c_dx2, c_dx3, c_dx4, c_dx5, c_dx6, c_dx7, c_dx8, c_hcpc

 

I was thinking that I could put the 9 variables in an array but I am not sure how to put everything together and create dis2. Any help is greatly appreciated. If you can provide example code of how to solve this problem that would be ideal. Something like:

 

%let dc2 = %str('28730', '28739', '2875','280', '281', '282', '283');

 

data example_w_dis2;

     set example;

     .......;

     .......;

 

Thanks!

Super User
Posts: 19,878

Re: Create a new variable based on multiple conditions across multiple variables

Posted in reply to ballewnick

A temporary array with a lookup is what you want.

 

https://onlinecourses.science.psu.edu/stat481/node/51

 

Example on page 10 is close to what you want.

http://www.lexjansen.com/nesug/nesug05/pm/pm8.pdf

Occasional Contributor
Posts: 6

Re: Create a new variable based on multiple conditions across multiple variables

Reeza,

 

Thanks for the quick reply. I took a look at the example you directed me to on page 10 and see that as a good starting point. I have started to modify it to suite my needs but don't think I can get it all the way there. There are a few questions I have:

1) The example has a unique key for each item (1 key for each item). I have the same set of 7 keys for all 9 items. Does that make a difference in how the problem needs to be coded?

2) My 7 keys are actually strings. Thus when I search c_dx1, if row 3 has the value 2814, I want that to be recognized as a match (with string 281). In the below example, I don't think 2814 would be scored a 1 (and it needs to be).

 

Here is what I have so far:

 

Array Raw {9} c_dx1, c_dx2, c_dx3, c_dx4, c_dx5, c_dx6, c_dx7, c_dx8, c_hcpc;

Array Key {7} $ _temporary_ ('28730', '28739', '2875','280', '281', '282', '283');

Array Score {9} ; Do i = 1 to 9; if raw{i} eq key{i} then score{i}=1;

else score{i}=0; End;

 

Total = sum( of score1 – score9 );

if total ge 1 then dis2 =1;

else if total dis2 =1;

 

Super User
Posts: 19,878

Re: Create a new variable based on multiple conditions across multiple variables

Posted in reply to ballewnick

That's a good start.

Couple of pointers, 

1. All variables are character so make sure to include the $ in your array declarations.

2. You should loop over the key array/vars to check for those 7 values. 

3. The score is a check for each value, so if score1=1 then the first value was found. You don't know where in the diagnosis it was found, but it's present. You could store the location if desired.

4. WhichC will search through the list of arrays to find the values. 

5. Your last line doesn't make sense...not sure what you wanted to assign in that line.

 

 

Array Raw {9} $ c_dx1, c_dx2, c_dx3, c_dx4, c_dx5, c_dx6, c_dx7, c_dx8, c_hcpc;
Array Key {7} $ _temporary_ ('28730', '28739', '2875','280', '281', '282', '283');
Array Score {7} (7*0) ; 

Do i = 1 to 7;
if whichc(key(i), of raw(*))>0 then score(i)=1;

 End;
 
Total = sum( of score1 – score7 );
if total ge 1 then dis2 =1;
else if total dis2 =1; *doesn't make sense;
Ask a Question
Discussion stats
  • 3 replies
  • 161 views
  • 0 likes
  • 2 in conversation