BookmarkSubscribeRSS Feed
ballewnick
Fluorite | Level 6

Hello,

 

 

I have a data set with ~10 million records/rows. I want to check for the presence of any one of multiple strings across 9 of the variables in the data set. I then want to create a new variable (dis2) in my data set that will take the value of either 0 or 1. Dis2 should have a value of 1 if at least one of the 9 variables has any one of the strings and 0 if none of the 9 variables have any of the strings.

 

I created a macro statement to hold the strings I want to seach for:

%let dc2 = %str('28730', '28739', '2875','280', '281', '282', '283');

 

I then want to check each of these 9 varibales to see if any one of the strings are in any one of the variables:

c_dx1, c_dx2, c_dx3, c_dx4, c_dx5, c_dx6, c_dx7, c_dx8, c_hcpc

 

I was thinking that I could put the 9 variables in an array but I am not sure how to put everything together and create dis2. Any help is greatly appreciated. If you can provide example code of how to solve this problem that would be ideal. Something like:

 

%let dc2 = %str('28730', '28739', '2875','280', '281', '282', '283');

 

data example_w_dis2;

     set example;

     .......;

     .......;

 

Thanks!

3 REPLIES 3
ballewnick
Fluorite | Level 6

Reeza,

 

Thanks for the quick reply. I took a look at the example you directed me to on page 10 and see that as a good starting point. I have started to modify it to suite my needs but don't think I can get it all the way there. There are a few questions I have:

1) The example has a unique key for each item (1 key for each item). I have the same set of 7 keys for all 9 items. Does that make a difference in how the problem needs to be coded?

2) My 7 keys are actually strings. Thus when I search c_dx1, if row 3 has the value 2814, I want that to be recognized as a match (with string 281). In the below example, I don't think 2814 would be scored a 1 (and it needs to be).

 

Here is what I have so far:

 

Array Raw {9} c_dx1, c_dx2, c_dx3, c_dx4, c_dx5, c_dx6, c_dx7, c_dx8, c_hcpc;

Array Key {7} $ _temporary_ ('28730', '28739', '2875','280', '281', '282', '283');

Array Score {9} ; Do i = 1 to 9; if raw{i} eq key{i} then score{i}=1;

else score{i}=0; End;

 

Total = sum( of score1 – score9 );

if total ge 1 then dis2 =1;

else if total dis2 =1;

 

Reeza
Super User

That's a good start.

Couple of pointers, 

1. All variables are character so make sure to include the $ in your array declarations.

2. You should loop over the key array/vars to check for those 7 values. 

3. The score is a check for each value, so if score1=1 then the first value was found. You don't know where in the diagnosis it was found, but it's present. You could store the location if desired.

4. WhichC will search through the list of arrays to find the values. 

5. Your last line doesn't make sense...not sure what you wanted to assign in that line.

 

 

Array Raw {9} $ c_dx1, c_dx2, c_dx3, c_dx4, c_dx5, c_dx6, c_dx7, c_dx8, c_hcpc;
Array Key {7} $ _temporary_ ('28730', '28739', '2875','280', '281', '282', '283');
Array Score {7} (7*0) ; 

Do i = 1 to 7;
if whichc(key(i), of raw(*))>0 then score(i)=1;

 End;
 
Total = sum( of score1 – score7 );
if total ge 1 then dis2 =1;
else if total dis2 =1; *doesn't make sense;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 2929 views
  • 0 likes
  • 2 in conversation