Hello,
I have a data set with ~10 million records/rows. I want to check for the presence of any one of multiple strings across 9 of the variables in the data set. I then want to create a new variable (dis2) in my data set that will take the value of either 0 or 1. Dis2 should have a value of 1 if at least one of the 9 variables has any one of the strings and 0 if none of the 9 variables have any of the strings.
I created a macro statement to hold the strings I want to seach for:
%let dc2 = %str('28730', '28739', '2875','280', '281', '282', '283');
I then want to check each of these 9 varibales to see if any one of the strings are in any one of the variables:
c_dx1, c_dx2, c_dx3, c_dx4, c_dx5, c_dx6, c_dx7, c_dx8, c_hcpc
I was thinking that I could put the 9 variables in an array but I am not sure how to put everything together and create dis2. Any help is greatly appreciated. If you can provide example code of how to solve this problem that would be ideal. Something like:
%let dc2 = %str('28730', '28739', '2875','280', '281', '282', '283');
data example_w_dis2;
set example;
.......;
.......;
Thanks!
A temporary array with a lookup is what you want.
https://onlinecourses.science.psu.edu/stat481/node/51
Example on page 10 is close to what you want.
Reeza,
Thanks for the quick reply. I took a look at the example you directed me to on page 10 and see that as a good starting point. I have started to modify it to suite my needs but don't think I can get it all the way there. There are a few questions I have:
1) The example has a unique key for each item (1 key for each item). I have the same set of 7 keys for all 9 items. Does that make a difference in how the problem needs to be coded?
2) My 7 keys are actually strings. Thus when I search c_dx1, if row 3 has the value 2814, I want that to be recognized as a match (with string 281). In the below example, I don't think 2814 would be scored a 1 (and it needs to be).
Here is what I have so far:
Array Raw {9} c_dx1, c_dx2, c_dx3, c_dx4, c_dx5, c_dx6, c_dx7, c_dx8, c_hcpc;
Array Key {7} $ _temporary_ ('28730', '28739', '2875','280', '281', '282', '283');
Array Score {9} ; Do i = 1 to 9; if raw{i} eq key{i} then score{i}=1;
else score{i}=0; End;
Total = sum( of score1 – score9 );
if total ge 1 then dis2 =1;
else if total dis2 =1;
That's a good start.
Couple of pointers,
1. All variables are character so make sure to include the $ in your array declarations.
2. You should loop over the key array/vars to check for those 7 values.
3. The score is a check for each value, so if score1=1 then the first value was found. You don't know where in the diagnosis it was found, but it's present. You could store the location if desired.
4. WhichC will search through the list of arrays to find the values.
5. Your last line doesn't make sense...not sure what you wanted to assign in that line.
Array Raw {9} $ c_dx1, c_dx2, c_dx3, c_dx4, c_dx5, c_dx6, c_dx7, c_dx8, c_hcpc;
Array Key {7} $ _temporary_ ('28730', '28739', '2875','280', '281', '282', '283');
Array Score {7} (7*0) ;
Do i = 1 to 7;
if whichc(key(i), of raw(*))>0 then score(i)=1;
End;
Total = sum( of score1 – score7 );
if total ge 1 then dis2 =1;
else if total dis2 =1; *doesn't make sense;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.