I have a SAS database with a single variable and 450 observations, each one corresponding to an ICD-9 diagnosis code.
I also have a large healthcare claims database in which each observation is a claim with up to 4 diagnosis code fields (dx1-dx4).
I have written code that sets all the 450 observations into global variables. In a macro, I set up a 4-element array consisting of dx1-dx4, and then use a do loop to set an indicator to 1 if any of the elements of the array match the 450 diagnosis codes. If 1, then the observation will output.
This code, however, is taking an extremely long time to run and I am wondering if there is a more efficient alternative. Here it is below. Note that the 450 global variables are called dxcode1, dxcode2...dxcode450.
%let code = dxcode;
%macro search;
data out;
set claims;
array a_dx dx1-dx4;
flag = 0;
%do j = 1 %to 450;
%do i = 1 %to 4;
if a_dx(&i) = "&&&code.&j." then flag = 1;
%end;
%end;
if flag > 0 then output;
run;
%mend search;
Thanks!
Hi, Here is one easy and simple way. I hope I have understood your requirement:-
/*Dataset_450 has the variable with 450 obs and claims has variables dx1-dx4*/
data want;
if (_n_ = 1) then do;
if 0 then set dataset_450;
declare hash myhash(dataset: "dataset_450(keep=variable_450");
rc = myhash.definekey('variable_450');
myhash.definedone();
end;
set claims;
array a_dx(*) dx1-dx4;
do _n_=1 to dim(a_dx);
temp=a_dx(_n_);
if myhash.check(key:temp) = 0 then flag=1;
end;
if flag=1 then output;
drop temp;
run;
Regards,
Naveen Srinivasan
I think that something like the following will run a lot quicker:
data claims; input dx1-dx4; cards; 1 2 3 4 1 3 4 5 1 4 5 6 1 7 8 9 ; %macro search(dxcode); data out; set claims; array a_dx dx1-dx4; flag = 0; if &dxcode in a_dx then do; flag=1; output; end; run; run; %mend search; %search(5)
Art, CEO, AnalystFinder.com
Or load the 450 data set into a temporary array and a single loop with no macro.
https://gist.github.com/statgeek/f052b5223fecca066b1f
Here's an alternative that skips creating macro variables. I'll assume that the 450 observations contained a variable named DXCODE, but the code can easily be modified to use the proper names.
data codelist2;
set codelist end=done;
fmtname='$found';
start = dxcode;
label='1';
output;
if done;
hlo='O';
label='0';
output;
end;
proc format cntlin=codelist2;
run;
This gives you a format that translates each of the 450 codes into "1" and anything else into "0". To keep the code simple, I'll make FLAG a character variable. Not too difficult to change it to numeric though.
Once you have the format, the rest is pretty straightforward.
data want;
set have;
array dx {4};
do i=1 to 4 until (flag='1');
flag = put(dx{i}, $found.);
end;
if flag='1';
run;
I think you can try to use SQL for this problem.
Suppose dataset "lookup" is the one with 450 observations, and dataset "healthcare" is the large healthcare claims database you have.
lookup has a variable called var1
healthcare has 4 variables called dx1-dx4
Then you can create a dataset "selected" outputing all selected observations from healthcare. You might need to replace with your own dataset name or variable names.
Proc sql;
create table selected as
select *
from healthcare
where dx1 = any (select var1 from lookup) or
dx2 = any (select var1 from lookup) or
dx3 = any (select var1 from lookup) or
dx4 = any (select var1 from lookup)
;
quit;
Hi, Here is one easy and simple way. I hope I have understood your requirement:-
/*Dataset_450 has the variable with 450 obs and claims has variables dx1-dx4*/
data want;
if (_n_ = 1) then do;
if 0 then set dataset_450;
declare hash myhash(dataset: "dataset_450(keep=variable_450");
rc = myhash.definekey('variable_450');
myhash.definedone();
end;
set claims;
array a_dx(*) dx1-dx4;
do _n_=1 to dim(a_dx);
temp=a_dx(_n_);
if myhash.check(key:temp) = 0 then flag=1;
end;
if flag=1 then output;
drop temp;
run;
Regards,
Naveen Srinivasan
A couple of notes on things to consider. Your original program generated 1800 IF/THEN statements. It would run faster if you added the word ELSE to 1799 of those statements. That could be done by adding this line after the %do i= statement:
%if &i > 1 or &j > 1 %then else;
And the hashing approach is a good one. If I knew how to program it, I would have. It can be speeded up by using the trick that I used in the format solution, getting the loop to end earlier once a match is found:
do _n_=1 to dim(a_dx) until (flag=1);
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.