DATA Step, Macro, Functions and more

Efficient code for searching through a list

Reply
Frequent Contributor
Posts: 82

Efficient code for searching through a list

I have a SAS database with a single variable and 450 observations, each one corresponding to an ICD-9 diagnosis code.

 

I also have a large healthcare claims database in which each observation is a claim with up to 4 diagnosis code fields (dx1-dx4).

 

I have written code that sets all the 450 observations into global variables.  In a macro, I set up a 4-element array consisting of dx1-dx4, and then use a do loop to set an indicator to 1 if any of the elements of the array match the 450 diagnosis codes.  If 1, then the observation will output.  

 

This code, however, is taking an extremely long time  to run and I am wondering if there is a more efficient alternative.  Here it is below. Note that the 450 global variables are called dxcode1, dxcode2...dxcode450.

 

%let code = dxcode;

 

%macro search;

data out;

set claims;

array a_dx dx1-dx4;

flag = 0;

%do j = 1 %to 450;

       %do i = 1 %to 4;

               if a_dx(&i) = "&&&code.&j." then flag = 1;

        %end;

%end;

if flag > 0 then output;

run;

%mend search;

 

 

 

Thanks!

PROC Star
Posts: 7,471

Re: Efficient code for searching through a list

I think that something like the following will run a lot quicker:

 

data claims;
  input dx1-dx4;
  cards;
1 2 3 4
1 3 4 5
1 4 5 6
1 7 8 9
;

%macro search(dxcode);
  data out;
    set claims;
    array a_dx dx1-dx4;
    flag = 0;
    if &dxcode in a_dx then do;
      flag=1;
      output;
    end;
  run;
run;
%mend search;

%search(5)

Art, CEO, AnalystFinder.com

 

Super User
Posts: 19,785

Re: Efficient code for searching through a list

Or load the 450 data set into a temporary array and a single loop with no macro. 

 

https://gist.github.com/statgeek/f052b5223fecca066b1f

 

 

Super User
Posts: 5,500

Re: Efficient code for searching through a list

[ Edited ]

Here's an alternative that skips creating macro variables.  I'll assume that the 450 observations contained a variable named DXCODE, but the code can easily be modified to use the proper names.

 

data codelist2;

set codelist end=done;

fmtname='$found';

start = dxcode;

label='1';

output;

if done;

hlo='O';

label='0';

output;

end;

 

proc format cntlin=codelist2;

run;

 

This gives you a format that translates each of the 450 codes into "1" and anything else into "0".  To keep the code simple, I'll make FLAG a character variable.  Not too difficult to change it to numeric though.

 

Once you have the format, the rest is pretty straightforward.

 

data want;

set have;

array dx {4};

do i=1 to 4 until (flag='1');

   flag = put(dx{i}, $found.);

end;

if flag='1';

run;

Occasional Contributor
Posts: 9

Re: Efficient code for searching through a list

I think you can try to use SQL for this problem.

 

Suppose dataset "lookup" is the one with 450 observations, and dataset "healthcare" is the large healthcare claims database you have.

 

lookup has a variable called var1

healthcare has 4 variables called dx1-dx4

 

Then you can create a dataset "selected" outputing all selected observations from healthcare. You might need to replace with your own dataset name or variable names.

 

Proc sql; 

 create table selected as 

 select * 

 from healthcare

 where dx1 = any (select var1 from lookup) or 

            dx2 = any (select var1 from lookup) or 

            dx3 = any (select var1 from lookup) or 

            dx4 = any (select var1 from lookup)

  ;

quit;

PROC Star
Posts: 283

Re: Efficient code for searching through a list

[ Edited ]

Hi, Here is one easy and simple way. I hope I have understood your requirement:-

 

 

/*Dataset_450 has the variable with 450 obs and claims has variables dx1-dx4*/


data want;
if (_n_ = 1) then do;
if 0 then set dataset_450;
declare hash myhash(dataset: "dataset_450(keep=variable_450");
rc = myhash.definekey('variable_450');
myhash.definedone();
end;
set claims;
array a_dx(*) dx1-dx4;
do _n_=1 to dim(a_dx);
temp=a_dx(_n_);
if myhash.check(key:temp) = 0 then flag=1;
end;
if flag=1 then output;
drop temp;
run;

 

Regards,

Naveen Srinivasan

Super User
Posts: 5,500

Re: Efficient code for searching through a list

A couple of notes on things to consider.  Your original program generated 1800 IF/THEN statements.  It would run faster if you added the word ELSE to 1799 of those statements.  That could be done by adding this line after the %do i= statement:

 

%if &i > 1 or &j > 1 %then else;

 

And the hashing approach is a good one.  If I knew how to program it, I would have.  It can be speeded up by using the trick that I used in the format solution, getting the loop to end earlier once a match is found:

 

do _n_=1 to dim(a_dx) until (flag=1);

Ask a Question
Discussion stats
  • 6 replies
  • 235 views
  • 3 likes
  • 6 in conversation