BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chuakp
Obsidian | Level 7

I have a SAS database with a single variable and 450 observations, each one corresponding to an ICD-9 diagnosis code.

 

I also have a large healthcare claims database in which each observation is a claim with up to 4 diagnosis code fields (dx1-dx4).

 

I have written code that sets all the 450 observations into global variables.  In a macro, I set up a 4-element array consisting of dx1-dx4, and then use a do loop to set an indicator to 1 if any of the elements of the array match the 450 diagnosis codes.  If 1, then the observation will output.  

 

This code, however, is taking an extremely long time  to run and I am wondering if there is a more efficient alternative.  Here it is below. Note that the 450 global variables are called dxcode1, dxcode2...dxcode450.

 

%let code = dxcode;

 

%macro search;

data out;

set claims;

array a_dx dx1-dx4;

flag = 0;

%do j = 1 %to 450;

       %do i = 1 %to 4;

               if a_dx(&i) = "&&&code.&j." then flag = 1;

        %end;

%end;

if flag > 0 then output;

run;

%mend search;

 

 

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi, Here is one easy and simple way. I hope I have understood your requirement:-

 

 

/*Dataset_450 has the variable with 450 obs and claims has variables dx1-dx4*/


data want;
if (_n_ = 1) then do;
if 0 then set dataset_450;
declare hash myhash(dataset: "dataset_450(keep=variable_450");
rc = myhash.definekey('variable_450');
myhash.definedone();
end;
set claims;
array a_dx(*) dx1-dx4;
do _n_=1 to dim(a_dx);
temp=a_dx(_n_);
if myhash.check(key:temp) = 0 then flag=1;
end;
if flag=1 then output;
drop temp;
run;

 

Regards,

Naveen Srinivasan

View solution in original post

6 REPLIES 6
art297
Opal | Level 21

I think that something like the following will run a lot quicker:

 

data claims;
  input dx1-dx4;
  cards;
1 2 3 4
1 3 4 5
1 4 5 6
1 7 8 9
;

%macro search(dxcode);
  data out;
    set claims;
    array a_dx dx1-dx4;
    flag = 0;
    if &dxcode in a_dx then do;
      flag=1;
      output;
    end;
  run;
run;
%mend search;

%search(5)

Art, CEO, AnalystFinder.com

 

Reeza
Super User

Or load the 450 data set into a temporary array and a single loop with no macro. 

 

https://gist.github.com/statgeek/f052b5223fecca066b1f

 

 

Astounding
PROC Star

Here's an alternative that skips creating macro variables.  I'll assume that the 450 observations contained a variable named DXCODE, but the code can easily be modified to use the proper names.

 

data codelist2;

set codelist end=done;

fmtname='$found';

start = dxcode;

label='1';

output;

if done;

hlo='O';

label='0';

output;

end;

 

proc format cntlin=codelist2;

run;

 

This gives you a format that translates each of the 450 codes into "1" and anything else into "0".  To keep the code simple, I'll make FLAG a character variable.  Not too difficult to change it to numeric though.

 

Once you have the format, the rest is pretty straightforward.

 

data want;

set have;

array dx {4};

do i=1 to 4 until (flag='1');

   flag = put(dx{i}, $found.);

end;

if flag='1';

run;

CiCi
Fluorite | Level 6

I think you can try to use SQL for this problem.

 

Suppose dataset "lookup" is the one with 450 observations, and dataset "healthcare" is the large healthcare claims database you have.

 

lookup has a variable called var1

healthcare has 4 variables called dx1-dx4

 

Then you can create a dataset "selected" outputing all selected observations from healthcare. You might need to replace with your own dataset name or variable names.

 

Proc sql; 

 create table selected as 

 select * 

 from healthcare

 where dx1 = any (select var1 from lookup) or 

            dx2 = any (select var1 from lookup) or 

            dx3 = any (select var1 from lookup) or 

            dx4 = any (select var1 from lookup)

  ;

quit;

novinosrin
Tourmaline | Level 20

Hi, Here is one easy and simple way. I hope I have understood your requirement:-

 

 

/*Dataset_450 has the variable with 450 obs and claims has variables dx1-dx4*/


data want;
if (_n_ = 1) then do;
if 0 then set dataset_450;
declare hash myhash(dataset: "dataset_450(keep=variable_450");
rc = myhash.definekey('variable_450');
myhash.definedone();
end;
set claims;
array a_dx(*) dx1-dx4;
do _n_=1 to dim(a_dx);
temp=a_dx(_n_);
if myhash.check(key:temp) = 0 then flag=1;
end;
if flag=1 then output;
drop temp;
run;

 

Regards,

Naveen Srinivasan

Astounding
PROC Star

A couple of notes on things to consider.  Your original program generated 1800 IF/THEN statements.  It would run faster if you added the word ELSE to 1799 of those statements.  That could be done by adding this line after the %do i= statement:

 

%if &i > 1 or &j > 1 %then else;

 

And the hashing approach is a good one.  If I knew how to program it, I would have.  It can be speeded up by using the trick that I used in the format solution, getting the loop to end earlier once a match is found:

 

do _n_=1 to dim(a_dx) until (flag=1);

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2475 views
  • 3 likes
  • 6 in conversation