BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
GPatel
Pyrite | Level 9
I have attached sample "sample_qc_diab" SAS dataset. Following SAS code uses array to identify "Diabetes" case using Array.  I have more then 14 Mil records and 50 variables in each FY  (from 09-22) and Array process takes on an average 15 to 20 minutes. 
 
Is it possible to get the same results using HASH for efficiency ? 
 
Appreciate any help from the community.
 
libname xyz "D:\temp\xyz"; 
 
 %global lkup lkup2;  

   %let lkup = %str('250' 'E109' 'E111');
   %let lkup2 = %str('27889');


data sample_qc;;
set xyz.sample_qc_diab   ;
run;  
 

data xx;
 set sample_qc(rename=( cpt_4=cpt4 cpt_5=cpt5 cpt_6=cpt6 cpt_7=cpt7
                 cpt_8=cpt8 cpt_9=cpt9 cpt_10=cpt10 cpt_11=cpt11 cpt_12=cpt12 cpt_13=cpt13));
 
   array tem{1:10} dx1-dx10;

  do i=1 to 10;

  if tem(i) ^=" " then do;
  if tem(i) in (&lkup) then flag=1;
     end;
  end;
   array cpt{4:13} cpt4-cpt13;
                        do j= 4 to 13;  
            if not missing(cpt[j])   then do;
              if cpt(j) in (&lkup2) then flag=1;
              end;
              end;    
      if flag=1;      
run;
proc print data=xx(obs=10 );
var cpt: dx: fy flag;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Are you after something like below?

data work.small;
  input event :$5.;
  cards;
250
E109
E111
27889
;
run;

libname sample 'c:\temp';
data subset;
  length flag $1;

  if _n_ = 1 then
    do;
      if 0 then set work.small(keep=event);
      declare hash h(dataset:'small');
      h.defineKey('event');
      h.defineDone();
    end;
  call missing(of _all_);

  set sample.sample_qc_diab;

  array _ev {*} cpt_4-cpt_13 dx1-dx10;
  do _i=1 to dim(_ev);
    if h.find(key:_ev[_i])=0 then
      do;
        flag='Y';
        leave;
      end;
  end;

/*  if missing(event) then flag='N';*/
  if flag='Y' then output;

run;
libname sample clear;

Patrick_0-1662863561071.png

 

View solution in original post

11 REPLIES 11
Tom
Super User Tom
Super User

How would a hash object help? What would it contain? How would that reduce processing time?

 

Explain in words what it is that the program id doing:

Spoiler
Looks to me like you are finding observations that have EITHER the diagnosis code OR the procedure code.

Why are you skipping CPT1 to CPT3 ?

 

You can simplify your code a little.

%let lkup = '250' 'E109' 'E111' ;
%let lkup2 = '27889' ;
data xx;
  set xyz.sample_qc_diab ;
  array diag dx1-dx10;
  array proc cpt_4-cpt_13 ;
  flag=0;
  do index=1 to dim(diag) while (not flag);
    flag = diag[index] in (&lkup) ;
  end;
  do index=1 to dim(proc) while (not flag);
    flag = proc[index] in (&lkup2) ;
  end;
  if flag ;
run;

 

 

GPatel
Pyrite | Level 9
These are considered to be keys '250' 'E109' 'E111' '27889' or look up table.
From big dataset, look for these keys in dx1-dx10 or cpt4-cpt13. If you encounter key, create a flag as 1.
One can create small dataset with four keys '250' ,'E109', 'E111', '27889' , and use these keys to identify and isolate records where you see these values.
Tom
Super User Tom
Super User

@GPatel wrote:
These are considered to be keys '250' 'E109' 'E111' '27889' or look up table.
From big dataset, look for these keys in dx1-dx10 or cpt4-cpt13. If you encounter key, create a flag as 1.
One can create small dataset with four keys '250' ,'E109', 'E111', '27889' , and use these keys to identify and isolate records where you see these values.

You could.  But it would add extra processing time over the current method. 

 

But it would allow two enhancements that might help.

1) The list of diagnosis codes could be larger than want can be put into a single macro variable.  

2) You could possibly try to retrieve some other information from the lookup table.  But that is not likely in this case since you are scanning multiple different variables with diagnosis codes so what possible auxiliary information could you retrieve from the lookup table?

 

If you want to store the diagnosis codes into a lookup table you could just add a step to pull them into the macro variable so that you could continue to use the existing data step to filter.

data codes;
  input type $ code $ ;
cards;
ICD 250
ICD E109
ICD E111
CPT 27889
;

proc sql noprint;
  select quote(trim(code))
    into :lkup separated by ' '
    from codes
    where type='ICD'
  ;
  select quote(trim(code))
    into :lkup2 separated by ' '
    from codes
    where type='CPT'
  ;
quit;

 

GPatel
Pyrite | Level 9
Tom:
My goal is to re-engineer the process.
I would like to use HASH method. Is it feasible?

Tom
Super User Tom
Super User

@GPatel wrote:
Tom:
My goal is to re-engineer the process.
I would like to use HASH method. Is it feasible?


It could work.  But it would be more code to maintain and require longer to run.

 

Essentially you would just end up replacing the IN operator with a call to the FIND() method.

 

 

GPatel
Pyrite | Level 9
Tom:
The end goal is to capture and monitor total visits, associated costs, demographic matrix, and medications and supplies, etc. by FY, Region, Age etc.
I want to compare time it takes by both Array and HASH methods with real data (15-20 Million records) on windows unix platform.
I am sure, HASH will be the choice of the game since it is faster and efficient for table look up activities.
PeterClemmensen
Tourmaline | Level 20

"Is it possible to get the same results using HASH for efficiency?"

 

To me, the answer is two-fold. Yes, it is possible to do the same thing with a hash object. But for efficiency, I don't see how the hash object will do any good.

 

You want to re-engineer the process. How? What values would go into the hash object? That is the real question before we can answer if it even makes sense to use a hash object. 

 

The hash object is very powerful and under the right circumstances, it can boost performance tremendously. However, it is not a magic wand that does wonders to the run-time of SAS processes whenever we use it 🙂

 

Best Regards Peter.

GPatel
Pyrite | Level 9
PeterClemmensen and Tom :

I created following sample program and used HASH. I got the expected results.
It does what is expected. However, with the real data ( all 10 dxi's and 13 cpti's) code will be little bit messy. Can any body shed some more lights to make SAS more dynamic? I have to dedup dxi's and cpti's.


data small;
length event $ 5.;
input event $5.;
cards;
250
E109
E111
27889
;
run;


data sample_qc;;
length event $ 10;
set sample_qc_diab ;
event=dx1; output; event=dx2; output;event=dx3; output; event=dx4; output; event=dx5; output;
event=dx6; output; event=dx7; output;event=dx8; output; event=dx9; output; event=dx10; output;

event=cpt_4; output; event=cpt_5; output;
event=cpt_6; output; event=cpt_7; output;event=cpt_8; output; event=cpt_9; output; event=cpt_10; output;
event=cpt_11; output; event=cpt_12; output;event=cpt_13; output;
run;

data subset;
length flag $ 5.;
if _N_ = 1 then do;
declare hash h(dataset:'small');
h.definekey('EVENT');
H.DEFINEDONE();
END;
set sample_qc ;

If h.find() = 0 then flag='Yes';
If h.find() ^= 0 then flag='No';
if flag='Yes';
run;
Patrick
Opal | Level 21

Are you after something like below?

data work.small;
  input event :$5.;
  cards;
250
E109
E111
27889
;
run;

libname sample 'c:\temp';
data subset;
  length flag $1;

  if _n_ = 1 then
    do;
      if 0 then set work.small(keep=event);
      declare hash h(dataset:'small');
      h.defineKey('event');
      h.defineDone();
    end;
  call missing(of _all_);

  set sample.sample_qc_diab;

  array _ev {*} cpt_4-cpt_13 dx1-dx10;
  do _i=1 to dim(_ev);
    if h.find(key:_ev[_i])=0 then
      do;
        flag='Y';
        leave;
      end;
  end;

/*  if missing(event) then flag='N';*/
  if flag='Y' then output;

run;
libname sample clear;

Patrick_0-1662863561071.png

 

GPatel
Pyrite | Level 9
GM Patrick,
I like and accept your solution to my posting. It is what I was looking for, and it works like a charm. Appreciate your help and time.
One question to ask you : Can small data entries be a macro variable, and then use this macro variable in HASH reference as h.defineKey('event');?

Regards, GPatel
Patrick
Opal | Level 21

@GPatel wrote:
GM Patrick,
I like and accept your solution to my posting. It is what I was looking for, and it works like a charm. Appreciate your help and time.
One question to ask you : Can small data entries be a macro variable, and then use this macro variable in HASH reference as h.defineKey('event');?

Regards, GPatel

If I understand your question right then NO. You need a table/view as source for loading a hash table.

You can always create a macro variable with a string of values and then use that via an IN operator: variable in (&macro_var).

 

If you have a list of key/value pairs then you could also create a format using cntlin= and then use the format in syntax like: if not missint(put(variable,<format>)) then do...
(just define the other case as returning a blank value).

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1692 views
  • 0 likes
  • 4 in conversation