Hello,
I want to use array to find unique icd code in a dataset. The first code is an icd diagnoses code and the second is a procedure code. My dataset has millions of record. The diagnosis code is from 1-12 and the procedure code is from 1-25. For the purposes of the example , i used 1-2 for diagnosis and 1-3 for procedure code. I would like to find any records with either one or multiple codes in the encounter so I can subset to only icd codes of interest. My code below is not doing what I want. Thanks in advance.
 /* Sample data lines */
   data have;
      input patient_id $ 
          proc1 $ 
          proc2 $ 
          diag1 $ 
          diag2 $;
    datalines;
001 A0H7XJZ B4JZZZZ F1020 Z864
002 H0ZXPZZ K4MZZZZ E669 Z733
003 C02DXZZ D0Y0KZZ J45909 F419
;
data medical_records;
set have;
    array proc_code[2] $7 proc1-proc2;    
    array diag_code[3] $10 diag1-diag2;   
    do i = 1 to 2;
 if proc_code [i] in ("H0ZXPZZ", "D0Y0KZZ") then valid_proc=1; else valid_proc=0;
end;
do i = 1 to 3;
if diag_code [i] in ("F1020") then valid_diag=1; else valid_diag =0;
end;
    drop i;
run;
Expected outcome
| Patient_id | proc1 | proc2 | diag1 | diag2 | diag3 | valid_proc | valid_diag | 
| 1 | A0H7XJZ | B4JZZZZ | F1020 | Z864 | Z125 | 0 | 1 | 
| 2 | H0ZXPZZ | K4MZZZZ | E669 | Z733 | J492 | 1 | 0 | 
| 3 | C02DXZZ | D0Y0KZZ | J45909 | F419 | F1020 | 1 | 1 | 
At a minimum remove the ELSE clause. With that the result is always just going to reflect the last variable in the array. Set the flag(s) to FALSE before the loop(s).
Or just set the flag to the result of the test and stop checking once you find a match.
data medical_records;
  set have;
  array proc_code proc1-proc2;    
  array diag_code diag1-diag2;  
  do i = 1 to dim(proc_code) until(valid_proc);
    valid_proc = proc_code[i] in ("H0ZXPZZ" "D0Y0KZZ");
  end;
  do i = 1 to dim(diag_code) until(valid_diag);
    valid_diag = diag_code[i] in ("F1020") ;
  end;
  drop i;
run;
It is just a querying issue. You could get this by PROC FORMAT or Hash Table.
  data have;
      input patient_id $ 
          proc1 $ 
          proc2 $ 
          diag1 $ 
          diag2 $;
    datalines;
001 A0H7XJZ B4JZZZZ F1020 Z864
002 H0ZXPZZ K4MZZZZ E669 Z733
003 C02DXZZ D0Y0KZZ J45909 F419
;
data _proc;
input _proc $;
cards;
H0ZXPZZ 
D0Y0KZZ
;
data _diag;
input _diag $;
cards;
F1020
;
run;
data want;
if _n_=1 then do;
 if 0 then set _proc _diag;
 declare hash p(dataset:'_proc');
 p.definekey('_proc');
 p.definedone();
 declare hash d(dataset:'_diag');
 d.definekey('_diag');
 d.definedone();
end;
set have;
array _p{*} proc:;
array _d{*} diag:;
valid_proc=0;
valid_diag =0;
do i=1 to dim(_p);
 if p.check(key:_p{i})=0 then valid_proc=1;
end;
do j=1 to dim(_d);
 if d.check(key:_d{j})=0 then valid_diag=1;
end;
drop _: i j;
run;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
