Thank you for the response. I have attached an Excel sheet and highlighted a couple of examples, Pt_ID 3 and Pt_ID 4. The second test for each of these patients is a duplicate, and I need to tag such tests as duplicates.

*IDENTIFY THE 14-DAY BLOCKS (I USED ACTUAL DATES INSTEAD OF HOURS);
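data blood_cultures_blocks;
  format block 3.0 lag_dt datetime.;
  set Inp_BC_Postv;
  by Pt_ID;
  /* Collection date of the previous record (LAG is called on every row) */
  lag_dt = lag(collection_date);
  retain lag_dt;
  /* New patient: restart the block counter and discard the prior date. */
  /* Otherwise start a new block when the gap from the previous culture exceeds 14 days. */
  if first.Pt_ID then do;
    block = 1;
    lag_dt = .;
  end;
  else if datepart(collection_date) - datepart(lag_dt) > 14 then block + 1;
run;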

proc sort data=blood_cultures_blocks;
  by Pt_ID block;
run;

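data blood_cultures_collapsed;
  set blood_cultures_blocks;
  by Pt_ID block;
  lag_dt = lag(collection_date);
  /* bugs1-bugs30 accumulate the organisms seen so far in the current block */
  array bugs[30] $30;
  array print_name[5] $30 print_name1-print_name5;
  retain bugs lag_dt;
  if first.Pt_ID then do;
    call missing(of bugs[*]);
    a = 1;
  end;
  /* More than 14 days since the previous culture: clear the list and restart the index */
  if datepart(collection_date) - datepart(lag_dt) > 14 then do;
    call missing(of bugs[*]);
    a = 1;
  end;
  do i = 1 to 5;
    /* Add each non-blank organism name that is not already in the list */
    if not missing(print_name[i]) and print_name[i] not in bugs then do;
      bugs[a] = print_name[i];
      a + 1;
    end;
  end;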
run;

1) Does "duplicate" mean that all organisms match? Any match? More than one if more than one is present but not all match?
Any match is what I am looking for. For example, if the 1st test for Pt1 has Staphylococcus and Streptococcus, and the 2nd test for the same patient has Streptococcus within 14 days, then the 2nd test is a duplicate. However, if the second test has a separate organism, E. coli, along with Streptococcus, then I would consider it a separate test.

2) Spelling. Have you checked to see that the names of these organisms are spelled the same consistently? "Matching" usually implies values are the same, and if you have different people doing data entry you may have quite different spellings if one uses "E Coli", another "Escherichia coli" and a third "E coli". I am sure you can think of potential others.
Spelling and other naming conventions have been taken care of.

3) Does duplicate mean Printname1 has to be the same as Printname1, or can Printname1 match Printname3 at the other time point as a duplicate?
Printname1 can match Printname3. I need to check all combinations of print_name1-print_name5.

4) Before proceeding at all, you want to make sure the dates are SAS date values, numeric with an appropriate format for people to recognize the values. There are several functions that are specifically designed to calculate intervals between dates.
I have Date_Culture as a date variable only, so I am looking for a 14-day difference (I am not worried about the exact datetime).