I have the following scenario with two data sets. I have to find the proportion of persons who tested within 5 to 10 days from exposure date. I deduplicated the datasets, merged them then found the diff_date and flagged it as follows. In the end, I found the percentage of date difference in two ways. What I'm trying to do is get a finer output report by using a better syntax. data Exposure; input Subject_ID Exp_Date : mmddyy10. Name $14-22 Case_ID Exposure $; format exp_date date9.; datalines; 1 12/20/2019 Joe Smith 122 Home 1 09/30/2019 Joe Smith 435 Home 2 01/05/2020 Mary Mill 432 Work 3 11/04/2019 John Doe 129 Work 3 11/04/2019 John Doe 655 School 3 01/25/2020 John Doe 721 Work ;;; run; proc sort data = work.exposure out=exp_nodups nodupkey; by Subject_ID Name Exp_date; run; data Testing; input Subject_ID Test_Date : mmddyy10. Name $14-23; format test_date date9.; datalines; 1 12/29/2019 Joe Smith 2 09/30/2019 Mary Mill 2 01/07/2020 Mary Mill 8 06/30/2019 Mark Roger 8 07/15/2019 Mark Roger 8 08/29/2019 Mark Roger 8 12/03/2019 Mark Roger ;;;; run; proc sort data = work.testing out=test_nodups nodupkey; by Subject_ID Name test_date; run; data exp_; set exp_nodups; day5 = exp_date + 5; day10 = exp_date + 10; run; Proc SQL; Create Table Merge AS Select coalesce(E.Name, T.Name) as Name, E.Exp_Date,E.Case_ID,E.Exposure,day5,day10, coalesce(E.Subject_ID, T.Subject_ID) as Subject_ID, T.Test_Date from exp_ E FULL JOIN test_nodups T on (E.Subject_ID = T.Subject_ID); quit; data merged; set merge; dif_days = intck('day', exp_date, test_date); flag = day5<=test_date<=day10; run; proc freq data = merged; tables flag/norow nocol out= want; title 'Proportion of contacts tested in the 5-10 day window of their quarantine period'; run; proc SQL; Create Table proportion AS Select mean(((dif_days>=5)*(dif_days<=10))) as contact_proportion from merged; quit; Thank you
... View more