03-03-2018 06:45 PM - edited 03-03-2018 06:46 PM
I'm brand new to SAS, so please bear with me if I'm missing something obvious. I've got two datasets, each with patient data, and I need to identify discrepancies between them. The first dataset is structured horizontally, starting with a unique patient ID, and a list of dates corresponding to clinic visits. This is the "gold standard" dataset. There's one row per ID:
Patient_ID Date1 Date2
24 12/2/15 12/23/15
25 12/2/15 12/23/15
The other dataset is structured vertically, and contains information on whether or not forms were filled out at each visit. Each Patient_ID is listed multiple times to denote multiple visits. The "Date of Visit" variable values should correspond to values in the "Date" variables in the first dataset, but I've been told that this latter dataset may have extra or missing entries compared to the first.
Patient_ID Date of Visit Form1 Form2 Form3
24 12/2/15 Yes No Yes
24 12/23/15 Yes Yes Yes
25 12/2/15 Yes No Yes
25 12/23/15 Yes Yes Yes
My first thought was to try to merge the datasets in such a way that I could create an indicator variable to identify instances where date of visit did not = corresponding date from dataset one, but honestly, I can't quite wrap my head around how to structure the merged dataset in such a way that it retains all the necessary data.
Any ideas would be greatly appreciated, thank you!
03-03-2018 07:10 PM - edited 03-03-2018 07:12 PM
I think your best approach is to transform your first dataset so it looks like your second one. That is so you have one row per patient ID per visit date. Then you can merge both tables by patient ID and visit date and where they match, they agree on visit date, and where they don't match they don't agree on visit date. This sample code should enable you to transform the first dataset (replace xx with your highest date number):
data want1; drop date1 - datexx; array visit_dates (*) date1 - datexx; set have1; do i = 1 to dim(visit_dates); if visit_dates(i) ne . then visit_date = visit_dates(i); output; end; run; data want2; merge want1 (in = a) want2 (in = b); by patient_id visit_date; if a and b then Match = 'Y' else Match = 'N'; run;
03-03-2018 11:20 PM
As far as the result goes, since you haven't specified, I'm going to assume that each observation in the second data set should be identified as to whether the date matches or not.
We don't exactly know whether your date variables are actual SAS dates or character strings, but let's at least assume that the format is consistent from one data set to the next.
Assuming that both data sets are sorted by Patient_ID, you could use:
merge have1 have2 (in=keepme);
if date_of_visit=date1 or date_of_visit=date2 then match='Y';
drop date1 date2;
If you actually have a few more than 2 dates to compare, this structure to the program will work well enough. Once you get to many more dates, you could switch over to arrays instead. But for someone who is brand new to SAS, this program should be a good starting point.
03-04-2018 02:00 AM - edited 03-04-2018 02:09 AM
/*This works*/ data have_gold; input Patient_ID:best12. Date1:mmddyy8. Date2:mmddyy8.; format date1 date2 date9.; datalines; 24 12/02/15 12/23/15 25 12/02/15 12/23/15 26 12/03/15 12/24/15 ; run; data have_log; input Patient_ID:best12. Date_of_Visit:mmddyy8. Form1:$10. Form2:$10. Form3:$10.; format Date_of_Visit date9.; datalines; 24 12/02/15 Yes No Yes 24 12/23/15 Yes Yes Yes 25 12/02/15 Yes No Yes 26 12/24/15 No Yes No ; run; proc transpose data=have_gold out=have_gold_trans(drop=_name_ rename=(col1=Date_of_Visit)); by Patient_ID; run; proc sort data=have_gold_trans; by Patient_ID Date_of_Visit; run; proc sort data=have_log; by Patient_ID Date_of_Visit; run; data want(drop=Form:); merge have_gold_trans(in=a) have_log(in=b); by Patient_ID Date_of_Visit; if a and not b then output; run;