DATA Step, Macro, Functions and more

Merging & comparing datasets

Reply
Occasional Learner
Posts: 1

Merging & comparing datasets

[ Edited ]

Hi everyone, 

 

I'm brand new to SAS, so please bear with me if I'm missing something obvious. I've got two datasets, each with patient data, and I need to identify discrepancies between them. The first dataset is structured horizontally, starting with a unique patient ID, and a list of dates corresponding to clinic visits. This is the "gold standard" dataset. There's one row per ID: 

Patient_ID      Date1          Date2     

      24          12/2/15      12/23/15        

      25          12/2/15      12/23/15        

The other dataset is structured vertically, and contains information on whether or not forms were filled out at each visit. Each Patient_ID is listed multiple times to denote multiple visits. The "Date of Visit" variable values should correspond to values in the "Date" variables in the first dataset, but I've been told that this latter dataset may have extra or missing entries compared to the first. 

Patient_ID      Date of Visit         Form1      Form2     Form3    

      24                12/2/15              Yes           No            Yes           

      24                12/23/15            Yes           Yes           Yes 

      25                12/2/15              Yes           No            Yes

      25                12/23/15            Yes           Yes           Yes           

My first thought was to try to merge the datasets in such a way that I could create an indicator variable to identify instances where date of visit did not = corresponding date from dataset one, but honestly, I can't quite wrap my head around how to structure the merged dataset in such a way that it retains all the necessary data. 

 

Any ideas would be greatly appreciated, thank you!

Super User
Posts: 3,857

Re: Merging & comparing datasets

[ Edited ]
Posted in reply to alex_garcia

I think your best approach is to transform your first dataset so it looks like your second one. That is so you have one row per patient ID per visit date. Then you can merge both tables by patient ID and visit date and where they match, they agree on visit date, and where they don't match they don't agree on visit date. This sample code should enable you to transform the first dataset (replace xx with your highest date number):

data want1;
  drop date1 - datexx;
  array visit_dates (*) date1 - datexx;
  set have1;
  do i = 1 to dim(visit_dates);
  if visit_dates(i) ne . then visit_date = visit_dates(i);
  output;
  end;
run;

data want2;
  merge want1 (in = a)
             want2 (in = b);
  by patient_id visit_date;
  if a and b then Match = 'Y'
  else Match = 'N';
run;

 

PROC Star
Posts: 1,583

Re: Merging & comparing datasets

Posted in reply to alex_garcia

Can you also post a sample of your output? please

Super User
Posts: 6,632

Re: Merging & comparing datasets

Posted in reply to alex_garcia

As far as the result goes, since you haven't specified, I'm going to assume that each observation in the second data set should be identified as to whether the date matches or not.

 

We don't exactly know whether your date variables are actual SAS dates or character strings, but let's at least assume that the format is consistent from one data set to the next.

 

Assuming that both data sets are sorted by Patient_ID, you could use:

 

data want;

merge have1 have2 (in=keepme);

by Patient_ID;

if keepme;

if date_of_visit=date1 or date_of_visit=date2 then match='Y';

else match='N';

drop date1 date2;

run;

 

If you actually have a few more than 2 dates to compare, this structure to the program will work well enough.  Once you get to many more dates, you could switch over to arrays instead.  But for someone who is brand new to SAS, this program should be a good starting point.

Frequent Contributor
Posts: 109

Re: Merging & comparing datasets

[ Edited ]
Posted in reply to alex_garcia
/*This works*/
data have_gold;
input Patient_ID:best12.      Date1:mmddyy8.          Date2:mmddyy8.;
format date1 date2 date9.;     
datalines;
24 12/02/15 12/23/15
25 12/02/15 12/23/15
26 12/03/15 12/24/15
;
run;  

data have_log;     
input Patient_ID:best12. Date_of_Visit:mmddyy8. Form1:$10. Form2:$10. Form3:$10.;     
format Date_of_Visit date9.;      
datalines;    
24 12/02/15 Yes No  Yes    
24 12/23/15 Yes Yes Yes    
25 12/02/15 Yes No  Yes    
26 12/24/15 No  Yes No     
;    
run;    

proc transpose data=have_gold out=have_gold_trans(drop=_name_ rename=(col1=Date_of_Visit));
by Patient_ID;
run;

proc sort data=have_gold_trans;
by Patient_ID Date_of_Visit;
run;

proc sort data=have_log;
by Patient_ID Date_of_Visit;
run;

data want(drop=Form:);
merge have_gold_trans(in=a) have_log(in=b);
by Patient_ID Date_of_Visit;
if a and not b then output;
run;
Ask a Question
Discussion stats
  • 4 replies
  • 160 views
  • 0 likes
  • 5 in conversation