DATA Step, Macro, Functions and more

Checking Inconsistency over two data sets

Posts: 20

Checking Inconsistency over two data sets

Hello All, 


I want to check consistency between two data sets, Suppose I receive data sets month on month I want to compare  the present received data sets with the previous received on following attributes

1. To check if there is attribute named 'gender' & 'DOB' for a user and it is consistent in both the months,


The issue over here is for some of the DOB fields for a few users it is been changed from DDMMYY to MMDDYY, suppose user 1 had 12.08.1976 in 2014M02 data now it has changed to 08.12.1976 in 2014M03, I need to Identify those users which have inconistency in 2014M03 data compared with 2014M02 ( the attributes in 2014M02 is correct)


I tried with subquery using proc sql, also some proc compare, However it was not a good result.


Can anyone help me on this.


Thanks in advance






Thanks in advance



Valued Guide
Posts: 558

Betreff: Checking Inconsistency over two data sets

Please post example input and desired output dataset/report.


I don't see a reason why this problem could not be solved by proc sql.

The changed date-format should not cause any trouble, if the dates are dates and not strings.

Posts: 20

Betreff: Checking Inconsistency over two data sets

Posted in reply to andreas_lds

Hello Andres,


I was able to solve using sql, However was not too sure anout the accuray of results I got.


In the attachment, you can see DOC is changed between two data sets for the same user, additonall challenge would be if there is any mismatch for other date attributes or other attributes.


I need to identify only those users who have this descrpency.



Super User
Super User
Posts: 9,599

Re: Checking Inconsistency over two data sets

Totally agree with @andreas_lds here.  Look at your process, identify the key merge variables i.e subject id.  These are the join on values, the  other variables would be the where not equals:

proc sql;
  create table WANT as
  select ...
  from   BASETABLE
  full join COMPARETABLE
  on      <id variables>=<id variables>
  and    <non id values> ne <non id variables>;

You could also do two excepts, i.e. where in a except those in b, and in b except those in a.  Some test data would visualise better.

Ask a Question
Discussion stats
  • 3 replies
  • 3 in conversation