02-10-2016 01:16 AM
I want to check consistency between two data sets, Suppose I receive data sets month on month I want to compare the present received data sets with the previous received on following attributes
1. To check if there is attribute named 'gender' & 'DOB' for a user and it is consistent in both the months,
The issue over here is for some of the DOB fields for a few users it is been changed from DDMMYY to MMDDYY, suppose user 1 had 12.08.1976 in 2014M02 data now it has changed to 08.12.1976 in 2014M03, I need to Identify those users which have inconistency in 2014M03 data compared with 2014M02 ( the attributes in 2014M02 is correct)
I tried with subquery using proc sql, also some proc compare, However it was not a good result.
Can anyone help me on this.
Thanks in advance
Thanks in advance
02-10-2016 04:02 AM
Please post example input and desired output dataset/report.
I don't see a reason why this problem could not be solved by proc sql.
The changed date-format should not cause any trouble, if the dates are dates and not strings.
02-10-2016 07:31 AM
I was able to solve using sql, However was not too sure anout the accuray of results I got.
In the attachment, you can see DOC is changed between two data sets for the same user, additonall challenge would be if there is any mismatch for other date attributes or other attributes.
I need to identify only those users who have this descrpency.
02-10-2016 06:25 AM
Totally agree with @andreas_lds here. Look at your process, identify the key merge variables i.e subject id. These are the join on values, the other variables would be the where not equals:
proc sql; create table WANT as select ... from BASETABLE full join COMPARETABLE on <id variables>=<id variables> and <non id values> ne <non id variables>; quit;
You could also do two excepts, i.e. where in a except those in b, and in b except those in a. Some test data would visualise better.