08-25-2015 04:35 PM
I am comparing two large datasets that should be identical. The client has just told me they want output generated only if there are exceptions. There are 66 variables, both character and numeric, and similar tests could have more. The only approach I can think of is to merge on id and date and check each variable. I could add Prev_ to the variable names in one dataset before merging and then check each pair, but that seems horrible. The variable names are not consistent in any way. I'm not sure what additional information to provide; I'm just looking for a better solution.
08-25-2015 05:21 PM
Start with PROC COMPARE and the options NODATE and NOVALUES. You will get a summary of common variable names with differing characteristics, variables that exist in only one data set, and, if the sets are sorted in a similar fashion, a summary of differing pairs, which is marginally useful.
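As a minimal sketch of that first pass, assuming the two data sets are called work.previous and work.current (hypothetical names, not from the original post):

```sas
/* First pass: summary only, no value-by-value listing.            */
/* work.previous and work.current are placeholder data set names.  */
proc compare base=work.previous compare=work.current nodate novalues;
run;
```

NODATE drops the data set creation dates from the report and NOVALUES suppresses the detailed value comparison, so what remains is the summary.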
A further step would be to use the ID statement, which specifies the variables used to identify records so you get better matches, and the VAR and WITH statements to provide a list of variable pairs:
VAR Id date amount;
WITH CustId PurchaseDate PurchaseValue;
This would compare the values of Id with CustId, date with PurchaseDate, and amount with PurchaseValue.
And if you have several similar "candidates" for comparison, you can do something like this, which compares Date against both PurchaseDate and PaidDate:
Var Date Date;
With PurchaseDate PaidDate;
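Putting the ID, VAR and WITH statements together might look roughly like this. It is only a sketch: it assumes id and date exist under the same names in both data sets (as in the original question), that amount in one set corresponds to purchasevalue in the other, and that the data sets are called work.previous and work.current. All of those names are placeholders.

```sas
/* The ID statement expects both data sets to be sorted by the ID */
/* variables (or indexed), unless NOTSORTED is specified.         */
proc sort data=work.previous; by id date; run;
proc sort data=work.current;  by id date; run;

/* Match records on id and date, then compare amount (base)       */
/* against purchasevalue (comparison). LISTALL also lists the     */
/* variables and observations found in only one data set.         */
proc compare base=work.previous compare=work.current nodate listall;
   id id date;
   var amount;
   with purchasevalue;
run;
```

Variables that share a name in both data sets do not need to be listed; VAR and WITH are only needed to restrict the comparison or to pair up differently named variables.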
It will be an iterative process, so don't expect to get it in one pass. And custom writing your proposed code might take quite a bit longer.
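For the "only produce output if there are exceptions" requirement, one possible approach is to write the differences to a data set with OUT= and OUTNOEQUAL, then check the automatic macro variable SYSINFO, which PROC COMPARE sets to 0 when it finds no differences. The sketch below assumes the same placeholder data set names and ID variables as before; work.diffs, compare_rc and report_exceptions are made-up names.

```sas
/* NOPRINT suppresses the normal report. OUTNOEQUAL keeps only    */
/* observations where at least one value is judged unequal;       */
/* OUTBASE, OUTCOMP and OUTDIF write the base row, the comparison */
/* row and the difference row for each of those observations.     */
proc compare base=work.previous compare=work.current
             out=work.diffs outnoequal outbase outcomp outdif noprint;
   id id date;
run;

/* Capture SYSINFO right away: it is reset at the next step boundary. */
%let compare_rc = &sysinfo;

/* Print the exceptions only when the return code is nonzero. */
%macro report_exceptions;
   %if &compare_rc ne 0 %then %do;
      proc print data=work.diffs;
         title "Exceptions found (PROC COMPARE return code &compare_rc)";
      run;
      title;
   %end;
%mend report_exceptions;
%report_exceptions
```

Keep in mind that SYSINFO is a sum of bit flags, so a nonzero value can also reflect attribute differences such as labels, formats or lengths rather than differing values. If only value differences count as exceptions, the individual bits would need to be tested.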