Is there a way for sas to output what records didn't match and why?

jmarnaez · Posted 02-01-2022 01:15 PM

For my work I've been asked to check the data quality of older legacy records against another separate database as there has been concerns on the quality of information when those were brought into the current system. I've currently mostly done match merging based on selected identifiers to provide the team with records that are consistent between the two datasets and separate out records that are failing on one of that list but I've been asked if it's possible to identify where the mismatch in records is occurring as some look like they should be merging properly but aren't. I do not know if it's possible for sas to output it's decision making process during the match but if so that would be very helpful.

Quentin · Posted 02-01-2022 01:58 PM

Can you show the code you are using for the match merging?

If you are using a data step with a MERGE, it's pretty straight forward to output the records that don't match.

I'm not sure what you mean by "why" or the "decision-making process." If you've got a simple match-merge, the decision making process is, do they or do they not match on all the specified BY variables.

A small example of data you are comparing might also help.

jmarnaez · Posted 02-01-2022 05:15 PM

data merged;

set x y;

by a b c d e f;

if in1 and in2;

run;

This is the basic structure of the code I'm running. outputting the ones that don't match isn't the issue. I'm trying to see if I can get SAS to show where things are not matching for the records versus being able to tell that the unmatched records are failing at a, b, c, d, e, or f but not knowing without physically checking which one is the fault point. I'm giving proc compare a try as recommended below but it's tended to crash the program as the files are several hundred thousand records each.

Reeza · Posted 02-01-2022 05:22 PM

Several hundred thousand records should not be an issue, unless you're using Academics on Demand.

You're not really merging here with a SET statement, you're interleaving. Are you sure you have the correct code?
What happens if you change SET to MERGE instead?

jmarnaez · Posted 02-01-2022 05:28 PM

Whoop that is meant to be merge. Thank you for noting that.

Quentin · Posted 02-01-2022 05:25 PM

I suggest you make a little example with, say 5-10 records in work.x and 5-10 records in work.y, with just three by-variables a b c.

Then think about how to operationalize the "fault point".

You would need a rule, like "if a record from work.x doesn't match to a record in work.y by A B C but it does match to a record by A B, then the fault point is C."

Kurt_Bremser · Posted 02-02-2022 03:53 AM

First, keep your non-matches:

data non-match;
merge
  x (in=in1)
  y (in=in2)
;
by a b c d e f;
if in1 ne in2;
run;

Then start by finding multiples when grouping along subsets of key variables:

proc sql;
create table try_b as
  select *, count(*) as count
  from non-match
  group by a,c,d,e,f
  having count > 1
;
quit;

See if this leads you somewhere.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Quentin · Posted 02-02-2022 07:06 AM

I like @Kurt_Bremser 's approach for getting sense of your data, and starting to think about why records don't match.

I think another way to approach this would be as a fuzzy-merging problem. So you would take the mismatches from x and the mismatches from y, and then try to join them by fuzzy-merging by a,b,c,d,e,f. If you go to lexjansen.com and look up fuzzy merge or fuzzy join, you should get plenty of ideas.

Reeza · Posted 02-01-2022 02:20 PM

PROC COMPARE may be what you're looking for with a BY statement.

Is there a way for sas to output what records didn't match and why?

Re: Is there a way for sas to output what records didn't match and why?

Re: Is there a way for sas to output what records didn't match and why?

Re: Is there a way for sas to output what records didn't match and why?

Re: Is there a way for sas to output what records didn't match and why?

Re: Is there a way for sas to output what records didn't match and why?

Re: Is there a way for sas to output what records didn't match and why?

Re: Is there a way for sas to output what records didn't match and why?

Re: Is there a way for sas to output what records didn't match and why?

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away