DATA Step, Macro, Functions and more

How to find the Mismatched records?

Accepted Solution Solved
Reply
PROC Star
Posts: 633
Accepted Solution

How to find the Mismatched records?

I ran the following code to find the records which are available in A and not in B. I've 5918065 records in A dataset and 5902253 records in B dataset and I want to see the missing 15812 records. After running this code, I see only a 12180 records in 'Non_Matching_Feb_2017' dataset instead of 15812. Could someone of you help me with the code to see the missing 15812 records in Dataset B? 

  

Data Matching_Feb_2017 Non_Matching_Feb_2017;

merge Feb_nbrs_without_dt(in=a) source_33_count(in=b);

by POLICY_No;

if a and b then output Matching_Feb_2017;

if a and not b then output Non_Matching_Feb_2017;

run;

 

Log:

  

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 5918065 observations read from the data set WORK.FEB_NBRS_WITHOUT_DT.
NOTE: There were 5902253 observations read from the data set WORK.SOURCE_33_COUNT.
NOTE: The data set WORK.MATCHING_FEB_2017 has 5905885 observations and 34 variables.
NOTE: The data set WORK.NON_MATCHING_FEB_2017 has 12180 observations and 34 variables.

 


Accepted Solutions
Solution
‎03-05-2018 12:33 AM
Super User
Posts: 6,921

Re: How to find the Mismatched records?

Actually, the 12,180 number is correct.  But other issues require attention.

 

First, POLICY_NO is not unique.  It is not unique in the data set A, and not unique in data set B.  That's what this line is telling you:

 

NOTE: MERGE statement has more than one data set with repeats of BY values.

 

Maybe that means you have to fix the data first.  Maybe that means you should be merging by more than one variable instead of just by POLICY_NO.  Maybe that means you need to adjust your expectations about the number of observations.  Ironically, however, the 12,180 number is correct.  There are 12,180 observations where the POLICY_NO appears in A but not in B.  That might represent 12,180 unique policy numbers, or it might represent 1 POLICY_NO that appears 12,180 times in the data set A (or some "in between" mix).  You would need to inspect the output data set to determine that.

 

Take this example:  POLICY_NO "ABC" appears 1,000 times in data set A, but not at all in data set B.  There would be 1,000 more observations in data set A than in data set B.  But only one POLICY_NO that is mismatched.

 

 

View solution in original post


All Replies
Contributor VDD
Contributor
Posts: 27

Re: How to find the Mismatched records?

did you check to see if you have records in b that are not in a?

 

PROC Star
Posts: 633

Re: How to find the Mismatched records?

Done already. it's 0 observations.

PROC Star
Posts: 624

Re: How to find the Mismatched records?

 

Hi,

 

If Table A has 5,918,065 and Table B has 5,902,253 records why are you expecting that all records from table B are present in Table A.

In your condition your are saying that get all the records if Policy_no is present in Table A and Not present in Table B. 

Thanks,
Suryakiran
PROC Star
Posts: 633

Re: How to find the Mismatched records?

Posted in reply to SuryaKiran

I'm not excepting all records from table B should be present in Table A. I would like to see the records which are available in A dataset and not in B dataset irrespective of any condition. Could you please help with the SQL to achieve this?

Contributor VDD
Contributor
Posts: 27

Re: How to find the Mismatched records?

proc sql;

create table abc as

select distinct * from a

where policy_no not in

(select distinct policy_no

from b);

quit;

PROC Star
Posts: 624

Re: How to find the Mismatched records?

Did you check for duplicate values in the data-sets?

Thanks,
Suryakiran
PROC Star
Posts: 633

Re: How to find the Mismatched records?

Posted in reply to SuryaKiran
Yes, there are duplicates but still it is not matching the number what I
expect. I'm not certain whether changing the by variable will work here.
PROC Star
Posts: 624

Re: How to find the Mismatched records?

[ Edited ]

Change the BY variable so that you have all unique values in both the tables. If you don't get what you expected then you may have duplicates through all the by vars.

 

check id proc sort nodupkey removes any duplicates in both the tables for the BY variables.

 

Why are you worried about the counts, the process your trying is giving you the right output and you may not be right about the count your expecting because you are including the duplicate records.

 

Thanks,
Suryakiran
Solution
‎03-05-2018 12:33 AM
Super User
Posts: 6,921

Re: How to find the Mismatched records?

Actually, the 12,180 number is correct.  But other issues require attention.

 

First, POLICY_NO is not unique.  It is not unique in the data set A, and not unique in data set B.  That's what this line is telling you:

 

NOTE: MERGE statement has more than one data set with repeats of BY values.

 

Maybe that means you have to fix the data first.  Maybe that means you should be merging by more than one variable instead of just by POLICY_NO.  Maybe that means you need to adjust your expectations about the number of observations.  Ironically, however, the 12,180 number is correct.  There are 12,180 observations where the POLICY_NO appears in A but not in B.  That might represent 12,180 unique policy numbers, or it might represent 1 POLICY_NO that appears 12,180 times in the data set A (or some "in between" mix).  You would need to inspect the output data set to determine that.

 

Take this example:  POLICY_NO "ABC" appears 1,000 times in data set A, but not at all in data set B.  There would be 1,000 more observations in data set A than in data set B.  But only one POLICY_NO that is mismatched.

 

 

PROC Star
Posts: 633

Re: How to find the Mismatched records?

Posted in reply to Astounding
In order to get the number what I expect, can I merge it with all the
variables?
Super User
Posts: 23,980

Re: How to find the Mismatched records?

Data Matching_Feb_2017 Non_Matching_Feb_2017 CHECK_DATA;

merge Feb_nbrs_without_dt(in=a) source_33_count(in=b);

by POLICY_No;

if a and b then output Matching_Feb_2017;

if a and not b then output Non_Matching_Feb_2017;

if b and not a then output check_data; run;

 

Post the log from the code above. 

 

Second - this type of message usually means your join needs another variable. So which SET of variables uniquely identify each record, it's clearly not just Policy_NO. If it's insurance policy there's usually a start or end date or renewal number that can be used as well to join because my policy number doesn't change every year, but it has a new entry in the data every year. 

 

PROC Star
Posts: 633

Re: How to find the Mismatched records?

Here you go:

 

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 5918065 observations read from the data set WORK.FEB_NBRS_WITHOUT_DT.
NOTE: There were 5902253 observations read from the data set WORK.SOURCE_33_COUNT.
NOTE: The data set WORK.MATCHING_FEB_2017 has 5905885 observations and 34 variables.
NOTE: The data set WORK.NON_MATCHING_FEB_2017 has 12180 observations and 34 variables.
NOTE: The data set WORK.CHECK_DATA has 0 observations and 34 variables.
PROC Star
Posts: 633

Re: How to find the Mismatched records?

With the below code, I'm getting the desired number in the output. Now could you please help me how can I say to the layman why 15812 records are not matching in 'Non_Matching_Feb_2017' dataset? Since I'm new to the banking and insurance domain I find difficult to convey my output verbally to the layman.

 

 

Data Matching_Feb_2017 Non_Matching_Feb_2017;

merge Feb_nbrs_without_dt(in=a) source_33_count(in=b);

by POLICY_No PREMIUM_BEG_DT;

if a and b then output Matching_Feb_2017;

if a and not b then output Non_Matching_Feb_2017;

/*if b and not a then output Non_Matching_Dec_2016;*/

run;

 

Contributor VDD
Contributor
Posts: 27

Re: How to find the Mismatched records?

during Feb 15812 records where not found with a policy number and a premium begin date, This affected 12180 policies.
think this states your meaning.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 19 replies
  • 248 views
  • 7 likes
  • 6 in conversation