You joined on the wrong fields initially. When you merge, you need to know which fields uniquely identify a record. Your original key did not, which is why you got a many-to-many merge. With the new key (which includes the date), each record is uniquely identified.
I suggest taking a smaller subset, merging it and looking at some records manually to understand how this happened.
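To check a key before merging, you can count how often each key value appears in each table. If a value appears more than once on both sides, an SQL-style join returns n_left * n_right rows for it. The sketch below is a Python analogue, not SAS code. The field names (policy_no, start_date, premium, status) and the sample rows are hypothetical, chosen to mimic a renewed policy.

```python
from collections import Counter

def key_counts(rows, key):
    """Count how many rows share each key value (key is a tuple of field names)."""
    return Counter(tuple(r[f] for f in key) for r in rows)

def duplicate_keys(rows, key):
    """Return the key values that appear more than once, i.e. where the key is not unique."""
    return {k: n for k, n in key_counts(rows, key).items() if n > 1}

def inner_join_size(left, right, key):
    """Rows an SQL-style inner join would return: the sum of n_left * n_right per key value."""
    lc, rc = key_counts(left, key), key_counts(right, key)
    return sum(n * rc[k] for k, n in lc.items() if k in rc)

# Hypothetical sample: policy P1 was renewed, so it appears twice in each table.
left = [
    {"policy_no": "P1", "start_date": "2023-02-01", "premium": 100},
    {"policy_no": "P1", "start_date": "2024-02-01", "premium": 110},
    {"policy_no": "P2", "start_date": "2024-02-15", "premium": 90},
]
right = [
    {"policy_no": "P1", "start_date": "2023-02-01", "status": "lapsed"},
    {"policy_no": "P1", "start_date": "2024-02-01", "status": "active"},
    {"policy_no": "P2", "start_date": "2024-02-15", "status": "active"},
]

print(duplicate_keys(left, ("policy_no",)))                       # {('P1',): 2}
print(inner_join_size(left, right, ("policy_no",)))               # 5  (2*2 + 1*1)
print(duplicate_keys(left, ("policy_no", "start_date")))          # {}
print(inner_join_size(left, right, ("policy_no", "start_date")))  # 3
```

When the key is policy number alone, the three input rows on each side turn into five joined rows. When the start date is added, each record matches exactly once.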
All the explaining we do won't be as effective as you figuring out how to test it and look at it.
It helps to understand the problem in context before you program it. And of course, "know thy data" is the golden rule of analysis.
Some context from the banking domain:
1. Policies are renewed all the time, and for each renewal the database creates a new record with a new start date under the same policy number.
2. In the database, a policy is treated as a service, so each record usually has a service serno assigned to it, and that is usually the primary key.
3. If you do not know the primary key, you can use the policy number, start date/end date, and policy type together to get an accurate result.
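If you are not sure which combination of fields forms the key, one way to explore it on a sample is to test candidate field combinations, from smallest to largest, until one is unique. The sketch below is a Python analogue, not SAS code. The field names and records are hypothetical. A key found this way only holds for the sample you checked, so confirm it against what the fields mean in the business.

```python
from itertools import combinations

def is_unique(rows, key):
    """True if no two rows share the same values for every field in key."""
    seen = set()
    for r in rows:
        k = tuple(r[f] for f in key)
        if k in seen:
            return False
        seen.add(k)
    return True

def smallest_unique_key(rows, candidates):
    """Try field combinations of increasing size and return the first one that is unique."""
    for size in range(1, len(candidates) + 1):
        for key in combinations(candidates, size):
            if is_unique(rows, key):
                return key
    return None  # even all candidate fields together do not identify a record

# Hypothetical policies: P1 was renewed, so the policy number alone repeats.
policies = [
    {"policy_no": "P1", "start_date": "2023-02-01", "end_date": "2024-01-31", "policy_type": "HOME"},
    {"policy_no": "P1", "start_date": "2024-02-01", "end_date": "2025-01-31", "policy_type": "HOME"},
    {"policy_no": "P2", "start_date": "2024-02-01", "end_date": "2025-01-31", "policy_type": "HOME"},
    {"policy_no": "P3", "start_date": "2023-02-01", "end_date": "2024-01-31", "policy_type": "AUTO"},
]
candidates = ("policy_no", "start_date", "end_date", "policy_type")

print(smallest_unique_key(policies, candidates))  # ('policy_no', 'start_date')
```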
You start with:
FEB_NBRS_WITHOUT_DT -> 5,918,065
SOURCE_33_COUNT -> 5,902,253
Your assumption: 5,918,065 - 5,902,253 = 15,812 records that are not in the first data set.
The total number of records remains 5,918,065, of those:
I suggest you post some sample data (smaller, fake data) so we can illustrate how this can happen. Basically, when you have a many-to-many merge, a DATA step MERGE does not combine the records the way you would expect, and you need to use a PROC SQL join instead to fix this.
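To see why the counts differ, here is a rough Python emulation of how a DATA step MERGE with a BY statement handles repeated BY values. Within each BY group it pairs the first row of one table with the first row of the other, then the second with the second, and so on. When one table runs out of rows in the group, the values from its last row are carried forward. A group therefore yields max(n_left, n_right) rows, while an SQL inner join yields every combination, n_left * n_right. SAS also writes a NOTE to the log about repeats of BY values in this situation. This is a simplified sketch, not SAS itself: it ignores IN= flags, the sort-order requirement, and non-BY variables that exist in both tables. The data is hypothetical.

```python
from collections import defaultdict

def group_by(rows, key):
    """Group rows by the value of a single BY field, keeping input order within each group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return groups

def data_step_merge(left, right, key):
    """Emulate DATA step MERGE/BY: the i-th rows pair up and the shorter side's last row repeats."""
    lg, rg = group_by(left, key), group_by(right, key)
    out = []
    for k in sorted(set(lg) | set(rg)):
        lrows, rrows = lg.get(k, []), rg.get(k, [])
        for i in range(max(len(lrows), len(rrows))):
            row = {}
            if lrows:
                row.update(lrows[min(i, len(lrows) - 1)])
            if rrows:
                row.update(rrows[min(i, len(rrows) - 1)])
            out.append(row)
    return out

def sql_inner_join(left, right, key):
    """Every matching combination, as an SQL inner join on key produces."""
    rg = group_by(right, key)
    return [{**l, **r} for l in left for r in rg.get(l[key], [])]

# Hypothetical data: two premium rows and three claim rows for the same policy.
left = [{"policy_no": "P1", "premium": 100}, {"policy_no": "P1", "premium": 110}]
right = [{"policy_no": "P1", "claim": c} for c in ("C1", "C2", "C3")]

merged = data_step_merge(left, right, "policy_no")
joined = sql_inner_join(left, right, "policy_no")
print(len(merged), len(joined))  # 3 6
for row in merged:
    print(row)  # the third row repeats premium 110 alongside claim C3
```

Neither result is "right" in general. The point is that a many-to-many merge silently produces a count that is neither the number of unique records nor every combination, which is why checking the key first matters.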
Can you please add some context to that statement. Pretend I can’t see your computer, data, or code and have no idea what you’re talking about.
@Babloo wrote:
5905885 is less than the number of records in source_33_count dataset. How it is possible?