topic Re: De-duplication in Data management Studio in SAS Data Management

De-duplication in Data management Studio

Shakti_Sourav — Thu, 18 Aug 2022 11:31:08 GMT

Dear Team,

I am facing a challenge: I have to find duplicate records based on the Age variable. The match on Age should allow for a particular range.

Sample:

Suppose a beneficiary's age is 29 and he/she tries to apply again with a different age, such as 25 or 34. I want to de-duplicate the data in Data Management Studio by treating ages from 5 below to 5 above as a match.

My questions are:

1. How do I declare de-duplication based on Age?

2. Is it possible to use Age in match codes? If yes, which definition and sensitivity are suitable?

3. How do I define Age as a range like +5 and -5?

Re: De-duplication in Data management Studio

audrey — Thu, 18 Aug 2022 11:57:04 GMT

Hi,

There is no definition in the QKB to match and cluster records by age.

Also, I thought about your use case and I think it's not a good solution. Let me explain what could happen:

-> customer A is 20 and matches customer B who is 25,

-> but customer C is 30, and therefore matches customer B,

-> and, customer D is 35 and therefore matches customer C.

In the end, A matches D through the chain of matches, and you'll end up with one big cluster containing all of your records.

So I think age should not be used this way. Maybe there are other criteria in your data that would be better.

Hope this helps.

Audrey