DATA Step, Macro, Functions and more

How to remove the names

Reply
New Contributor
Posts: 3

How to remove the names

Hi Team,

 

I need to remove the names if the name has a Same Sound ex option.

 

example:

 

SELECT

SOUND EX('Brendan') AS SOUNDEX1

,SOUND EX('Brendon') AS SOUNDEX2

,SOUND EX('Brandon') AS SOUNDEX3

,SOUND EX('Brenden') AS SOUNDEX4

 

FROM selectone

 

With Ur

 

Returns:

SOUNDEX1

SOUNDEX2

SOUNDEX3

SOUNDEX4

B653

B653

B653

B653


So, all the names are different, Updated code needs to pull all the names.

 

Kindly Suggest the Code to remove the names which are duplicate after using Sound ex keyword.

 

Or refer me any other keyword, that should pull the names if it is like Brandon or Brendon.

 

Thanks,

Braghadish

Trusted Advisor
Posts: 1,831

Re: How to remove the names

Posted in reply to braghadish

Try next code - 

data temp;
  set have;
       sndx = soundex(name); /* adding new variable - the soundex code */
run;

proc sort data=temp out=want nodupkey;
  by sndx;
run;
Super User
Super User
Posts: 9,427

Re: How to remove the names

Posted in reply to braghadish

A restructure of your data will simplify your coding vastly.  Go for something like (and I am just guessing your data as you havent provided any - see how I put test data below):

data have;
  length name $100;
  input name $;
datalines;
Brendan
Brendon
Brandon
Brenden
;
run;

data want;
  set have;
  length coded $100;
  coded=soundex(name);
run;

proc sort data=want nodupkey;
  by coded;
run;
New Contributor
Posts: 3

Re: How to remove the names

Thanks for your reply.

 

SELECT
SOUNDEX('Brendan') AS SOUNDEX1
,SOUNDEX('Brendon') AS SOUNDEX2
,SOUNDEX('Brandon') AS SOUNDEX3
,SOUNDEX('Brenden') AS SOUNDEX4

FROM selectone

With Ur

Returns:
SOUNDEX1    SOUNDEX2    SOUNDEX3    SOUNDEX4
B653    B653    B653    B653

 

This Query will have the Soundex value.

 

from your SAS Code, it will delete the three among four values right(B653).

 

So, I need all the names to be pulled out. whether it may be brandon or brendon or brenden or branden. few examples like Smith or Smithe.

 

All the names to be pulled out. do we have such type of code ??

 

Thanks!!

Super User
Super User
Posts: 9,427

Re: How to remove the names

Posted in reply to braghadish

Sorry, I have no idea what your trying to say.  Start by providing test data, in the form of a datastep, like I did in my code.  Then show what you want to see as output.

New Contributor
Posts: 3

Re: How to remove the names

The purpose of the duplicate logic is to bypass claims which have already been submitted/processed for manual review to limit the risk of duplicate claims loading to an account. We will be using SOUND EX to account for common variations or misspellings that a provider may have entered when checking against claims already submitted.

 

The SOUND EX function is a SQL function which converts an alphanumeric string into a 4 character code based on how the string sounds when spoken.


Here are some examples:

VAR1

SOUNDEX1

VAR2

SOUNDEX2

SOUNDEX_MATCH

Brenda

B653

Brendan

B653

MATCH

Edward

E363

Edwardo

E363

MATCH

Luis

L200

Lois

L200

MATCH

Philip

P410

Phillip

P410

MATCH

Steve

S310

Stephen

S315

MISMATCH

Steven

S315

Stephen

S315

MATCH

Thomas

T520

Tomas

T520

MATCH

Tod

T300

Todd

T300

MATCH

 

How does Soundex function Work?

   Let’s talk through this with Brenda / Brendan to see why this comes back as match

 

Step 1: Generates 4 character code- begins with first letter of word

                        E.g. Brenda    B-_ _ _ _

                               Brandan  B-_ _ _ _

Step 2: Removes A E I O U H W Y excluding the first character

                        E.g. Brenda   => Brnd

                               Brandon => Brndn

Step 3: Use the table to convert letters to numbers until you have.

Number

Represents the Letters

1

B, F, P, V

2

C, G, J, K, Q, S, X, Z

3

D, T

4

L

5

M, N

6

R

            E.G. BRND

                        B- R=6 N=5 D=3

                        Brenda Soundex = B653

                     BRNDN The N is ignored for Brandon because BRND already provides a 4 character string.

                       

                        Brendan Brendon Brandon and Brenden would all result in the same Soundex value.

 

                                                This Query:

WITH selectone as (/*This is just here so we can run the query and return one result*/SELECT PRSN_INTN_ID From PB09754E.PRSN WHERE PRSN_INTN_ID = 476690091)

 

SELECT

SOUNDEX('Brendan') AS SOUNDEX1

,SOUNDEX('Brendon') AS SOUNDEX2

,SOUNDEX('Brandon') AS SOUNDEX3

,SOUNDEX('Brenden') AS SOUNDEX4

 

FROM selectone

 

With Ur

 

Returns:

SOUNDEX1

SOUNDEX2

SOUNDEX3

SOUNDEX4

B653

B653

B653

B653

 

            If you don’t have all 4 characters use 0 (zero)

                        E.G. Tod => TD _ _

                                    T-D=3  Blank = 0  Blank = 0

 

I need the names which are different like above example..

 

Hope you got it ?

 

Thanks!!

Super User
Super User
Posts: 9,427

Re: How to remove the names

[ Edited ]
Posted in reply to braghadish

So, for example how to post a question;

I have the following dataset, I want to add columns for each soundex result and an overall column for if they all match or not.  My test data is:

data have;
  length var1 var1 $100;
  input var1 $ var2 $;
datalines;
Brenda Brendan
Edward Edwardo
;
run;

I want the output to look like:
VAR1  VAR2  SOUND1 SOUND2 MATCH

...

 

---

 

The answer is:

data want (drop=i j);
  set have;
  length match $100;
  array var{2};
  array coded{2} $100;
  do i=1 to 2;
    coded{i}=soundex(var{i});
  end;
  match="MATCH";
  do i=1 to 2;
    do j=i to 2;
      if coded{i} ne coded{j} then match="MISMATCH";
    end;
  end;
run;

(note you can shrink that code, but it shows the working).

 

 

 

 

 

 

 

 

No thats not what I meant.  I am asking you to provide test data, in the form of a datastep, and what you want to see out at the end, I know what the function does, what I do not know is the data you are working with, or what you want to output to look like.  Have a look at this post:
https://communities.sas.com/t5/Base-SAS-Programming/proc-sql-merge-from-begining-date-to-end-date/m-...

The poster shows a datastep with some data items, with that we can run it on our machines and then write code against that to provide a result which looks like the result requested (although in this post they didn't post what the output should look like).  So:

 

Question:

Test data:

data have;
  input ...;
datalines;
...
;
run;

Want:

...

Super User
Posts: 13,338

Re: How to remove the names

Posted in reply to braghadish

braghadish wrote:

The purpose of the duplicate logic is to bypass claims which have already been submitted/processed for manual review to limit the risk of duplicate claims loading to an account. We will be using SOUND EX to account for common variations or misspellings that a provider may have entered when checking against claims already submitted.

 


I hope that your entire organization is better at data entry for names than any of the ones I have worked for.

Or do you have a separate check for the equivalence of nicknames vs given names such as Bob Smith and Robert Smith? or someone who occasionally uses a middle name instead of first name? Or the same claim submitted by husband/wife/partner groupings?

 

I have to say I have loathed practically any process that involved name matching.

 

With that off my chest I will direct you to: https://www.cdc.gov/cancer/npcr/tools/registryplus/lp_tech_info.htm

This is a free tool developed by CDC  matching less than perfect data that will examine a number of data fields and provide probability of match information.

Trusted Advisor
Posts: 1,831

Re: How to remove the names

[ Edited ]

@braghadish, it is absolutely not clear what you want to achieve.

 

In your first post you wrote:  "I need to remove the names if the name has a Same Sound ex ".

Later you wrote:  " I need all the names to be pulled out".

        And          : "The purpose of the duplicate logic is to bypass claims which have already been submitted/processed".

 

It will be helpfull if you post an example of your claims data - few lines with the most relevant variables,

 and post the wanted output  for that example. 

 

The soundex function is just a tool that helps to match similar names. 

You have to clarify your input and wanted output dataset formats.

 

Ask a Question
Discussion stats
  • 8 replies
  • 155 views
  • 2 likes
  • 4 in conversation