Hi Team,
I need to remove the names if the name has a Same Sound ex option.
example:
SELECT
SOUND EX('Brendan') AS SOUNDEX1
,SOUND EX('Brendon') AS SOUNDEX2
,SOUND EX('Brandon') AS SOUNDEX3
,SOUND EX('Brenden') AS SOUNDEX4
FROM selectone
With Ur
Returns:
SOUNDEX1 | SOUNDEX2 | SOUNDEX3 | SOUNDEX4 |
B653 | B653 | B653 | B653 |
So, all the names are different, Updated code needs to pull all the names.
Kindly Suggest the Code to remove the names which are duplicate after using Sound ex keyword.
Or refer me any other keyword, that should pull the names if it is like Brandon or Brendon.
Thanks,
Braghadish
Try next code -
data temp;
set have;
sndx = soundex(name); /* adding new variable - the soundex code */
run;
proc sort data=temp out=want nodupkey;
by sndx;
run;
A restructure of your data will simplify your coding vastly. Go for something like (and I am just guessing your data as you havent provided any - see how I put test data below):
data have; length name $100; input name $; datalines; Brendan Brendon Brandon Brenden ; run; data want; set have; length coded $100; coded=soundex(name); run; proc sort data=want nodupkey; by coded; run;
Thanks for your reply.
SELECT
SOUNDEX('Brendan') AS SOUNDEX1
,SOUNDEX('Brendon') AS SOUNDEX2
,SOUNDEX('Brandon') AS SOUNDEX3
,SOUNDEX('Brenden') AS SOUNDEX4
FROM selectone
With Ur
Returns:
SOUNDEX1 SOUNDEX2 SOUNDEX3 SOUNDEX4
B653 B653 B653 B653
This Query will have the Soundex value.
from your SAS Code, it will delete the three among four values right(B653).
So, I need all the names to be pulled out. whether it may be brandon or brendon or brenden or branden. few examples like Smith or Smithe.
All the names to be pulled out. do we have such type of code ??
Thanks!!
Sorry, I have no idea what your trying to say. Start by providing test data, in the form of a datastep, like I did in my code. Then show what you want to see as output.
The purpose of the duplicate logic is to bypass claims which have already been submitted/processed for manual review to limit the risk of duplicate claims loading to an account. We will be using SOUND EX to account for common variations or misspellings that a provider may have entered when checking against claims already submitted.
The SOUND EX function is a SQL function which converts an alphanumeric string into a 4 character code based on how the string sounds when spoken.
Here are some examples:
VAR1 | SOUNDEX1 | VAR2 | SOUNDEX2 | SOUNDEX_MATCH |
Brenda | B653 | Brendan | B653 | MATCH |
Edward | E363 | Edwardo | E363 | MATCH |
Luis | L200 | Lois | L200 | MATCH |
Philip | P410 | Phillip | P410 | MATCH |
Steve | S310 | Stephen | S315 | MISMATCH |
Steven | S315 | Stephen | S315 | MATCH |
Thomas | T520 | Tomas | T520 | MATCH |
Tod | T300 | Todd | T300 | MATCH |
How does Soundex function Work?
Let’s talk through this with Brenda / Brendan to see why this comes back as match
Step 1: Generates 4 character code- begins with first letter of word
E.g. Brenda B-_ _ _ _
Brandan B-_ _ _ _
Step 2: Removes A E I O U H W Y excluding the first character
E.g. Brenda => Brnd
Brandon => Brndn
Step 3: Use the table to convert letters to numbers until you have.
Number | Represents the Letters |
1 | B, F, P, V |
2 | C, G, J, K, Q, S, X, Z |
3 | D, T |
4 | L |
5 | M, N |
6 | R |
E.G. BRND
B- R=6 N=5 D=3
Brenda Soundex = B653
BRNDN The N is ignored for Brandon because BRND already provides a 4 character string.
Brendan Brendon Brandon and Brenden would all result in the same Soundex value.
This Query:
WITH selectone as (/*This is just here so we can run the query and return one result*/SELECT PRSN_INTN_ID From PB09754E.PRSN WHERE PRSN_INTN_ID = 476690091)
SELECT
SOUNDEX('Brendan') AS SOUNDEX1
,SOUNDEX('Brendon') AS SOUNDEX2
,SOUNDEX('Brandon') AS SOUNDEX3
,SOUNDEX('Brenden') AS SOUNDEX4
FROM selectone
With Ur
Returns:
SOUNDEX1 | SOUNDEX2 | SOUNDEX3 | SOUNDEX4 |
B653 | B653 | B653 | B653 |
If you don’t have all 4 characters use 0 (zero)
E.G. Tod => TD _ _
T-D=3 Blank = 0 Blank = 0
I need the names which are different like above example..
Hope you got it ?
Thanks!!
So, for example how to post a question;
I have the following dataset, I want to add columns for each soundex result and an overall column for if they all match or not. My test data is:
data have; length var1 var1 $100; input var1 $ var2 $; datalines; Brenda Brendan Edward Edwardo ; run;
I want the output to look like:
VAR1 VAR2 SOUND1 SOUND2 MATCH
...
---
The answer is:
data want (drop=i j); set have; length match $100; array var{2}; array coded{2} $100; do i=1 to 2; coded{i}=soundex(var{i}); end; match="MATCH"; do i=1 to 2; do j=i to 2; if coded{i} ne coded{j} then match="MISMATCH"; end; end; run;
(note you can shrink that code, but it shows the working).
No thats not what I meant. I am asking you to provide test data, in the form of a datastep, and what you want to see out at the end, I know what the function does, what I do not know is the data you are working with, or what you want to output to look like. Have a look at this post:
https://communities.sas.com/t5/Base-SAS-Programming/proc-sql-merge-from-begining-date-to-end-date/m-...
The poster shows a datastep with some data items, with that we can run it on our machines and then write code against that to provide a result which looks like the result requested (although in this post they didn't post what the output should look like). So:
Question:
Test data:
data have; input ...; datalines; ... ; run;
Want:
...
@braghadish wrote:
The purpose of the duplicate logic is to bypass claims which have already been submitted/processed for manual review to limit the risk of duplicate claims loading to an account. We will be using SOUND EX to account for common variations or misspellings that a provider may have entered when checking against claims already submitted.
I hope that your entire organization is better at data entry for names than any of the ones I have worked for.
Or do you have a separate check for the equivalence of nicknames vs given names such as Bob Smith and Robert Smith? or someone who occasionally uses a middle name instead of first name? Or the same claim submitted by husband/wife/partner groupings?
I have to say I have loathed practically any process that involved name matching.
With that off my chest I will direct you to: https://www.cdc.gov/cancer/npcr/tools/registryplus/lp_tech_info.htm
This is a free tool developed by CDC matching less than perfect data that will examine a number of data fields and provide probability of match information.
@braghadish, it is absolutely not clear what you want to achieve.
In your first post you wrote: "I need to remove the names if the name has a Same Sound ex ".
Later you wrote: " I need all the names to be pulled out".
And : "The purpose of the duplicate logic is to bypass claims which have already been submitted/processed".
It will be helpfull if you post an example of your claims data - few lines with the most relevant variables,
and post the wanted output for that example.
The soundex function is just a tool that helps to match similar names.
You have to clarify your input and wanted output dataset formats.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.