Cathryn
Calcite | Level 5

Hi,

I am having issues with the standardization node in DF. Names like Jo Ann or Mary Jo are standardizing as Ann Jo and Jo Mary, and it doesn't seem to matter which scheme I use with the definition. We are still using CI 26, by the way, with DM Studio 2.6 on SAS 9.2.

 

I have had similar issues in the past (since we upgraded to 2.6) with addresses standardizing oddly, e.g. 'Meadow Farm Road' standardizing as 'Meadow Road (Farm)' or 'PO BOX' as 'Box (PO)'. I just send the PO addresses through a branch that only handles POs, using a definition and scheme. Using a scheme seems to take care of the other weirdness.

 

Can anyone tell me why this is happening or what I can do to stop it! Do you need more info to answer?

 

Thanks!

6 REPLIES
RonAgresta
SAS Employee

Hi,

 

I'm curious to know what your expectations are when you standardize people's names. Are you trying to get the casing correct or standardizing prefixes or suffixes like Mister/Mr or JR/Jr.?

 

To help answer your question, can you share the following:

 

  • QKB version
  • QKB definition in use
  • Sample input value
  • Expected output value

Ron

Cathryn
Calcite | Level 5

Hi, Thanks for the response.

 

I am really just expecting to change case. I'd really like to be able to correct typos like Elizabth -> Elizabeth, but was not really expecting that to work. I am expecting LOU ANN to become Lou Ann or LOU ANN, not Ann Lou. It does not do this on all compound names, just some.

 

I am using QKB CI 26. The definition is NAME for all; the schemes are:

1. EN Given Name Common Compound (Matching-Low Sens..)

2. EN Given Name Common Compounds (Matching)

3. EN Given Name Spelling

4. ENUSE Given Name Spelling (with Freq)

5. EN Given Names Propercase

6. EN Given Name (Matching-Combination Matching)

 

 

Here is a sample of what I am seeing:

First column is the incoming name to be standardised.

 

Input | Scheme 1 | Scheme 2 | Scheme 3 | Scheme 4 | Scheme 5 | Scheme 6
Laura Beth | Laura Beth | Laura Beth | Laura Beth | Laura Beth | Laura Beth | Laura Beth
Lee Ann | Ann Lee | Ann Lee | Ann Lee | Ann Lee | Ann Lee | Ann Lee
Lee Anne | Anne Lee | Anne Lee | Anne Lee | Anne Lee | Anne Lee | Anne Lee
Lila Beth | Lila Beth | Lila Beth | Lila Beth | Lila Beth | Lila Beth | Lila Beth
Lily Belle | Lily Belle | Lily Belle | Lily Belle | Lily Belle | Lily Belle | Lily Belle
Liu Xiang | Xiang Liu | Xiang Liu | Xiang Liu | Xiang Liu | Xiang Liu | Xiang Liu
Lou Ann | Ann Lou | Ann Lou | Ann Lou | Ann Lou | Ann Lou | Ann Lou
Mary Alice | Alice Mary | Alice Mary | Alice Mary | Alice Mary | Alice Mary | Alice Mary

Thanks,

Cathryn
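To find out which names in a larger file are affected, one option is to compare each input with its standardized output and flag the rows where the same words come back in a different order. The following is a minimal Python sketch of that check, run outside of DM Studio on exported input/output pairs. The function name `is_reordered` is made up for illustration, and the pairs are copied from the sample above.

```python
def is_reordered(original: str, standardized: str) -> bool:
    """Return True when the output has the same words as the input, in a different order."""
    before = original.casefold().split()
    after = standardized.casefold().split()
    return before != after and sorted(before) == sorted(after)


# (input, standardized output) pairs taken from the sample in this post.
pairs = [
    ("Laura Beth", "Laura Beth"),
    ("Lee Ann", "Ann Lee"),
    ("Lila Beth", "Lila Beth"),
    ("Liu Xiang", "Xiang Liu"),
    ("Lou Ann", "Ann Lou"),
    ("Mary Alice", "Alice Mary"),
]

# Keep only the inputs whose words were swapped by the standardization step.
flagged = [src for src, out in pairs if is_reordered(src, out)]
print(flagged)  # ['Lee Ann', 'Liu Xiang', 'Lou Ann', 'Mary Alice']
```

The comparison ignores case, so a pure casing change such as LOU ANN -> Lou Ann is not flagged.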

RonAgresta
SAS Employee

I think I understand now what you are seeing. Some hints:

 

  • You can certainly apply individual standardization schemes, but many of them are used in specific ways inside standardization definitions. Standardization definitions incorporate things like casing, parsing, and regular expressions in addition to standardization schemes to correct your data. If a definition exists for the data type you are working with, it's generally better to use a definition over a scheme.
  • In the case of your example, because you are attempting to standardize just given name information, you will want to use the Standardization (Parsed) node. In it you can map your given name to the expected Given Name token, and then the appropriate standardization will be done for the given name. In some cases, when you send just a given name into the "Name" definition (which is really built for full names), the definition will reorder compound given names, thinking a surname is involved somehow.

 

I attached some screenshots to show how using a standardization definition with the Standardization (Parsed) node will get you the results you are looking for.


Ron


Attachments: S1.png, S2.png, S3.png
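The difference between the two approaches can be pictured with a toy model. The Python sketch below is an illustration only and is not the QKB's actual parsing logic: `SURNAME_LIKE`, `full_name_definition`, and `given_name_token` are hypothetical names, and the word list is invented to mirror the sample values in this thread. It assumes a full-name definition first guesses which word is a family name and then writes the name back as given name followed by family name, while a value mapped directly to the Given Name token is only cased.

```python
# Toy model only: NOT the QKB algorithm. The vocabulary below is made up
# so that the behavior mirrors the sample values posted in this thread.
SURNAME_LIKE = {"lee", "liu", "lou", "mary"}


def full_name_definition(value: str) -> str:
    """Mimic a full-name definition: guess the tokens, then emit 'Given Family'."""
    words = value.split()
    if len(words) == 2 and words[0].casefold() in SURNAME_LIKE:
        family, given = words  # first word is guessed to be a family name
        return f"{given.capitalize()} {family.capitalize()}"
    return " ".join(w.capitalize() for w in words)


def given_name_token(value: str) -> str:
    """Mimic mapping the whole value to the Given Name token: casing only."""
    return " ".join(w.capitalize() for w in value.split())


print(full_name_definition("LOU ANN"))     # Ann Lou
print(given_name_token("LOU ANN"))         # Lou Ann
print(full_name_definition("Laura Beth"))  # Laura Beth
```

In this sketch the reordering comes entirely from the guess that a surname is present, which is why telling the node up front that the value is a given name avoids it.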
Cathryn
Calcite | Level 5

Thank you. That worked. I should have joined this forum long ago; our support people took 2 weeks and gave me the wrong answer!

 

Cathryn
Calcite | Level 5

One more little thing, though: some non-English-y names still come out with odd capitalization, e.g. Salah al din -> Salah AL D I N, Yu Ku -> Yu K U (but not Yu Huan), and Bat Yam -> B A T Yam. I can fix this down the road with an extra step if I have to, but I am wondering if there is a better way to deal with these sorts of names? Thanks again.

RonAgresta
SAS Employee

We're getting into advanced topics now!

 

There's a component in DM Studio called Customize. Using that, you can see where the standardization definition is making the change you see. In the case of "Salah al din", for example, there's a standardization scheme called "EN Given Names (Abbreviations Standardization)" that is being applied to the name. It takes "din" and changes it to "D I N" for some reason (I'm sure a very good one, at least in most cases). So to change the behavior, you could modify the standardization definition and remove the scheme altogether (not advised), or you could edit the scheme and remove the transformations that don't make sense for your scenario. Back up the scheme file and definition first if you plan to make changes.

 

Attached images show the step in Customize that made the unwanted change and the scheme value itself.

 

Ron


Attachments: S4.png, S5.png
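Conceptually, a standardization scheme behaves like a lookup table from data values to standard values, so removing an entry lets the word fall through to ordinary casing. The Python sketch below models that idea under stated assumptions and is not how the QKB stores or applies schemes. `apply_scheme` and `without_entries` are hypothetical helpers, and only the "din" -> "D I N" entry is confirmed above; the other entries are inferred from the outputs reported in this thread.

```python
def apply_scheme(value: str, scheme: dict) -> str:
    """Replace each word that has a scheme entry; propercase every other word."""
    return " ".join(
        scheme.get(word.casefold(), word.capitalize()) for word in value.split()
    )


def without_entries(scheme: dict, keys: set) -> dict:
    """Return an edited copy of the scheme; the original stays intact as a backup."""
    return {k: v for k, v in scheme.items() if k not in keys}


# Hypothetical scheme content. Only "din" -> "D I N" is confirmed in this thread;
# the remaining entries are assumptions based on the reported outputs.
scheme = {"din": "D I N", "al": "AL", "ku": "K U", "bat": "B A T"}

print(apply_scheme("Salah al din", scheme))  # Salah AL D I N
print(apply_scheme("Yu Huan", scheme))       # Yu Huan

# Drop the transformations that do not suit this data set.
edited = without_entries(scheme, {"din", "ku", "bat"})
print(apply_scheme("Salah al din", edited))  # Salah AL Din
print(apply_scheme("Bat Yam", edited))       # Bat Yam
```

Working on a copy mirrors the advice to back up the scheme before editing: the original mapping is still available if the edited one turns out to remove too much.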
