Hi,
I am having issues with the Standardization node in DataFlux (DF). Names like Jo Ann or Mary Jo are standardizing as Ann Jo and Jo Mary; it doesn't seem to matter which scheme I use with the definition. We are still using QKB CI 26, by the way, with DM Studio 2.6 on SAS 9.2.
I have had similar issues in the past (since we upgraded to 2.6) with addresses standardizing oddly, e.g. 'Meadow Farm Road' standardizing as 'Meadow Road (Farm)', or 'PO BOX' as 'Box (PO)'. I now send the PO addresses through a separate branch that only handles PO boxes, using a definition and a scheme. Using a scheme seems to take care of the other oddities.
Can anyone tell me why this is happening, or what I can do to stop it? Do you need more info to answer?
Thanks!
Hi,
I'm curious to know what your expectations are when you standardize people's names. Are you trying to get the casing correct or standardizing prefixes or suffixes like Mister/Mr or JR/Jr.?
To help answer your question, can you share the following:
Ron
Hi, thanks for the response.
I am really just expecting a change of case. I'd like to be able to correct typos like Elizabth -> Elizabeth, but I wasn't really expecting that to work. I expect LOU ANN to become Lou Ann (or stay LOU ANN), not Ann Lou. It doesn't do this to all compound names, just some.
I am using QKB CI 26. The definition is NAME for all of them; the schemes are:
1. EN Given Name Common Compound (Matching-Low Sens..)
2. EN Given Name Common Compounds (Matching)
3. EN Given Name Spelling
4. ENUSE Given Name Spelling (with Freq)
5. EN Given Names Propercase
6. EN Given Name (Matching-Combination Matching)
Here is a sample of what I am seeing.
The first column is the incoming name to be standardized; the remaining six columns are the results for schemes 1 to 6 above.
Input | Scheme 1 | Scheme 2 | Scheme 3 | Scheme 4 | Scheme 5 | Scheme 6 |
Laura Beth | Laura Beth | Laura Beth | Laura Beth | Laura Beth | Laura Beth | Laura Beth |
Lee Ann | Ann Lee | Ann Lee | Ann Lee | Ann Lee | Ann Lee | Ann Lee |
Lee Anne | Anne Lee | Anne Lee | Anne Lee | Anne Lee | Anne Lee | Anne Lee |
Lila Beth | Lila Beth | Lila Beth | Lila Beth | Lila Beth | Lila Beth | Lila Beth |
Lily Belle | Lily Belle | Lily Belle | Lily Belle | Lily Belle | Lily Belle | Lily Belle |
Liu Xiang | Xiang Liu | Xiang Liu | Xiang Liu | Xiang Liu | Xiang Liu | Xiang Liu |
Lou Ann | Ann Lou | Ann Lou | Ann Lou | Ann Lou | Ann Lou | Ann Lou |
Mary Alice | Alice Mary | Alice Mary | Alice Mary | Alice Mary | Alice Mary | Alice Mary |
Thanks,
Cathryn
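If you want to find every affected record rather than spotting them by eye, one option is a quick check on exported input/output pairs: flag any row where the standardized value has the same words as the input but in a different order. This is a minimal sketch in Python, outside DataFlux; the function name is_reordered is hypothetical, and it only compares whitespace-separated words, ignoring case.

```python
def is_reordered(original: str, standardized: str) -> bool:
    """Hypothetical QA check (not a DataFlux feature): True when the
    standardized value has exactly the same words as the input, ignoring
    case, but in a different order, e.g. "Lee Ann" -> "Ann Lee".
    A change of case only (LOU ANN -> Lou Ann) returns False."""
    before = original.upper().split()
    after = standardized.upper().split()
    return before != after and sorted(before) == sorted(after)


# Input/output pairs taken from the sample above.
pairs = [("Laura Beth", "Laura Beth"), ("Lee Ann", "Ann Lee"),
         ("Lou Ann", "Ann Lou"), ("Mary Alice", "Alice Mary")]
for src, out in pairs:
    print(f"{src!r} -> {out!r}: reordered={is_reordered(src, out)}")
```

Because it sorts the words before comparing, it catches any permutation, not just two-word swaps.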
I think I understand now what you are seeing. Some hints:
I attached some screenshots to show how using a standardization definition with the Standardization (Parsed) node will get you the results you are looking for.
Ron
Thank you, that worked. I should have joined this forum long ago; our support people took 2 weeks and gave me the wrong answer!
One more little thing, though: some non-English-y names still come out with odd capitalization, e.g. Salah al din -> Salah AL D I N,
Yu Ku -> Yu K U (but not Yu Huan), and Bat Yam -> B A T Yam. I can fix this down the road with an extra step if I have to, but I am wondering if there is a better way to deal with these sorts of names. Thanks again.
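For reference, the "extra step" could look something like the following post-processing sketch, written in Python and run outside DataFlux on the standardized output. It assumes the only damage is spelled-out letters ("D I N") plus wrong casing: it rejoins runs of two or more single-letter words, then proper-cases every word. The function name rejoin_spaced_letters is hypothetical. A lone letter is kept as an initial, but a genuine pair of initials such as "J R" would also be merged, and particles like "al" come out capitalized, so treat this as a starting point only.

```python
def rejoin_spaced_letters(name: str) -> str:
    """Hypothetical clean-up step (not part of DataFlux): collapse runs of
    two or more single-letter words, e.g. "D I N" -> "DIN", then
    proper-case every word."""
    merged: list[str] = []
    run: list[str] = []  # current run of consecutive single-letter words

    def flush() -> None:
        # Two or more letters in a row are treated as one spelled-out word;
        # a lone letter is kept as its own word (a real initial).
        if len(run) >= 2:
            merged.append("".join(run))
        else:
            merged.extend(run)
        run.clear()

    for word in name.split():
        if len(word) == 1 and word.isalpha():
            run.append(word)
        else:
            flush()
            merged.append(word)
    flush()

    # Proper-case: first letter upper, the rest lower.
    return " ".join(w.capitalize() for w in merged)


print(rejoin_spaced_letters("Salah AL D I N"))  # Salah Al Din
print(rejoin_spaced_letters("B A T Yam"))       # Bat Yam
```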
We're getting into advanced topics now!
There's a component in DM Studio called Customize. With it, you can see at which step the standardization definition makes the change you are seeing. In the case of "Salah al din", for example, a standardization scheme called "EN Given Names (Abbreviations Standardization)" is being applied to the name. It changes "din" to "D I N" for some (I'm sure very good) reason, at least in most cases. To change the behavior, you could modify the standardization definition and remove the scheme altogether (not advised), or you could edit the scheme and remove the transformations that don't make sense for your scenario. Back up the scheme file and the definition first if you plan to make changes.
Attached images show the step in Customize that made the unwanted change and the scheme value itself.
Ron
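To see why editing the scheme fixes the output, it can help to model a scheme, under simplifying assumptions, as a lookup table of data -> standard values applied word by word. The sketch below is Python, not DataFlux; the function name apply_scheme and the table entry are hypothetical, and a real standardization definition runs several steps (as Customize shows), which this does not reproduce.

```python
def apply_scheme(name: str, scheme: dict[str, str]) -> str:
    """Hypothetical, simplified model of a standardization scheme: each
    word is looked up (case-insensitively) in a data -> standard table and
    replaced if found; words not in the table pass through unchanged.
    This is not how DataFlux itself is implemented."""
    return " ".join(scheme.get(word.upper(), word) for word in name.split())


# Hypothetical entry mimicking the "din" -> "D I N" rule described above.
abbreviations = {"DIN": "D I N"}
print(apply_scheme("Salah al din", abbreviations))  # Salah al D I N

# "Editing the scheme": remove the entry that doesn't fit this data.
edited = {k: v for k, v in abbreviations.items() if k != "DIN"}
print(apply_scheme("Salah al din", edited))  # Salah al din
```

Removing one entry only affects names containing that word, which is why editing the scheme is safer than dropping the whole scheme from the definition.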