Correct first and last name order

AmitParmar · Posted 09-24-2021 01:34 AM

data a;
/* The & modifier lets name contain single blanks; the value ends at two consecutive blanks */
input id name & $40. age salary;
cards;
1 Andy Murray  28 1500000
2 Stewart Christen  31 2500000
3 Adam Levine  35 800000
4 Bill White  40 1500000
5 Army Grey  20 300000
6 Dawson Robert  30 500000
;
run;

Let's say this is the data, and there are 10k such entries. The actual names for IDs 2 and 6 are Christen Stewart and Robert Dawson.

How can we identify which names are swapped, and how can we correct them?

andreas_lds · Posted 09-24-2021 01:46 AM

This task is unsolvable without having a list of "allowed" first names and last names.

Suppose you have "George Michael" in your data: both words in his name could be either a first name or a last name.
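To make the "allowed list" idea concrete, here is a minimal sketch. It assumes two hypothetical reference tables, first_names and last_names (tiny here; real ones would need to be far larger), and it only handles two-word names. A name is corrected only when the reversed order fits the lists and the given order does not; anything that fits both ways or neither way is flagged rather than changed.

```sas
/* Hypothetical reference lists, invented for illustration */
data first_names;
   input word :$20.;
   datalines;
Andy
Christen
Robert
George
Michael
;

data last_names;
   input word :$20.;
   datalines;
Murray
Stewart
Dawson
George
Michael
;

data names;
   input id name & $40.;
   datalines;
1 Andy Murray
2 Stewart Christen
6 Dawson Robert
7 George Michael
;

data checked;
   length word word1 word2 $20 status $10;
   if _n_ = 1 then do;
      /* Load both reference lists into hash objects keyed on word */
      declare hash fn(dataset: 'first_names');
      fn.defineKey('word');
      fn.defineDone();
      declare hash ln(dataset: 'last_names');
      ln.defineKey('word');
      ln.defineDone();
      call missing(word);
   end;
   set names;
   word1 = scan(name, 1, ' ');
   word2 = scan(name, 2, ' ');
   /* Does the name fit the lists as given, and does it fit when reversed? */
   fit_as_is   = (fn.check(key: word1) = 0) and (ln.check(key: word2) = 0);
   fit_swapped = (fn.check(key: word2) = 0) and (ln.check(key: word1) = 0);
   if fit_swapped and not fit_as_is then do;
      status = 'swapped';
      name = catx(' ', word2, word1);
   end;
   else if fit_as_is and not fit_swapped then status = 'ok';
   else status = 'ambiguous';   /* both orderings fit, or neither does */
   drop word fit_as_is fit_swapped;
run;
```

With these lists, "George Michael" fits in both orders and so ends up flagged as ambiguous, which is exactly the case that no list can resolve.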

AmitParmar · Posted 09-24-2021 02:11 AM

Actually, this was an interview question at Barclays, and the interviewers insisted that they do this themselves and that it's very basic.

Kurt_Bremser · Posted 09-24-2021 01:50 AM

This is impossible to do. Think of a guy named Paul Carl (just type that into Google, and immediately there's a LinkedIn profile for someone of that name).

There are gazillions of funny first names (e.g. Moon Zappa), and lots of surnames that are also used as first names.


AmitParmar · Posted 09-24-2021 02:10 AM

Actually, this was an interview question at Barclays, and the interviewers insisted that they do this themselves and that it's very basic.

Kurt_Bremser · Posted 09-24-2021 02:29 AM

Let them show you their code, then feed it names that will make it fail, then charge them for revealing the problem.


AmitParmar · Posted 09-24-2021 02:42 AM

Yeah, right. I wish they had provided me with the code.

ballardw · Posted 09-24-2021 10:21 AM

@AmitParmar wrote:

Actually, this was an interview question at Barclays, and the interviewers insisted that they do this themselves and that it's very basic.

One strongly suspects they would be scrubbing against a current client list, i.e. the "allowed list" that @andreas_lds mentions.

Then it could be doable.

Until they have two clients, one with the name "John Smith" and the other "Smith John" or any similar pairing.
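A rough sketch of that kind of scrub, assuming a hypothetical trusted table clients that holds full names in the correct order. Each incoming name is looked up as given and with its two words reversed. It is corrected only when the reversed form alone is found, and the "John Smith" / "Smith John" pairing is reported as ambiguous instead of being touched. All dataset and variable names here are made up for illustration.

```sas
/* Hypothetical trusted client list */
data clients;
   input name & $40.;
   datalines;
Andy Murray
Christen Stewart
Robert Dawson
John Smith
Smith John
;

data incoming;
   input id name & $40.;
   datalines;
1 Andy Murray
2 Stewart Christen
6 Dawson Robert
8 Smith John
9 Moon Zappa
;

data scrubbed;
   length status $10 swapped $40;
   if _n_ = 1 then do;
      /* Keyed lookup on the full name from the trusted list */
      declare hash cl(dataset: 'clients');
      cl.defineKey('name');
      cl.defineDone();
   end;
   set incoming;
   /* Two-word names only: build the reversed form */
   swapped = catx(' ', scan(name, 2, ' '), scan(name, 1, ' '));
   found_as_is   = (cl.check(key: name) = 0);
   found_swapped = (cl.check(key: swapped) = 0);
   if found_as_is and found_swapped then status = 'ambiguous';
   else if found_as_is then status = 'ok';
   else if found_swapped then do;
      status = 'swapped';
      name = swapped;
   end;
   else status = 'unknown';   /* not on the client list in either order */
   drop swapped found_as_is found_swapped;
run;
```

Names with middle names, initials or suffixes would need more than a two-word reversal, so treat this as a starting point only.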

Patrick · Posted 09-25-2021 10:25 PM

The SAS Data Quality Server / DataFlux provides OOTB functionality for splitting names into their components, such as first name, middle name and last name.

The result of such a process will be better than what you can reasonably code yourself, but it will never be perfect (e.g. George Michael and Michael Jordan).

DataFlux uses a QKB (Quality Knowledge Base) provided as part of the product.

Using DQ functions like DQPARSE is not that hard, BUT one also needs to regularly verify the quality of the results and have a data steward role in place for maintaining and updating the QKB (which often gets missed or done badly).

You can see from the answers given by others that DataFlux and the DQ functions are not that widely used. I guess usage could grow a bit in Viya.

Looks like your interviewers didn't understand that DataFlux is a beast on its own and not just part of foundation SAS.
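For reference, a call to the DQ functions mentioned above might look roughly like the sketch below. It assumes a licensed SAS Data Quality Server, an installed QKB with the ENUSA locale, and that the locale's 'Name' parse definition exposes 'Given Name' and 'Family Name' tokens. The QKB path is a placeholder, and the token names should be checked against your own QKB.

```sas
/* Load the locale from the QKB; the path is a placeholder, not a real location */
%dqload(dqlocale=(ENUSA), dqsetuploc='/path/to/your/qkb');

data parsed;
   input id name & $40.;
   length parsed_name $200 given family $40;
   /* Parse the full name into a delimited string of tokens */
   parsed_name = dqParse(name, 'Name', 'ENUSA');
   /* Pull individual tokens back out of the parsed string */
   given  = dqParseTokenGet(parsed_name, 'Given Name',  'Name', 'ENUSA');
   family = dqParseTokenGet(parsed_name, 'Family Name', 'Name', 'ENUSA');
   datalines;
1 Andy Murray
2 Stewart Christen
6 Dawson Robert
;
run;
```

Whether "Stewart Christen" comes back with the tokens in the intended order depends entirely on the QKB, which is why the results need regular review.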

AmitParmar · Posted 09-25-2021 11:08 PM

Thanks
