DATA Step, Macro, Functions and more

How can I resolve name values and re-assign to the correct fields?

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

How can I resolve name values and re-assign to the correct fields?

 

I have a data set containing FIRST NAME and LAST NAME variables. In order to execute my program correctly, I need a clean dataset. But, the one I'm working with has enough issues to keep me busy for days. What I need for the program to work is to have only FN and LN without any special characters or spaces or suffixes. The table below shows what kinds of issues I am encountering. The asterisk indicates missing value. I would like to move probable names to the correct fields. 

 

FirstName

LastName

George

Washington Carver

George Washington

Carver

*

George W Carver

*

George W

George

Washington-Carver

George

WashingtonCarver

George

Carver, III

 

data testing1b;
set testing1a;
if ANYPUNCT (clientFirstName)=1 then do
	Fname1=scan (clientFirstName,1,", ");
	Fname2=scan (clientFirstName,2,", ");
	Fname3=scan (clientFirstName,3,", ");
	Fname4=scan (clientFirstName,4,", ");
	If Fname1='JR' or 'SR' or 'II' or 'III' or length(Fname1)=1 then Fname1=' ';
	If Fname2='JR' or 'SR' or 'II' or 'III' or length(Fname2)=1 then Fname2=' ';
	If Fname3='JR' or 'SR' or 'II' or 'III' or length(Fname3)=1 then Fname3=' ';
	If Fname4='JR' or 'SR' or 'II' or 'III' or length(Fname4)=1 then Fname4=' ';
end;

 


Accepted Solutions
Solution
‎07-03-2017 12:12 PM
New Contributor
Posts: 4

Re: How can I resolve name values and re-assign to the correct fields?

I solved the problem with the following code: 

data clean_fn;
set testing2b;
last_name = scan(FN,-1,' '); 
mi_name = scan(FN,-2,' ');
first_name = scan(FN,-3,' ');
keep FN LN last_name first_name mi_name;
run;

In this dataset, the FN variable contained first, middle and last name as a value. This worked out very well considering the size of the dataset. 

View solution in original post


All Replies
PROC Star
Posts: 1,760

Re: How can I resolve name values and re-assign to the correct fields?

What is your question?

New Contributor
Posts: 4

Re: How can I resolve name values and re-assign to the correct fields?

I am seeking advice on how to clean up the dataset. There are issues with FN and LN values which I need to resolve. The dataset contains 480,000 records. I need assistance finding an effectient code to resolve the issues. I will then be using this dataset to conduct a match through another SAS program. 

Super User
Posts: 11,343

Re: How can I resolve name values and re-assign to the correct fields?

Will your data be matched to a known data set of "clean" values or do you need to clean this without any outside comparison?

How many records are you dealing with? Some techniques that may work for a couple hundred names in a reasonable amount of time may not work if you're dealing with millions of records.

 

There are a number of search and replace function such as TRANWRD, TRANSLATE and COMPRESS that you could use to reduce some of this

For instanc instead of scanning and looking for JR in mulple places use Tranwrd;

data example;
   name='Smith, John JR';
   cleaname = tranwrd(name,'JR','');
run;

You can use compress to replace some obnoxious characters

 

data example;
   name= '-Smith, * John !';
   cleanname =  compress(name,'*|-/!');
run;
New Contributor
Posts: 4

Re: How can I resolve name values and re-assign to the correct fields?

Hi. Thank you for your response. The dataset I'm working with contains 480,000 records. About 10,000 appear to have issues with FN and LN variables having values in the incorrect column.

 

Yes, I am cleaning this dataset so it can be run through a matching program I created. I'll try your suggestions and see what happens. 

Super User
Posts: 5,435

Re: How can I resolve name values and re-assign to the correct fields?

To retain a good solution that is maintainable (is that word?) over time, you might want to check out SAS Data Management Studio.
Data never sleeps
Solution
‎07-03-2017 12:12 PM
New Contributor
Posts: 4

Re: How can I resolve name values and re-assign to the correct fields?

I solved the problem with the following code: 

data clean_fn;
set testing2b;
last_name = scan(FN,-1,' '); 
mi_name = scan(FN,-2,' ');
first_name = scan(FN,-3,' ');
keep FN LN last_name first_name mi_name;
run;

In this dataset, the FN variable contained first, middle and last name as a value. This worked out very well considering the size of the dataset. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 158 views
  • 0 likes
  • 4 in conversation