BookmarkSubscribeRSS Feed
ILoveKismet
Calcite | Level 5

I am cleaning raw data for matching purposes with another dataset. The K dataset requires a lot of reformatting, including moving the suffices from the FirstName variable to Suffix variable and removing them from the FirstName variable. This code works to move the suffices to a new variable, but will not remove "III" from the Firstname variable. This works for other suffices, but not III. Including SAS programming specific to this section of data only.

 

SAMPLE IMPORTED DATA:

...

input FirstName:$13.;

datalines;

RICHARD

MICHAEL

JR

III

;

 

 

/* ADD SUFFIX VARIABLE AND REMOVE FROM FIRST NAME */
/* REMOVES ALL 'I' n_lastName = COMPRESS(new_lastName,' III'); */
data K5;
set K4;
new_FirstName = FirstName;
	length suffix $ 7;
	do word= 'JR', 'SR', 'II', 'III', 'IV' ;
		new_FirstName = tranwrd(' '||new_FirstName, ' '||strip(word)||' ', ' ');
	new_FirstName = compbl(new_FirstName);
	if  findw(FirstName, 'JR')>0 then do;
		suffix='JR';
		end;
	if  findw(FirstName, 'SR')>0 then do;
		suffix='SR';
		end;
	if  findw(FirstName, 'II')>0 then do;
		suffix='II';
		end;
	if  findw(FirstName, 'III')>0 then do;
		suffix='III';
		end;
	if  findw(FirstName, 'IV')>0 then do;
		suffix='IV';
		end;
	end;
	drop word; 
	*newlastName newlName last_name first_name middle_name lastName firstName middleName;
run;
/* NOTE: III IS STILL PRESENT IN new_FirstName */

SAMPLE K5 OUTPUT

FirstName new_FirstName suffix

RICHARD RICHARD

MICHAEL MICHAEL

JR   JR

III III III

 

/* MOVE FIRST NAME FROM MIDDLENAME VARIABLE */
/* need to drop iii */
data K6;
set K5;
if new_FirstName = "III" then do;
new_FirstName = "";
end;
If missing(new_FirstName) then new_FirstName=MiddleName;
run;

SAMPLE K6 OUTPUT

FirstName new_FirstName suffix

RICHARD RICHARD

MICHAEL MICHAEL

JR WILLIAM JR

III III III

 

How do I successfully drop the "III" from the name variable?

4 REPLIES 4
Shmuel
Garnet | Level 18

1) I ran next code. check the lo - there is no word='III' - only 'II'.

data K5;
 set K4;
    new_FirstName = FirstName;
	*length suffix $ 7;
	do word= 'JR', 'SR', 'II', 'III', 'IV' ;
		new_FirstName = tranwrd(' '||strip(new_FirstName), ' '||strip(word)||' ', ' ');
	put word=  new_FirstName=;   
	    new_FirstName = compbl(new_FirstName);
	end;
run;	

    The reason - first word='JR' - only 2 characters, therefore length of word is 2.

     Add the statement: length word $3;  or the max length of all words.

 

2) The syntax of tranwrd() function is:

TRANWRD(source,target,replacement)

     I ran next code:

data _NULL_;
   length FirstName new_FirstName $13 word $3;
   FirstName = 'III';
   word = 'III';
   new_FirstName = tranwrd(' '||strip(new_FirstName), ' '||strip(word)||' ', ' ');
   put FirstName= new_FirstName=;
run;

the result in log is: FirstName=III new_FirstName=  

Is that what you want ? Pay attention to strip() added for New_FirstName,

and to LENGTH applying to WORD and to New_FirstName.

 

3) In all your DO;...END; parts there is only one statement to do;

    Why not just: if findw(FirstName, 'JR')>0 then suffix='JR';

    the same for all other suffixes.

 

4) I don't understand why you call it "suffix" if you relate to the whole string of

    FirstName ?

 

ILoveKismet
Calcite | Level 5

Thank you! I removed the extra/unnecessary statements as you suggested.

 

Yes, I do want the following: 
the result in log is: FirstName=III new_FirstName=  

 

The original FULLNAME variable was divided by space. 

If FULLNAME=BROWN JR JOHN

Then LastName=BROWN FirstName=JR MiddleName=JOHN

I wanted to add the suffix variable to remove suffices from the FirstName variable.

 

In my next step I move the actual first name from MiddleName when a suffix is present:

/* MOVE FIRST NAME FROM MIDDLENAME VARIABLE */
data K5;
set K4;
if missing(new_FirstName) then new_FirstName=MiddleName;
run;
SASJedi
SAS Super FREQ

This would probably be easier if you do it when you read in the original data, something like this.

/* Make test data *.
data have;
  infile datalines truncover dlm=',';
  input FULLNAME:$40.;
datalines;
BROWN JR JOHN K
SPRAT III JACK CLIVE
LAMB MARY H
;;;;

data want;
   set have;
   array name[*] $ 20 Last First Middle  Suffix;
   drop i element;
   element=0;
   do i=1 to dim(name);
      name[i-element]=scan(fullname,i,' ');
      if upcase(compress(name[i-element],,'ka')) in ('II','III','IV','JR','SR') then do;
         Suffix=name[i-element];
         element+1;
      end;
   end;
run;
 

 

Check out my Jedi SAS Tricks for SAS Users
SASJedi
SAS Super FREQ

The order in which you use TRANWRD to remove values is removing the first two characters from "III" leaving just a single "I" behind before the attempt to find "III", so a match will never be found for "III". 

Changing the order should resolve that issue:

	do word= 'JR', 'SR', 'III', 'II', 'IV' ;
		new_FirstName = tranwrd(' '||new_FirstName, ' '||strip(word)||' ', ' ');
Check out my Jedi SAS Tricks for SAS Users

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1598 views
  • 3 likes
  • 3 in conversation