Re: Remove "III" suffix from name variable

ILoveKismet · Posted 02-15-2021 09:24 AM

I am cleaning raw data for matching purposes with another dataset. The K dataset requires a lot of reformatting, including moving the suffices from the FirstName variable to Suffix variable and removing them from the FirstName variable. This code works to move the suffices to a new variable, but will not remove "III" from the Firstname variable. This works for other suffices, but not III. Including SAS programming specific to this section of data only.

SAMPLE IMPORTED DATA:

...

input FirstName:$13.;

datalines;

RICHARD

MICHAEL

JR

III

;

/* ADD SUFFIX VARIABLE AND REMOVE FROM FIRST NAME */
/* REMOVES ALL 'I' n_lastName = COMPRESS(new_lastName,' III'); */
data K5;
set K4;
new_FirstName = FirstName;
	length suffix $ 7;
	do word= 'JR', 'SR', 'II', 'III', 'IV' ;
		new_FirstName = tranwrd(' '||new_FirstName, ' '||strip(word)||' ', ' ');
	new_FirstName = compbl(new_FirstName);
	if  findw(FirstName, 'JR')>0 then do;
		suffix='JR';
		end;
	if  findw(FirstName, 'SR')>0 then do;
		suffix='SR';
		end;
	if  findw(FirstName, 'II')>0 then do;
		suffix='II';
		end;
	if  findw(FirstName, 'III')>0 then do;
		suffix='III';
		end;
	if  findw(FirstName, 'IV')>0 then do;
		suffix='IV';
		end;
	end;
	drop word; 
	*newlastName newlName last_name first_name middle_name lastName firstName middleName;
run;
/* NOTE: III IS STILL PRESENT IN new_FirstName */

SAMPLE K5 OUTPUT

FirstName new_FirstName suffix

RICHARD RICHARD

MICHAEL MICHAEL

JR JR

III III III

/* MOVE FIRST NAME FROM MIDDLENAME VARIABLE */
/* need to drop iii */
data K6;
set K5;
if new_FirstName = "III" then do;
new_FirstName = "";
end;
If missing(new_FirstName) then new_FirstName=MiddleName;
run;

SAMPLE K6 OUTPUT

FirstName new_FirstName suffix

RICHARD RICHARD

MICHAEL MICHAEL

JR WILLIAM JR

III III III

How do I successfully drop the "III" from the name variable?

Shmuel · Posted 02-15-2021 11:03 AM

1) I ran next code. check the lo - there is no word='III' - only 'II'.

data K5;
 set K4;
    new_FirstName = FirstName;
	*length suffix $ 7;
	do word= 'JR', 'SR', 'II', 'III', 'IV' ;
		new_FirstName = tranwrd(' '||strip(new_FirstName), ' '||strip(word)||' ', ' ');
	put word=  new_FirstName=;   
	    new_FirstName = compbl(new_FirstName);
	end;
run;

The reason - first word='JR' - only 2 characters, therefore length of word is 2.

Add the statement: length word $3; or the max length of all words.

2) The syntax of tranwrd() function is:

TRANWRD(source,target,replacement)

I ran next code:

data _NULL_;
   length FirstName new_FirstName $13 word $3;
   FirstName = 'III';
   word = 'III';
   new_FirstName = tranwrd(' '||strip(new_FirstName), ' '||strip(word)||' ', ' ');
   put FirstName= new_FirstName=;
run;

the result in log is: FirstName=III new_FirstName=

Is that what you want ? Pay attention to strip() added for New_FirstName,

and to LENGTH applying to WORD and to New_FirstName.

3) In all your DO;...END; parts there is only one statement to do;

Why not just: if findw(FirstName, 'JR')>0 then suffix='JR';

the same for all other suffixes.

4) I don't understand why you call it "suffix" if you relate to the whole string of

FirstName ?

ILoveKismet · Posted 02-15-2021 12:53 PM

Thank you! I removed the extra/unnecessary statements as you suggested.

Yes, I do want the following:
the result in log is: FirstName=III new_FirstName=

The original FULLNAME variable was divided by space.

If FULLNAME=BROWN JR JOHN

Then LastName=BROWN FirstName=JR MiddleName=JOHN

I wanted to add the suffix variable to remove suffices from the FirstName variable.

In my next step I move the actual first name from MiddleName when a suffix is present:

/* MOVE FIRST NAME FROM MIDDLENAME VARIABLE */
data K5;
set K4;
if missing(new_FirstName) then new_FirstName=MiddleName;
run;

SASJedi · Posted 02-15-2021 02:14 PM

This would probably be easier if you do it when you read in the original data, something like this.

/* Make test data *.
data have;
  infile datalines truncover dlm=',';
  input FULLNAME:$40.;
datalines;
BROWN JR JOHN K
SPRAT III JACK CLIVE
LAMB MARY H
;;;;

data want;
   set have;
   array name[*] $ 20 Last First Middle  Suffix;
   drop i element;
   element=0;
   do i=1 to dim(name);
      name[i-element]=scan(fullname,i,' ');
      if upcase(compress(name[i-element],,'ka')) in ('II','III','IV','JR','SR') then do;
         Suffix=name[i-element];
         element+1;
      end;
   end;
run;

Check out my Jedi SAS Tricks for SAS Users

SASJedi · Posted 02-15-2021 11:06 AM

The order in which you use TRANWRD to remove values is removing the first two characters from "III" leaving just a single "I" behind before the attempt to find "III", so a match will never be found for "III".

Changing the order should resolve that issue:

	do word= 'JR', 'SR', 'III', 'II', 'IV' ;
		new_FirstName = tranwrd(' '||new_FirstName, ' '||strip(word)||' ', ' ');

Check out my Jedi SAS Tricks for SAS Users

Remove "III" suffix from name variable