I am cleaning raw data for matching purposes with another dataset. The K dataset requires a lot of reformatting, including moving the suffices from the FirstName variable to Suffix variable and removing them from the FirstName variable. This code works to move the suffices to a new variable, but will not remove "III" from the Firstname variable. This works for other suffices, but not III. Including SAS programming specific to this section of data only.
SAMPLE IMPORTED DATA:
...
input FirstName:$13.;
datalines;
RICHARD
MICHAEL
JR
III
;
/* ADD SUFFIX VARIABLE AND REMOVE FROM FIRST NAME */ /* REMOVES ALL 'I' n_lastName = COMPRESS(new_lastName,' III'); */ data K5; set K4; new_FirstName = FirstName; length suffix $ 7; do word= 'JR', 'SR', 'II', 'III', 'IV' ; new_FirstName = tranwrd(' '||new_FirstName, ' '||strip(word)||' ', ' '); new_FirstName = compbl(new_FirstName); if findw(FirstName, 'JR')>0 then do; suffix='JR'; end; if findw(FirstName, 'SR')>0 then do; suffix='SR'; end; if findw(FirstName, 'II')>0 then do; suffix='II'; end; if findw(FirstName, 'III')>0 then do; suffix='III'; end; if findw(FirstName, 'IV')>0 then do; suffix='IV'; end; end; drop word; *newlastName newlName last_name first_name middle_name lastName firstName middleName; run; /* NOTE: III IS STILL PRESENT IN new_FirstName */
SAMPLE K5 OUTPUT
FirstName new_FirstName suffix
RICHARD RICHARD
MICHAEL MICHAEL
JR JR
III III III
/* MOVE FIRST NAME FROM MIDDLENAME VARIABLE */ /* need to drop iii */ data K6; set K5; if new_FirstName = "III" then do; new_FirstName = ""; end; If missing(new_FirstName) then new_FirstName=MiddleName; run;
SAMPLE K6 OUTPUT
FirstName new_FirstName suffix
RICHARD RICHARD
MICHAEL MICHAEL
JR WILLIAM JR
III III III
How do I successfully drop the "III" from the name variable?
1) I ran next code. check the lo - there is no word='III' - only 'II'.
data K5;
set K4;
new_FirstName = FirstName;
*length suffix $ 7;
do word= 'JR', 'SR', 'II', 'III', 'IV' ;
new_FirstName = tranwrd(' '||strip(new_FirstName), ' '||strip(word)||' ', ' ');
put word= new_FirstName=;
new_FirstName = compbl(new_FirstName);
end;
run;
The reason - first word='JR' - only 2 characters, therefore length of word is 2.
Add the statement: length word $3; or the max length of all words.
2) The syntax of tranwrd() function is:
TRANWRD(source,target,replacement) |
I ran next code:
data _NULL_;
length FirstName new_FirstName $13 word $3;
FirstName = 'III';
word = 'III';
new_FirstName = tranwrd(' '||strip(new_FirstName), ' '||strip(word)||' ', ' ');
put FirstName= new_FirstName=;
run;
the result in log is: FirstName=III new_FirstName=
Is that what you want ? Pay attention to strip() added for New_FirstName,
and to LENGTH applying to WORD and to New_FirstName.
3) In all your DO;...END; parts there is only one statement to do;
Why not just: if findw(FirstName, 'JR')>0 then suffix='JR';
the same for all other suffixes.
4) I don't understand why you call it "suffix" if you relate to the whole string of
FirstName ?
Thank you! I removed the extra/unnecessary statements as you suggested.
Yes, I do want the following:
the result in log is: FirstName=III new_FirstName=
The original FULLNAME variable was divided by space.
If FULLNAME=BROWN JR JOHN
Then LastName=BROWN FirstName=JR MiddleName=JOHN
I wanted to add the suffix variable to remove suffices from the FirstName variable.
In my next step I move the actual first name from MiddleName when a suffix is present:
/* MOVE FIRST NAME FROM MIDDLENAME VARIABLE */
data K5;
set K4;
if missing(new_FirstName) then new_FirstName=MiddleName;
run;
This would probably be easier if you do it when you read in the original data, something like this.
/* Make test data *.
data have;
infile datalines truncover dlm=',';
input FULLNAME:$40.;
datalines;
BROWN JR JOHN K
SPRAT III JACK CLIVE
LAMB MARY H
;;;;
data want;
set have;
array name[*] $ 20 Last First Middle Suffix;
drop i element;
element=0;
do i=1 to dim(name);
name[i-element]=scan(fullname,i,' ');
if upcase(compress(name[i-element],,'ka')) in ('II','III','IV','JR','SR') then do;
Suffix=name[i-element];
element+1;
end;
end;
run;
The order in which you use TRANWRD to remove values is removing the first two characters from "III" leaving just a single "I" behind before the attempt to find "III", so a match will never be found for "III".
Changing the order should resolve that issue:
do word= 'JR', 'SR', 'III', 'II', 'IV' ;
new_FirstName = tranwrd(' '||new_FirstName, ' '||strip(word)||' ', ' ');
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.