The best way to do this is to import the name parts separately. That is, you manually identify the separate name components and line the data up so that each component starts in a particular column. Then your INPUT statement would look something like this:
input @1 fname $20. @22 mname $20. @44 lname $20.;
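For example, here is a minimal sketch of that layout, assuming you have already decided by hand which words belong to which component. Treating "krishna rao" as the middle name is only my assumption for illustration, and the data set name names_fixed is made up. The values must start in columns 1, 22 and 44 to match the column pointers.

```sas
data names_fixed;
    infile datalines truncover;
    /* each component is read from its own fixed 20-character field */
    input @1 fname $20. @22 mname $20. @44 lname $20.;
    datalines;
arun                                       krishna
gopal                                      rao
venu                 vardhan               reddy
rames                krishna rao           sunkara
kiran
;
run;

proc print data=names_fixed;
run;
```

Because each field is read with a formatted $20. informat, an embedded blank inside a field stays with that component instead of splitting it.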
1. Use COUNTW() to count the number of words in the string.
2. If the number of words is 2, then assign the first and last names only.
3. If the number of words is > 2, then take the first and last words as the first and last names and treat the others as the middle name.
SCAN() will allow you to retrieve the correct parts.
This will not always be correct, of course. Some names have two parts in the first or last component, but there's no way for me to know which is which.
Your code doesn't run correctly. Please test it before posting to make sure it reads your data correctly.
Unless the count of words is > 3; then you need to start building some rules around what is acceptable and what is not.
Updated the input statement.
data names;
infile datalines;
input name $30.;
datalines;
arun krishna
gopal rao
venu vardhan reddy
rames krishna rao sunkara
kiran
;
run;
data names;
infile datalines;
input name $30.;
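nwords = countw(name); /* number of words in the full name */
length middle_name $30;
first_name = scan(name, 1); /* first word */
if nwords > 1 then last_name = scan(name, -1); /* last word, only if there is more than one */
/* every word between the first and the last goes into the middle name */
do i = 2 to nwords - 1;
middle_name = catx(" ", middle_name, scan(name, i));
end;
drop i;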
datalines;
arun krishna
gopal rao
venu vardhan reddy
rames krishna rao sunkara
kiran
;
run;
If you have SAS Data Quality Server / DataFlux licensed, then tokenizing a name string is a textbook use case for it.
data names;
infile datalines;
input name $30.;
datalines;
arun krishna
gopal rao
venu vardhan reddy
rames krishna rao sunkara
kiran
;
run;
data want;
set names;
call scan(name,1,p1,l1,' ');
call scan(name,-1,p2,l2,' ');
first=scan(name,1,' ');
if p2>p1 then last=scan(name,-1,' ');
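/* the substring starts at the blank after the first word, so strip the surrounding blanks */
if p2>p1+l1+1 then middle=strip(substr(name,p1+l1,p2-p1-l1));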
drop p1 p2 l1 l2;
run;
@Anmolkhandelwal wrote:
I have data something like below,
data names;
infile datalines;
input name $;
datalines;
arun Krishna < which is first name, which is middle, which is last?
gopal rao
venu vardhan reddy
rames krishna rao sunkara < which is first name, which is middle, which is last
kiran < which is first name, which is middle, which is last
;
run;
I want 3 more variables, like:
first name
middle name
last name
If the variable does not have a middle name or last name, then that part should be blank. Can you please tell me how to solve this problem?
If there are not exactly 3 names given then you need to provide rules as to how to treat 1, 2 or more name elements.
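One way to make such rules explicit is a SELECT on the word count. This is only a sketch: the rules used here (first word = first name, last word = last name, everything in between = middle name) are my assumptions and cannot be confirmed from the data, and want_rules and needs_review are made-up names. Rows with more than 3 words are flagged so they can be checked by hand.

```sas
data names;
    infile datalines truncover;
    input name $30.;
    datalines;
arun krishna
gopal rao
venu vardhan reddy
rames krishna rao sunkara
kiran
;
run;

data want_rules;
    set names;
    length first_name middle_name last_name $30;
    nwords = countw(name, ' ');
    needs_review = 0;
    select (nwords);
        when (0) ;  /* blank name: leave all three parts missing */
        when (1) first_name = scan(name, 1, ' ');
        when (2) do;
            first_name = scan(name, 1, ' ');
            last_name  = scan(name, 2, ' ');
        end;
        when (3) do;
            first_name  = scan(name, 1, ' ');
            middle_name = scan(name, 2, ' ');
            last_name   = scan(name, 3, ' ');
        end;
        otherwise do;
            /* assumed rule for 4+ words: inner words all go to the middle name */
            first_name = scan(name, 1, ' ');
            last_name  = scan(name, -1, ' ');
            do i = 2 to nwords - 1;
                middle_name = catx(' ', middle_name, scan(name, i, ' '));
            end;
            needs_review = 1;  /* ambiguous split, check manually */
        end;
    end;
    drop i;
run;
```

Keeping each word count in its own WHEN branch makes it easy to change one rule, for example treating the last two words as the last name, without touching the others.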