I need to concatenate the first name and the last name whenever the last 3 digits of the first name column and last name column matches. For example, FirstName001 has to concatenate with LastName001 and then create a new column Student001. My actual dataset has more than 100 columns, so I can’t list them out one by one.
Have:
Center | FirstName001 | FirstName002 | FirstName003 | LastName001 | LastName002 | LastName003 |
A | Jason | Katie | Peter | Berkowitz. | Basel | Hirsch |
B | Dawn | Betty | Chris | Crane | Barhaghi | Choe |
C | Andy | Ben | Donald | Nassar | Amidi | Naomi |
D | Andrew | Ronald | Chirsty | Lee | Hesham | JoAnn |
E | Lee | Jennifer | Louis | Banooni | Kimelman | Douglas |
Want:
Center | Student001 | Student002 | Student003 |
A | Jason Berkowitz. | Katie Basel | Peter Hirsch |
B | Dawn Crane | Betty Barhaghi | Chris Choe |
C | Andy Nassar | Ben Amidi | Donald Naomi |
D | Andrew Lee | Ronald Hesham | Chirsty JoAnn |
E | Lee Banooni | Jennifer Kimelman | Louis Douglas |
data have;
input Center $ FirstName001 $ FirstName002 $ FirstName003 $ LastName001 $ LastName002 $ LastName003 $;
cards;
A Jason Katie Peter Berkowitz. Basel Hirsch
B Dawn Betty Chris Crane Barhaghi Choe
C Andy Ben Donald Nassar Amidi Naomi
D Andrew Ronald Chirsty Lee Hesham JoAnn
E Lee Jennifer Louis Banooni Kimelman Douglas
run;
data want;
set have;
student001=catx("",FirstName001,LastName001);
student002=catx("",FirstName002,LastName002);
student003=catx("",FirstName003,LastName003);
keep center student:;
run;
are there only few columns then
student001=catx(' ', firstname001, lastname001);
student002 =catx(' ',firstname002, lastname002)
student003 = catx(' ', firstname003,lastname003)
@ernie86 Are you wanting to make it trickier with the subject of conditional concatenation? Well, if your columns are in pattern of sets as in firstname set of 3 precedes last name set as shown in your example, all you need is number of elements in one set to compute the automation.
So no matter what the total number of columns divided/2 will give you one set
data have;
input Center $ FirstName001 $ FirstName002 $ FirstName003 $ LastName001 $ LastName002 $ LastName003 $;
cards;
A Jason Katie Peter Berkowitz. Basel Hirsch
B Dawn Betty Chris Crane Barhaghi Choe
C Andy Ben Donald Nassar Amidi Naomi
D Andrew Ronald Chirsty Lee Hesham JoAnn
E Lee Jennifer Louis Banooni Kimelman Douglas
run;
data want;
set have;
array t(*) FirstName001--LastName003;
array t1(*)$25 Student001 Student002 Student003;
_n=dim(t)/2;
do _n_=1 to _n;
t1(_n_)=catx(' ',t(_n_),t(_n_+_n));
end;
drop _:;
run;
@ernie86 you wrote- "There are more than 100 columns."
This makes it 50 students.
you can just change the second array subscript to 50 like
array student(50) $25; /* this generates your student number list that want to derive*/
or something like this
data want;
set have;
array new(*) $30 student001-student003;
array firstname(*) $ FirstName001-FirstName003;
array lastname(*) $ lastName001-lastName003;
do i = 1 to dim(firstname);
if substr(vname(firstname[i]),10,3) = substr(vname(lastname[i]),9,3)
then new{i}=cats(' ',firstname{i},lastname{i});
end;
run;
/*or if you have counterpart for every first and last name then*/
data want;
set have;
array new(*) $20 student001-student003;
array firstname(*) $ FirstName001-FirstName003;
array lastname(*) $ lastName001-lastName003;
do i = 1 to dim(firstname);
new{i}=cats(' ',firstname{i},lastname{i});
end;
run;
@kiranv_ I like the fact your approach is much simpler to understand however I noticed there can be a few redundancies in your code that can be eliminated like this-->
data want;
set have;
array firstname(*) $ FirstName:;
array lastname(*) $ lastName:;
array student(*) $30 student001-student003;
do i = 1 to dim(firstname);
student(i)=catx(' ', firstname(i), lastname(i));
end;
run;
you are right @novinosrin, fixed it
Here is a solution that will work for any number of columns
proc transpose data=have out=long(where=(_name_ ne "Center"));
by center notsorted;
var _character_;
run;
data temp;
set long;
if _name_ =: "First" then part = 1;
else if _name_ =: "Last" then part = 2;
id = scan(_name_, 1, "", "ab");
drop _name_;
run;
proc sql;
create table longWant as
select a.center, a.id, catx(' ', a.col1, b.col1) as name
from
temp as a inner join
temp as b on a.center=b.center and a.id=b.id
where a.part=1 and b.part=2
order by center, id;
quit;
proc transpose data=longWant out=want(drop=_name_) prefix=Student;
by center;
var name;
id id;
run;
data q1;
infile datalines ;
input Center:$ FirstName001:$ FirstName002:$ FirstName003:$ LastName001:$10. LastName002:$ LastName003:$;
datalines;
A Jason Katie Peter Berkowitz. Basel Hirsch
B Dawn Betty Chris Crane Barhaghi Choe
C Andy Ben Donald Nassar Amidi Naomi
D Andrew Ronald Chirsty Lee Hesham JoAnn
E Lee Jennifer Louis Banooni Kimelman Douglas
run;
proc print data=q1;
run;
data sol (keep=Center Student001 Student002 Student003);
set q1;
Student001=catx("",FirstName001,LastName001);
Student002=catx("",FirstName002,LastName002);
Student003=catx("",FirstName003,LastName003);
run;
proc print data=sol;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.