DATA Step, Macro, Functions and more

Conditional Cancatenation

Reply
Contributor
Posts: 27

Conditional Cancatenation

[ Edited ]

I need to concatenate the first name and the last name whenever the last 3 digits of the first name column and last name column matches. For example, FirstName001 has to concatenate with LastName001 and then create a new column Student001. My actual dataset has more than 100 columns, so I can’t list them out one by one.

 

Have:

Center FirstName001 FirstName002 FirstName003 LastName001 LastName002 LastName003
A Jason Katie Peter Berkowitz. Basel Hirsch
B Dawn Betty Chris Crane Barhaghi Choe
C Andy Ben Donald Nassar Amidi Naomi
D Andrew Ronald Chirsty Lee Hesham JoAnn
E Lee Jennifer Louis Banooni Kimelman Douglas

Want:

Center Student001 Student002 Student003
A Jason Berkowitz. Katie Basel Peter Hirsch
B Dawn Crane Betty Barhaghi Chris Choe
C Andy Nassar Ben Amidi Donald Naomi
D Andrew Lee Ronald Hesham Chirsty JoAnn
E Lee Banooni Jennifer Kimelman Louis Douglas
Frequent Contributor
Posts: 93

Re: Conditional Cancatenation

data have;
input Center $ FirstName001 $ FirstName002 $ FirstName003 $ LastName001 $ LastName002 $ LastName003 $;
cards;
A Jason Katie Peter Berkowitz. Basel Hirsch
B Dawn Betty Chris Crane Barhaghi Choe
C Andy Ben Donald Nassar Amidi Naomi
D Andrew Ronald Chirsty Lee Hesham JoAnn
E Lee Jennifer Louis Banooni Kimelman Douglas
run;


data want;
set have;
student001=catx("",FirstName001,LastName001);
student002=catx("",FirstName002,LastName002);
student003=catx("",FirstName003,LastName003);

keep center student:;
run;

PROC Star
Posts: 549

Re: Conditional Cancatenation

[ Edited ]

are there only few columns then 

 

 student001=catx(' ', firstname001, lastname001);
student002 =catx(' ',firstname002, lastname002)
student003 = catx(' ', firstname003,lastname003)

Contributor
Posts: 27

Re: Conditional Cancatenation

There are more than 100 columns.
Super User
Posts: 2,061

Re: Conditional Cancatenation

@ernie86  Are you wanting to make it trickier with the subject of conditional concatenation? Well, if your columns are in pattern of sets as in firstname set of 3 precedes last name set as shown in your example, all you need is number of elements in one set to compute the automation.

So no matter what the total number of columns divided/2 will give you one set 

 

data have;
input Center $ FirstName001 $ FirstName002 $ FirstName003 $ LastName001 $ LastName002 $ LastName003 $;
cards;
A Jason Katie Peter Berkowitz. Basel Hirsch
B Dawn Betty Chris Crane Barhaghi Choe
C Andy Ben Donald Nassar Amidi Naomi
D Andrew Ronald Chirsty Lee Hesham JoAnn
E Lee Jennifer Louis Banooni Kimelman Douglas
run;

data want;
set have;
array t(*) FirstName001--LastName003;
array t1(*)$25 Student001	Student002	Student003;
_n=dim(t)/2;
do _n_=1 to _n;
t1(_n_)=catx(' ',t(_n_),t(_n_+_n));
end;
drop _:;
run;

 

Super User
Posts: 2,061

Re: Conditional Cancatenation

Posted in reply to novinosrin

@ernie86 you wrote- "There are more than 100 columns."

 

This makes it 50 students.

 

you can just change the second array subscript to 50 like

 

array student(50) $25; /* this generates your student number list that want to derive*/

PROC Star
Posts: 549

Re: Conditional Cancatenation

[ Edited ]

or something like this

data want;
set have;

array new(*) $30 student001-student003;
array firstname(*) $ FirstName001-FirstName003;
array lastname(*) $ lastName001-lastName003;
do i = 1 to dim(firstname);
if substr(vname(firstname[i]),10,3) = substr(vname(lastname[i]),9,3)
then new{i}=cats(' ',firstname{i},lastname{i});
end;
run;
/*or if you have counterpart for every first and last name then*/


data want;
set have;

array new(*) $20 student001-student003;
array firstname(*) $ FirstName001-FirstName003;
array lastname(*) $ lastName001-lastName003;
do i = 1 to dim(firstname);
 new{i}=cats(' ',firstname{i},lastname{i});
end;
run;

 

Super User
Posts: 2,061

Re: Conditional Cancatenation

@kiranv_  I like the fact your approach is much simpler to understand however I noticed there can be a few redundancies in your code that can be eliminated like this-->

 

data want;
set have;
array firstname(*) $ FirstName:;
array lastname(*) $ lastName:;
array student(*)  $30  student001-student003;
do i = 1 to dim(firstname);
student(i)=catx(' ', firstname(i), lastname(i));
end;
run;
PROC Star
Posts: 549

Re: Conditional Cancatenation

[ Edited ]
Posted in reply to novinosrin

you are right @novinosrin, fixed it 

Esteemed Advisor
Posts: 5,626

Re: Conditional Cancatenation

Here is a solution that will work for any number of columns

 


proc transpose data=have out=long(where=(_name_ ne "Center"));
by center notsorted;
var _character_;
run;

data temp;
set long;
if _name_ =: "First" then part = 1; 
else if _name_ =: "Last" then part = 2;
id = scan(_name_, 1, "", "ab");
drop _name_;
run;

proc sql;
create table longWant as
select a.center, a.id, catx(' ', a.col1, b.col1) as name
from 
    temp as a inner join 
    temp as b on a.center=b.center and a.id=b.id
where a.part=1 and b.part=2
order by center, id;
quit;

proc transpose data=longWant out=want(drop=_name_) prefix=Student;
by center;
var name;
id id;
run;
PG
New Contributor
Posts: 2

Re: Conditional Cancatenation

[ Edited ]
data q1;
infile datalines ;
input Center:$ FirstName001:$ FirstName002:$ FirstName003:$ LastName001:$10. LastName002:$ LastName003:$;
datalines;
A  Jason   Katie    Peter   Berkowitz.	Basel	  Hirsch
B  Dawn	   Betty    Chris   Crane	Barhaghi  Choe
C  Andy	   Ben	    Donald  Nassar	Amidi	  Naomi
D  Andrew  Ronald   Chirsty Lee	        Hesham	  JoAnn
E  Lee	   Jennifer Louis   Banooni   	Kimelman  Douglas
run;
proc print data=q1;
run;

data sol (keep=Center Student001 Student002 Student003);
set q1;
Student001=catx("",FirstName001,LastName001);
Student002=catx("",FirstName002,LastName002);
Student003=catx("",FirstName003,LastName003);
run;
proc print data=sol;
run;
Ask a Question
Discussion stats
  • 10 replies
  • 224 views
  • 2 likes
  • 6 in conversation