Hello,
I have two datasets that I am trying to merge based on a condition. Esstenially trying to combine Merge and If-Then-Else statements
Data 1 = Identifier Segment Data 2 = Identifier CC CR Need Data = Identifier Segment New_Field
01 CC 01 1 1 01 CC 1
02 CC 02 2 3 02 CC 2
03 CC 03 3 4 03 CC 3
01 CR 01 CR 1
02 CR 02 CR 3
I could merge the two dataset first and then use IF-THEN to create the new field. Is there a more efficient way of doing this by combining these two steps.
Thanks for the suggestions.
Hello @sai_ch,
So, essentially, you have these two datasets:
data have1;
input id $ segment $;
cards;
01 CC
02 CC
03 CC
01 CR
02 CR
;
data have2;
input id $ CC CR;
cards;
01 1 1
02 2 3
03 3 4
;
And you're thinking about doing something like this:
proc sort data=have1 out=s1;
by id;
run;
data want;
merge s1 have2;
by id;
if segment='CC' then new_field=cc;
else if segment='CR' then new_field=cr;
drop cc cr;
run;
proc sort data=want;
by segment id;
run;
This is fine, but it takes three steps.
With PROC SQL you could do it in one step:
proc sql;
create table want as
select a.*, choosen(whichc(segment, 'CC', 'CR'), CC, CR) as new_field
from have1 a, have2 b
where a.id=b.id
order by segment, a.id;
quit;
The functions CHOOSEN and WHICHC could have been used in the data step solution as well.
Edit: But in the data step they could have been replaced by VVALUEX in conjunction with INPUT:
new_field=input(vvaluex(segment), 12.)
Edit 2: A shorter expression for the definition of NEW_FIELD, applicable to both solutions, is:
ifn(segment='CC', CC, CR)
I apologize for the layout of the datasets. Let me try again
Data 1 = Identifier Segment Data 2 = Identifier CC CR
01 CC 01 1 1
02 CC 02 2 3
03 CC 03 3 4
01 CR
02 CR
Need Data = Identifier Segment New_Field
01 CC 1
02 CC 2
03 CC 3
01 CR 1
02 CR 3
Thanks again
Hello @sai_ch,
So, essentially, you have these two datasets:
data have1;
input id $ segment $;
cards;
01 CC
02 CC
03 CC
01 CR
02 CR
;
data have2;
input id $ CC CR;
cards;
01 1 1
02 2 3
03 3 4
;
And you're thinking about doing something like this:
proc sort data=have1 out=s1;
by id;
run;
data want;
merge s1 have2;
by id;
if segment='CC' then new_field=cc;
else if segment='CR' then new_field=cr;
drop cc cr;
run;
proc sort data=want;
by segment id;
run;
This is fine, but it takes three steps.
With PROC SQL you could do it in one step:
proc sql;
create table want as
select a.*, choosen(whichc(segment, 'CC', 'CR'), CC, CR) as new_field
from have1 a, have2 b
where a.id=b.id
order by segment, a.id;
quit;
The functions CHOOSEN and WHICHC could have been used in the data step solution as well.
Edit: But in the data step they could have been replaced by VVALUEX in conjunction with INPUT:
new_field=input(vvaluex(segment), 12.)
Edit 2: A shorter expression for the definition of NEW_FIELD, applicable to both solutions, is:
ifn(segment='CC', CC, CR)
Another SQL solution would be:
proc sql;
create table want as
select have1.*, CC as new_field from have1 natural join have2 where segment="CC"
union all
select have1.*, CR as new_field from have1 natural join have2 where segment="CR"
order by segment, id;
quit;
data have1;
input id $ segment $;
cards;
01 CC
02 CC
03 CC
01 CR
02 CR
;
data have2;
input id $ CC CR;
cards;
01 1 1
02 2 3
03 3 4
;
proc transpose data=have2 out=x(index=(segment id) rename=(col1=New_Field _name_=segment));
by id;
var CC CR;
run;
data want;
ina=0;inb=0;
merge have1(in=ina ) x(in=inb) ;
by segment id;
if ina and inb;
run;
Thank you all for improving my SAS skills.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.