Hello,
I have two datasets that I am trying to merge based on a condition. Esstenially trying to combine Merge and If-Then-Else statements
Data 1 = Identifier Segment Data 2 = Identifier CC CR Need Data = Identifier Segment New_Field
01 CC 01 1 1 01 CC 1
02 CC 02 2 3 02 CC 2
03 CC 03 3 4 03 CC 3
01 CR 01 CR 1
02 CR 02 CR 3
I could merge the two dataset first and then use IF-THEN to create the new field. Is there a more efficient way of doing this by combining these two steps.
Thanks for the suggestions.
Hello @sai_ch,
So, essentially, you have these two datasets:
data have1;
input id $ segment $;
cards;
01 CC
02 CC
03 CC
01 CR
02 CR
;
data have2;
input id $ CC CR;
cards;
01 1 1
02 2 3
03 3 4
;
And you're thinking about doing something like this:
proc sort data=have1 out=s1;
by id;
run;
data want;
merge s1 have2;
by id;
if segment='CC' then new_field=cc;
else if segment='CR' then new_field=cr;
drop cc cr;
run;
proc sort data=want;
by segment id;
run;
This is fine, but it takes three steps.
With PROC SQL you could do it in one step:
proc sql;
create table want as
select a.*, choosen(whichc(segment, 'CC', 'CR'), CC, CR) as new_field
from have1 a, have2 b
where a.id=b.id
order by segment, a.id;
quit;
The functions CHOOSEN and WHICHC could have been used in the data step solution as well.
Edit: But in the data step they could have been replaced by VVALUEX in conjunction with INPUT:
new_field=input(vvaluex(segment), 12.)
Edit 2: A shorter expression for the definition of NEW_FIELD, applicable to both solutions, is:
ifn(segment='CC', CC, CR)
I apologize for the layout of the datasets. Let me try again
Data 1 = Identifier Segment Data 2 = Identifier CC CR
01 CC 01 1 1
02 CC 02 2 3
03 CC 03 3 4
01 CR
02 CR
Need Data = Identifier Segment New_Field
01 CC 1
02 CC 2
03 CC 3
01 CR 1
02 CR 3
Thanks again
Hello @sai_ch,
So, essentially, you have these two datasets:
data have1;
input id $ segment $;
cards;
01 CC
02 CC
03 CC
01 CR
02 CR
;
data have2;
input id $ CC CR;
cards;
01 1 1
02 2 3
03 3 4
;
And you're thinking about doing something like this:
proc sort data=have1 out=s1;
by id;
run;
data want;
merge s1 have2;
by id;
if segment='CC' then new_field=cc;
else if segment='CR' then new_field=cr;
drop cc cr;
run;
proc sort data=want;
by segment id;
run;
This is fine, but it takes three steps.
With PROC SQL you could do it in one step:
proc sql;
create table want as
select a.*, choosen(whichc(segment, 'CC', 'CR'), CC, CR) as new_field
from have1 a, have2 b
where a.id=b.id
order by segment, a.id;
quit;
The functions CHOOSEN and WHICHC could have been used in the data step solution as well.
Edit: But in the data step they could have been replaced by VVALUEX in conjunction with INPUT:
new_field=input(vvaluex(segment), 12.)
Edit 2: A shorter expression for the definition of NEW_FIELD, applicable to both solutions, is:
ifn(segment='CC', CC, CR)
Another SQL solution would be:
proc sql;
create table want as
select have1.*, CC as new_field from have1 natural join have2 where segment="CC"
union all
select have1.*, CR as new_field from have1 natural join have2 where segment="CR"
order by segment, id;
quit;
data have1;
input id $ segment $;
cards;
01 CC
02 CC
03 CC
01 CR
02 CR
;
data have2;
input id $ CC CR;
cards;
01 1 1
02 2 3
03 3 4
;
proc transpose data=have2 out=x(index=(segment id) rename=(col1=New_Field _name_=segment));
by id;
var CC CR;
run;
data want;
ina=0;inb=0;
merge have1(in=ina ) x(in=inb) ;
by segment id;
if ina and inb;
run;
Thank you all for improving my SAS skills.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.