Hi everybody
I am always confuse with IN option. We use IN option in MERGE. How idoes it works? What's going internally if we mention IN?
data x;
merge dsn1(in=a) dsn2(in=b);
by var;
run;
what happens if we give if a=1 or b=0
a=1 and b=1
b=0 or b=0
a=0 and b=0
.................................................................
Can anybody help me to understand this concept?
Thanks in advance
Regards
Rawindarreddy
The IN statements allow you to do a match merge. Using combinations of the IN statements you can replicate LEFT, RIGHT and FULL join logic that is found in SQL.
In your statement above, a or b =1 if a matching record is found.
if a=1 or b=0: Record found in A and Not in B
a=1 and b=1: Full Join - records found in each table
b=0 or b=0: No record found in B - not sure why you would merge
a=0 and b=0: Records not found in either
Also from @Tom:
Here is a paper that talks about SAS merging. That might help.
http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF
The IN= variables are indications of whether the dataset contributed information to the current merge. You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.
For left or right join you just need to reference A or B , respectively. Whether the value was in the other datasets doesn't matter.
The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group. SQL JOIN will generate all of the combinations. The merge statement will just match the records in the order that they come. If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are matched to the rest of the records from the other dataset.
The IN statements allow you to do a match merge. Using combinations of the IN statements you can replicate LEFT, RIGHT and FULL join logic that is found in SQL.
In your statement above, a or b =1 if a matching record is found.
if a=1 or b=0: Record found in A and Not in B
a=1 and b=1: Full Join - records found in each table
b=0 or b=0: No record found in B - not sure why you would merge
a=0 and b=0: Records not found in either
Also from @Tom:
Here is a paper that talks about SAS merging. That might help.
http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF
The IN= variables are indications of whether the dataset contributed information to the current merge. You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.
For left or right join you just need to reference A or B , respectively. Whether the value was in the other datasets doesn't matter.
The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group. SQL JOIN will generate all of the combinations. The merge statement will just match the records in the order that they come. If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are matched to the rest of the records from the other dataset.
thanks jcbell.
u said if a=1 or b=0: Record found in A and Not in B
but it do left join
whole observations from leftside matching observations from rightside datset.
thats why iam confusing
is a=1 means whole observations
b=0 means matching observations
plz clarify my doubt
Thanks & Regards
Rawindarreddy
Hi Rawindarreddy,
if you run the code below you will see the difference.
data have1;
input id age;
cards;
2 20
3 30
6 60
4 40
5 50
;
run;
proc sort data=have1;by id;run;
data have2;
input id sex $;
cards;
1 f
3 m
6 m
5 f
8 m
7 f
;
proc sort data=have2;by id;run;
run;
data merged1;
merge have1(in=a) have2(in=b);
by id;
if a and b;
/* this is the same as "if a=1 and b=1" */
run;
proc print;title IDs in both datasets;run;
data merged2;
merge have1(in=a) have2(in=b);
by id;
if a and not b;
/* this is the same as "if a=1 and b=0" */
run;
proc print; title IDs that in have1 but not in have2;run;
data merged3;
merge have1(in=a) have2(in=b);
by id;
if not a and b;
/* this is the same as "if b=1 and a=0" */
run;
proc print; title IDs that in have2 but not in have1;run;
data merged4;
merge have1(in=a) have2(in=b);
/* this is the same as "if a or b"*/;
by id;
run;
proc print; title all IDs in either have1 or have2;run;
/*output*/
IDs in both datasets
Obs id age sex
1 3 30 m
2 5 50 f
3 6 60 m
IDs that in have1 but not in have2
Obs id age sex
1 2 20
2 4 40
IDs that in have2 but not in have1
Obs id age sex
1 1 . f
2 7 . f
3 8 . m
all IDs in either have1 or have2
Obs id age sex
1 1 . f
2 2 20
3 3 30 m
4 4 40
5 5 50 f
6 6 60 m
7 7 . f
8 8 . m
thank.......u Linlin
i understood that a=1 and b=0 gives obs from left dataset which not exists in right dataset
if we give a=1 or b=0 it gives whole obs from left and matching obs from right side means LEFT JOIN
but b=0 means observations not in a
i am confusing .Can u give me clarity how it works
Thanks & Regards
Rawindarreddy
Here is a paper that talks about SAS merging. That might help.
http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF
The IN= variables are indications of whether the dataset contributed information to the current merge. You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.
For left or right join you just need to reference A or B , respectively. Whether the value was in the other datasets doesn't matter.
The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group. SQL JOIN will generate all of the combinations. The merge statement will just match the records in the order that they come. If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are match to the rest of the records from the other dataset.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.