Solved
Contributor
Posts: 70

# IN Dataset option

[ Edited ]

Hi everybody

I am always confuse with IN option. We use IN option in MERGE. How idoes it works?  What's going internally if we mention IN?

``````data x;
merge dsn1(in=a) dsn2(in=b);
by var;
run;
``````

what happens if we give if a=1 or b=0

a=1 and  b=1

b=0 or b=0

a=0 and b=0

.................................................................

Can anybody help me to understand this concept?

Regards

Rawindarreddy

Accepted Solutions
Solution
‎07-17-2017 10:55 AM
Contributor
Posts: 42

## IN Dataset option

[ Edited ]

The IN statements allow you to do a match merge.  Using combinations of the IN statements you can replicate LEFT, RIGHT and FULL join logic that is found in SQL.

In your statement above, a or b =1 if a matching record is found.

if a=1 or b=0: Record found in A and Not in B

a=1 and  b=1: Full Join - records found in each table

b=0 or b=0: No record found in B - not sure why you would merge

Also from @Tom:

Here is a paper that talks about SAS merging. That might help.

The IN= variables are indications of whether the dataset contributed information to the current merge.  You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

For left or right join you just need to reference A or B , respectively.  Whether the value was in the other datasets doesn't matter.

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group.  SQL JOIN will generate all of the combinations.  The merge statement will just match the records in the order that they come.  If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are matched to the rest of the records from the other dataset.

All Replies
Solution
‎07-17-2017 10:55 AM
Contributor
Posts: 42

## IN Dataset option

[ Edited ]

The IN statements allow you to do a match merge.  Using combinations of the IN statements you can replicate LEFT, RIGHT and FULL join logic that is found in SQL.

In your statement above, a or b =1 if a matching record is found.

if a=1 or b=0: Record found in A and Not in B

a=1 and  b=1: Full Join - records found in each table

b=0 or b=0: No record found in B - not sure why you would merge

Also from @Tom:

Here is a paper that talks about SAS merging. That might help.

The IN= variables are indications of whether the dataset contributed information to the current merge.  You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

For left or right join you just need to reference A or B , respectively.  Whether the value was in the other datasets doesn't matter.

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group.  SQL JOIN will generate all of the combinations.  The merge statement will just match the records in the order that they come.  If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are matched to the rest of the records from the other dataset.

Contributor
Posts: 70

## Re: IN Dataset option

thanks jcbell.

u said if a=1 or b=0: Record found in A and Not in B

but it do left join

whole observations from leftside matching observations from rightside datset.

thats why iam confusing

is a=1 means whole observations

b=0 means matching observations

plz clarify my doubt

Thanks & Regards

Rawindarreddy

Super Contributor
Posts: 1,636

## IN Dataset option

Hi Rawindarreddy,

if you run the code below you will see the difference.

data have1;

input id age;

cards;

2 20

3 30

6 60

4 40

5 50

;

run;

proc sort data=have1;by id;run;

data have2;

input id sex \$;

cards;

1 f

3 m

6 m

5 f

8 m

7 f

;

proc sort data=have2;by id;run;

run;

data merged1;

merge have1(in=a) have2(in=b);

by id;

if a and b;

/* this is the same as "if a=1 and b=1" */

run;

proc print;title  IDs in both datasets;run;

data merged2;

merge have1(in=a) have2(in=b);

by id;

if a and not b;

/* this is the same as "if a=1 and b=0" */

run;

proc print; title IDs that in have1 but not in have2;run;

data merged3;

merge have1(in=a) have2(in=b);

by id;

if not a and b;

/* this is the same as "if b=1 and a=0" */

run;

proc print; title IDs that in have2 but not in have1;run;

data merged4;

merge have1(in=a) have2(in=b);

/* this is the same as "if a or b"*/;

by id;

run;

proc print; title all IDs in either  have1 or  have2;run;

/*output*/

IDs in both datasets

Obs    id    age    sex

1      3     30     m

2      5     50     f

3      6     60     m

IDs that in have1 but not in have2

Obs    id    age    sex

1      2     20

2      4     40

IDs that in have2 but not in have1

Obs    id    age    sex

1      1     .      f

2      7     .      f

3      8     .      m

all IDs in either  have1 or  have2

Obs    id    age    sex

1      1      .     f

2      2     20

3      3     30     m

4      4     40

5      5     50     f

6      6     60     m

7      7      .     f

8      8      .     m

Contributor
Posts: 70

## IN Dataset option

thank.......u Linlin

i understood that a=1 and b=0 gives obs from left dataset which not exists in right dataset

if we give a=1 or b=0 it gives whole obs from left and matching obs from right side means LEFT JOIN

but b=0 means observations not in a

i am confusing .Can u give me clarity  how it works

Thanks & Regards

Rawindarreddy

Super User
Posts: 8,092

## Re: IN Dataset option

Here is a paper that talks about SAS merging. That might help.

The IN= variables are indications of whether the dataset contributed information to the current merge.  You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

For left or right join you just need to reference A or B , respectively.  Whether the value was in the other datasets doesn't matter.

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group.  SQL JOIN will generate all of the combinations.  The merge statement will just match the records in the order that they come.  If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are match to the rest of the records from the other dataset.

🔒 This topic is solved and locked.