Solved: IN Dataset option

rawindar · Posted 03-24-2012 04:01 AM

Hi everybody

I am always confuse with IN option. We use IN option in MERGE. How idoes it works? What's going internally if we mention IN?

data x;
 merge dsn1(in=a) dsn2(in=b);
 by var;
run;

what happens if we give if a=1 or b=0

a=1 and b=1

b=0 or b=0

a=0 and b=0

.................................................................

Can anybody help me to understand this concept?

Thanks in advance

Regards

Rawindarreddy

jcbell · Posted 03-24-2012 07:36 AM

The IN statements allow you to do a match merge. Using combinations of the IN statements you can replicate LEFT, RIGHT and FULL join logic that is found in SQL.

In your statement above, a or b =1 if a matching record is found.

if a=1 or b=0: Record found in A and Not in B

a=1 and b=1: Full Join - records found in each table

b=0 or b=0: No record found in B - not sure why you would merge

a=0 and b=0: Records not found in either

Also from @Tom:

Here is a paper that talks about SAS merging. That might help.

http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF

The IN= variables are indications of whether the dataset contributed information to the current merge. You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

For left or right join you just need to reference A or B , respectively. Whether the value was in the other datasets doesn't matter.

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group. SQL JOIN will generate all of the combinations. The merge statement will just match the records in the order that they come. If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are matched to the rest of the records from the other dataset.

View solution in original post

jcbell · Posted 03-24-2012 07:36 AM

The IN statements allow you to do a match merge. Using combinations of the IN statements you can replicate LEFT, RIGHT and FULL join logic that is found in SQL.

In your statement above, a or b =1 if a matching record is found.

if a=1 or b=0: Record found in A and Not in B

a=1 and b=1: Full Join - records found in each table

b=0 or b=0: No record found in B - not sure why you would merge

a=0 and b=0: Records not found in either

Also from @Tom:

Here is a paper that talks about SAS merging. That might help.

http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF

The IN= variables are indications of whether the dataset contributed information to the current merge. You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

For left or right join you just need to reference A or B , respectively. Whether the value was in the other datasets doesn't matter.

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group. SQL JOIN will generate all of the combinations. The merge statement will just match the records in the order that they come. If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are matched to the rest of the records from the other dataset.

rawindar · Posted 03-24-2012 11:35 PM

thanks jcbell.

u said if a=1 or b=0: Record found in A and Not in B

but it do left join

whole observations from leftside matching observations from rightside datset.

thats why iam confusing

is a=1 means whole observations

b=0 means matching observations

plz clarify my doubt

Thanks & Regards

Rawindarreddy

Linlin · Posted 03-24-2012 08:12 AM

Hi Rawindarreddy,

if you run the code below you will see the difference.

data have1;

input id age;

cards;

2 20

3 30

6 60

4 40

5 50

;

run;

proc sort data=have1;by id;run;

data have2;

input id sex $;

cards;

1 f

3 m

6 m

5 f

8 m

7 f

;

proc sort data=have2;by id;run;

run;

data merged1;

merge have1(in=a) have2(in=b);

by id;

if a and b;

/* this is the same as "if a=1 and b=1" */

run;

proc print;title IDs in both datasets;run;

data merged2;

merge have1(in=a) have2(in=b);

by id;

if a and not b;

/* this is the same as "if a=1 and b=0" */

run;

proc print; title IDs that in have1 but not in have2;run;

data merged3;

merge have1(in=a) have2(in=b);

by id;

if not a and b;

/* this is the same as "if b=1 and a=0" */

run;

proc print; title IDs that in have2 but not in have1;run;

data merged4;

merge have1(in=a) have2(in=b);

/* this is the same as "if a or b"*/;

by id;

run;

proc print; title all IDs in either have1 or have2;run;

/*output*/

IDs in both datasets

Obs id age sex

1 3 30 m

2 5 50 f

3 6 60 m

IDs that in have1 but not in have2

Obs id age sex

1 2 20

2 4 40

IDs that in have2 but not in have1

Obs id age sex

1 1 . f

2 7 . f

3 8 . m

all IDs in either have1 or have2

Obs id age sex

1 1 . f

2 2 20

3 3 30 m

4 4 40

5 5 50 f

6 6 60 m

7 7 . f

8 8 . m

rawindar · Posted 03-24-2012 11:56 PM

thank.......u Linlin

i understood that a=1 and b=0 gives obs from left dataset which not exists in right dataset

if we give a=1 or b=0 it gives whole obs from left and matching obs from right side means LEFT JOIN

but b=0 means observations not in a

i am confusing .Can u give me clarity how it works

Thanks & Regards

Rawindarreddy

Tom · Posted 03-25-2012 09:42 AM

Here is a paper that talks about SAS merging. That might help.

http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF

The IN= variables are indications of whether the dataset contributed information to the current merge. You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

For left or right join you just need to reference A or B , respectively. Whether the value was in the other datasets doesn't matter.

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group. SQL JOIN will generate all of the combinations. The merge statement will just match the records in the order that they come. If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are match to the rest of the records from the other dataset.

IN Dataset option

IN Dataset option

IN Dataset option

Re: IN Dataset option

IN Dataset option

IN Dataset option

Re: IN Dataset option

Classroom Training Available!