DATA Step, Macro, Functions and more

IN Dataset option

Accepted Solution Solved
Reply
Contributor
Posts: 70
Accepted Solution

IN Dataset option

[ Edited ]

Hi everybody

 

I am always confuse with IN option. We use IN option in MERGE. How idoes it works?  What's going internally if we mention IN?

 

data x;
 merge dsn1(in=a) dsn2(in=b);
 by var;
run;

 

what happens if we give if a=1 or b=0

                                    a=1 and  b=1

                                    b=0 or b=0

                                     a=0 and b=0

.................................................................

 

 

Can anybody help me to understand this concept?

 

Thanks in advance

Regards

Rawindarreddy


Accepted Solutions
Solution
‎07-17-2017 10:55 AM
Contributor
Posts: 42

IN Dataset option

[ Edited ]

The IN statements allow you to do a match merge.  Using combinations of the IN statements you can replicate LEFT, RIGHT and FULL join logic that is found in SQL.

 

In your statement above, a or b =1 if a matching record is found. 

 

    if a=1 or b=0: Record found in A and Not in B

       a=1 and  b=1: Full Join - records found in each table

       b=0 or b=0: No record found in B - not sure why you would merge

       a=0 and b=0: Records not found in either

 

Also from @Tom:

Here is a paper that talks about SAS merging. That might help.

http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF

 

The IN= variables are indications of whether the dataset contributed information to the current merge.  You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

 

For left or right join you just need to reference A or B , respectively.  Whether the value was in the other datasets doesn't matter.

 

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group.  SQL JOIN will generate all of the combinations.  The merge statement will just match the records in the order that they come.  If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are matched to the rest of the records from the other dataset.



 

View solution in original post


All Replies
Solution
‎07-17-2017 10:55 AM
Contributor
Posts: 42

IN Dataset option

[ Edited ]

The IN statements allow you to do a match merge.  Using combinations of the IN statements you can replicate LEFT, RIGHT and FULL join logic that is found in SQL.

 

In your statement above, a or b =1 if a matching record is found. 

 

    if a=1 or b=0: Record found in A and Not in B

       a=1 and  b=1: Full Join - records found in each table

       b=0 or b=0: No record found in B - not sure why you would merge

       a=0 and b=0: Records not found in either

 

Also from @Tom:

Here is a paper that talks about SAS merging. That might help.

http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF

 

The IN= variables are indications of whether the dataset contributed information to the current merge.  You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

 

For left or right join you just need to reference A or B , respectively.  Whether the value was in the other datasets doesn't matter.

 

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group.  SQL JOIN will generate all of the combinations.  The merge statement will just match the records in the order that they come.  If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are matched to the rest of the records from the other dataset.



 

Contributor
Posts: 70

Re: IN Dataset option

thanks jcbell.

u said if a=1 or b=0: Record found in A and Not in B

but it do left join

whole observations from leftside matching observations from rightside datset.

thats why iam confusing

is a=1 means whole observations

   b=0 means matching observations

plz clarify my doubt

Thanks & Regards

Rawindarreddy

Super Contributor
Posts: 1,636

IN Dataset option

Hi Rawindarreddy,

if you run the code below you will see the difference.

data have1;

input id age;

cards;

2 20

3 30

6 60

4 40

5 50

;

run;

proc sort data=have1;by id;run;

data have2;

input id sex $;

cards;

1 f

3 m

6 m

5 f

8 m

7 f

;

proc sort data=have2;by id;run;

run;

data merged1;

   merge have1(in=a) have2(in=b);

   by id;

   if a and b;

   /* this is the same as "if a=1 and b=1" */

  run;

  proc print;title  IDs in both datasets;run;

data merged2;

    merge have1(in=a) have2(in=b);

   by id;

   if a and not b;

   /* this is the same as "if a=1 and b=0" */

   run;

   proc print; title IDs that in have1 but not in have2;run;

  

data merged3;

    merge have1(in=a) have2(in=b);

   by id;

   if not a and b;

   /* this is the same as "if b=1 and a=0" */

   run;

   proc print; title IDs that in have2 but not in have1;run;

  data merged4;

  merge have1(in=a) have2(in=b);

  /* this is the same as "if a or b"*/;

  by id;

  run;

  proc print; title all IDs in either  have1 or  have2;run;

/*output*/

                               IDs in both datasets                            

                              Obs    id    age    sex

                               1      3     30     m

                               2      5     50     f

                               3      6     60     m

                        IDs that in have1 but not in have2                      

                              Obs    id    age    sex

                               1      2     20

                               2      4     40

                        IDs that in have2 but not in have1                     

                              Obs    id    age    sex

                               1      1     .      f

                               2      7     .      f

                               3      8     .      m

                        all IDs in either  have1 or  have2                      

                              Obs    id    age    sex

                               1      1      .     f

                               2      2     20

                               3      3     30     m

                               4      4     40

                               5      5     50     f

                               6      6     60     m

                               7      7      .     f

                               8      8      .     m

Contributor
Posts: 70

IN Dataset option

thank.......u Linlin

i understood that a=1 and b=0 gives obs from left dataset which not exists in right dataset

              if we give a=1 or b=0 it gives whole obs from left and matching obs from right side means LEFT JOIN

              but b=0 means observations not in a

i am confusing .Can u give me clarity  how it works

Thanks & Regards

Rawindarreddy

Super User
Super User
Posts: 7,083

Re: IN Dataset option

Here is a paper that talks about SAS merging. That might help.

http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF

The IN= variables are indications of whether the dataset contributed information to the current merge.  You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

For left or right join you just need to reference A or B , respectively.  Whether the value was in the other datasets doesn't matter.

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group.  SQL JOIN will generate all of the combinations.  The merge statement will just match the records in the order that they come.  If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are match to the rest of the records from the other dataset.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 9880 views
  • 0 likes
  • 4 in conversation