BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rawindar
Calcite | Level 5

Hi everybody

 

I am always confuse with IN option. We use IN option in MERGE. How idoes it works?  What's going internally if we mention IN?

 

data x;
 merge dsn1(in=a) dsn2(in=b);
 by var;
run;

 

what happens if we give if a=1 or b=0

                                    a=1 and  b=1

                                    b=0 or b=0

                                     a=0 and b=0

.................................................................

 

 

Can anybody help me to understand this concept?

 

Thanks in advance

Regards

Rawindarreddy

1 ACCEPTED SOLUTION

Accepted Solutions
jcbell
Obsidian | Level 7

The IN statements allow you to do a match merge.  Using combinations of the IN statements you can replicate LEFT, RIGHT and FULL join logic that is found in SQL.

 

In your statement above, a or b =1 if a matching record is found. 

 

    if a=1 or b=0: Record found in A and Not in B

       a=1 and  b=1: Full Join - records found in each table

       b=0 or b=0: No record found in B - not sure why you would merge

       a=0 and b=0: Records not found in either

 

Also from @Tom:

Here is a paper that talks about SAS merging. That might help.

http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF

 

The IN= variables are indications of whether the dataset contributed information to the current merge.  You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

 

For left or right join you just need to reference A or B , respectively.  Whether the value was in the other datasets doesn't matter.

 

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group.  SQL JOIN will generate all of the combinations.  The merge statement will just match the records in the order that they come.  If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are matched to the rest of the records from the other dataset.



 

View solution in original post

5 REPLIES 5
jcbell
Obsidian | Level 7

The IN statements allow you to do a match merge.  Using combinations of the IN statements you can replicate LEFT, RIGHT and FULL join logic that is found in SQL.

 

In your statement above, a or b =1 if a matching record is found. 

 

    if a=1 or b=0: Record found in A and Not in B

       a=1 and  b=1: Full Join - records found in each table

       b=0 or b=0: No record found in B - not sure why you would merge

       a=0 and b=0: Records not found in either

 

Also from @Tom:

Here is a paper that talks about SAS merging. That might help.

http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF

 

The IN= variables are indications of whether the dataset contributed information to the current merge.  You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

 

For left or right join you just need to reference A or B , respectively.  Whether the value was in the other datasets doesn't matter.

 

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group.  SQL JOIN will generate all of the combinations.  The merge statement will just match the records in the order that they come.  If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are matched to the rest of the records from the other dataset.



 

rawindar
Calcite | Level 5

thanks jcbell.

u said if a=1 or b=0: Record found in A and Not in B

but it do left join

whole observations from leftside matching observations from rightside datset.

thats why iam confusing

is a=1 means whole observations

   b=0 means matching observations

plz clarify my doubt

Thanks & Regards

Rawindarreddy

Linlin
Lapis Lazuli | Level 10

Hi Rawindarreddy,

if you run the code below you will see the difference.

data have1;

input id age;

cards;

2 20

3 30

6 60

4 40

5 50

;

run;

proc sort data=have1;by id;run;

data have2;

input id sex $;

cards;

1 f

3 m

6 m

5 f

8 m

7 f

;

proc sort data=have2;by id;run;

run;

data merged1;

   merge have1(in=a) have2(in=b);

   by id;

   if a and b;

   /* this is the same as "if a=1 and b=1" */

  run;

  proc print;title  IDs in both datasets;run;

data merged2;

    merge have1(in=a) have2(in=b);

   by id;

   if a and not b;

   /* this is the same as "if a=1 and b=0" */

   run;

   proc print; title IDs that in have1 but not in have2;run;

  

data merged3;

    merge have1(in=a) have2(in=b);

   by id;

   if not a and b;

   /* this is the same as "if b=1 and a=0" */

   run;

   proc print; title IDs that in have2 but not in have1;run;

  data merged4;

  merge have1(in=a) have2(in=b);

  /* this is the same as "if a or b"*/;

  by id;

  run;

  proc print; title all IDs in either  have1 or  have2;run;

/*output*/

                               IDs in both datasets                            

                              Obs    id    age    sex

                               1      3     30     m

                               2      5     50     f

                               3      6     60     m

                        IDs that in have1 but not in have2                      

                              Obs    id    age    sex

                               1      2     20

                               2      4     40

                        IDs that in have2 but not in have1                     

                              Obs    id    age    sex

                               1      1     .      f

                               2      7     .      f

                               3      8     .      m

                        all IDs in either  have1 or  have2                      

                              Obs    id    age    sex

                               1      1      .     f

                               2      2     20

                               3      3     30     m

                               4      4     40

                               5      5     50     f

                               6      6     60     m

                               7      7      .     f

                               8      8      .     m

rawindar
Calcite | Level 5

thank.......u Linlin

i understood that a=1 and b=0 gives obs from left dataset which not exists in right dataset

              if we give a=1 or b=0 it gives whole obs from left and matching obs from right side means LEFT JOIN

              but b=0 means observations not in a

i am confusing .Can u give me clarity  how it works

Thanks & Regards

Rawindarreddy

Tom
Super User Tom
Super User

Here is a paper that talks about SAS merging. That might help.

http://www2.sas.com/proceedings/sugi22/ADVTUTOR/PAPER39.PDF

The IN= variables are indications of whether the dataset contributed information to the current merge.  You can think of it as an indication of whether there are any observations in that source dataset for this BY variable grouping.

For left or right join you just need to reference A or B , respectively.  Whether the value was in the other datasets doesn't matter.

The big difference between merge and SQL join is when both datasets have multiple observations in the same BY variable group.  SQL JOIN will generate all of the combinations.  The merge statement will just match the records in the order that they come.  If one dataset runs out of records to contribute for that BY variable grouping then the values from the last observation read stay unmodified and are match to the rest of the records from the other dataset.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 30961 views
  • 1 like
  • 4 in conversation