data seta; input apple banana; datalines; 1 1 2 2 3 3 run; proc sort data=seta; by apple; run; data setb; input apple orange lemon pear; datalines; 11 11 11 11 22 22 22 22 1 2 3 4 run; proc sort data=setb; by apple; run; data outina; merge seta (in=a) setb; by apple; run; data outnoth; merge seta setb; by apple; run; proc print data=seta; run; proc print data=setb; run; proc print data=outina; run; proc print data=outnoth; run;
I am getting exact same result for with or without (in=a), why?
I was believing (in=a) is like a left join... where only observations with apple =1,2,3 would be included...
@HeatherNewton Good question.
The IN= Data Set Option "Creates a Boolean variable that indicates whether the data set contributed data to the current observation." (Straight from the documentation). So in your example, the in=a data set option creates a 0/1 variable (automatically dropped). You can assign the value to another variable to see it in your output like below.
data outina;
merge seta (in=a) setb;
by apple;
ina = a;
run;
Result:
apple banana orange lemon pear ina 1 1 2 3 4 1 2 2 . . . 1 3 3 . . . 1 11 . 11 11 11 0 22 . 22 22 22 0
Therefore, to replicate an SQL Left Join, you should use a Subsetting If Statement like below to output only the observations where the seta data set contributes.
data outina;
merge seta (in=a) setb;
by apple;
if a;
run;
Result:
apple banana orange lemon pear 1 1 2 3 4 2 2 . . . 3 3 . . .
@HeatherNewton , did this answer your question?
yes thx
Good. Please remember to close the thread 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.