data seta; input apple banana; datalines; 1 1 2 2 3 3 run; proc sort data=seta; by apple; run; data setb; input apple orange lemon pear; datalines; 11 11 11 11 22 22 22 22 1 2 3 4 run; proc sort data=setb; by apple; run; data outina; merge seta (in=a) setb; by apple; run; data outnoth; merge seta setb; by apple; run; proc print data=seta; run; proc print data=setb; run; proc print data=outina; run; proc print data=outnoth; run;
I am getting exact same result for with or without (in=a), why?
I was believing (in=a) is like a left join... where only observations with apple =1,2,3 would be included...
@HeatherNewton Good question.
The IN= Data Set Option "Creates a Boolean variable that indicates whether the data set contributed data to the current observation." (Straight from the documentation). So in your example, the in=a data set option creates a 0/1 variable (automatically dropped). You can assign the value to another variable to see it in your output like below.
data outina;
merge seta (in=a) setb;
by apple;
ina = a;
run;
Result:
apple banana orange lemon pear ina 1 1 2 3 4 1 2 2 . . . 1 3 3 . . . 1 11 . 11 11 11 0 22 . 22 22 22 0
Therefore, to replicate an SQL Left Join, you should use a Subsetting If Statement like below to output only the observations where the seta data set contributes.
data outina;
merge seta (in=a) setb;
by apple;
if a;
run;
Result:
apple banana orange lemon pear 1 1 2 3 4 2 2 . . . 3 3 . . .
@HeatherNewton , did this answer your question?
yes thx
Good. Please remember to close the thread 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.