Re: left join and inner join in proc sql

GN0001 · Posted 02-27-2022 05:53 PM

Hello all,

I have 64 rows and I need to find them in a bigger data set. Here, my inquiry is about the logic that I have applied.

I did a left outer join with missing on the second dataset and I was able to find those rows that exist in the first/left dataset, but it doesn't exist in the second/right dataset. The result came 4 rows.

Then I did inner join between first and second dataset with Distinct values and came 54 rows.

If I do without distinct, the result is 169 rows.

54+4 equal to 58 and deduct it from 64, we have 6 rows left, which I don't know what is going about them.

Can we say since we put distinct, we removed duplicates, in fact those 6 rows are among those 54 rows, they are just taken away by Distinct because they are duplicates.

Here, I am looking for logic. I need to know the way I elaborate it is correct. I need to present it to my boss.

Regards,

Blue&Blue

Blue Blue

Tom · Posted 02-27-2022 06:48 PM

How many of the 64 are distinct?

What do you mean by distinct? Normally in SQL that means unique observations. Two observations could have the same value of the key variable but still be unique because they have different values of other variables.

GN0001 · Posted 02-27-2022 09:48 PM

Hello,
Thanks for the reply,
Then if the join values are same but they have different values in other variables, that means the observations are unique. 54 are unique out of 64.
Regards,
Blue & Blue

Blue Blue

Tom · Posted 02-27-2022 10:34 PM

If you want control use data step instead of SQL.

For example you might MERGE the two dataset by the key variable and use the IN= dataset option to be able to track which dataset contributed to this observations.

So if you have two datasets named ONE and TWO and the key variable is named ID then the code might look like this:

data want;
  merge one(in=in1) two(in=in2);
  by id;
  if in1 and in2 then source='BOTH';
  else if in1 then source='ONE ';
  else source='TWO';
  unique=first.id;
run;

proc summary data=want;
  class source;
  var unique;
  output out=summary N=COUNT sum(unique)=UNIQUE ;
run;

proc print data=summary;
   var source count unique;
run;

GN0001 · Posted 02-27-2022 11:35 PM

Hello Tom,

Thanks for the code.

Can you explain about class and source?

proc summary data=want;
  class source;
  var unique;
  output out=summary N=COUNT sum(unique)=UNIQUE ;
run;

Regards,

blue&blue

Blue Blue

Tom · Posted 02-27-2022 11:38 PM

SOURCE is the variable name.

CLASS is a common SAS statement that is supported in various ways by a lot of SAS procedure. It basically is for telling the procedure to group the data by the levels of that variable.

For more information read the documentation on PROC SUMMARY (also known as PROC MEANS).

left join and inner join in proc sql