DATA Step, Macro, Functions and more

The order of SQL Join Result

Accepted Solution Solved
Reply
Contributor
Posts: 67
Accepted Solution

The order of SQL Join Result

[ Edited ]

Hi, guys

 

Recently, I've met a strange problem. I can't understand that. Hopefully, you guys can explain to me. Here is it:

 

Let's see an basic result:

/* 2 variables and 1 observation in table1 */
data table1;
	id="001";
	num=uniform(1);
run;

/* 2 variables and 5 observation in table2*/
data table2;
   	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;

/* Inner Join table1 and table2*/
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;
proc print;run;

 

The Data Set Result is:

result.png

 

Notice the order of variable SCORE, is just same as the original order in table2.

 

But when the number of variables in table1 increases, magical thing happened. Let's see:

 

/* 12 variables and 1 observation in table1 */
data table1;
	id="001";
	array num{10} ;
	do i=1 to 10;
		num{i}=uniform(i)+i;
	end;
run;

/* 2 variables and 5 observation in table2 */
data table2;
	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;

/* Inner Join table1 and table2*/
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;
proc print;run;

 

The order of variable SCORE in Data Set RESULT has changed:

result2.png

 

So, I'm confused. Why did it change?

 

Next, I run a test, to see the relation between the number of the variables in data set table1 and the order of varialbe SCORE in data set RESULT. Here is the code:

 


%macro create_table1(count);
%local i;
data table1;
	id="001";
	%do i=1 %to &count;
		num&i=uniform(&i);
	%end;
run;
%mend create_table1;


%macro test_join;

proc sql;
	drop table differ_all;
quit;

%local i;
%do i=1 %to 20;

	%create_table1(&i)

	proc sql;
		create table result&i as
		select a.*,b.score
		from table1 a inner join table2 b
		on a.id=b.id;
	quit;


	data differ&i;
		merge result(keep=score rename=(score=base)) result&i(keep=score rename=(score=new)) end=last;
		length score_base score_new $ 20 differ_flag $ 5;
		retain score_base score_new;
		score_base=catx(',',score_base,base);
		score_new=catx(',',score_new,new);
		

		if last then do;
			var_count=&i+1;/*plus the variable ID*/
			differ_flag=ifc(score_base=score_new,'true','false');
			output;
		end;
		keep var_count score_base score_new differ_flag;
	run;


	proc append base=differ_all data=differ&i force;
	run;
%end;

proc print data=differ_all;run;

%mend test_join;



data table1;
	id="001";
	num=uniform(1);
run;
data table2;
	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;


%test_join

And the result is:

result3.png

 

When the number of variables of data set table1 equals 11, then order of SCORE in join table has changed. I test a few times, it's the same.

 

So, anyone can tell me the reason? And I can't find anything about that in SAS Help and Documents.

 

 


Accepted Solutions
Solution
‎06-27-2017 10:40 AM
Super User
Super User
Posts: 7,942

Re: The order of SQL Join Result

TL;DR;

Anwyays from your top bit, SQL is intrinsically an unsorted procedure - i.e. data does not need to be sorted when passed into it (unlike SAS procedures).  When the SQL parser runs it creates sort blocks, merge blocks etc. in what is called the SQL plan.  This is bigger than the code you write.  If you don't explicitly state certian things, such as output order, then it is left to the SQL parser to decide on what to do.  You can see what methods are chose using the _method option and see the whole process using _tree:

proc sql _method _tree; 
...
quit;

View solution in original post


All Replies
PROC Star
Posts: 325

Re: The order of SQL Join Result

for your first question

 

The order of variable SCORE in Data Set RESULT has changed:

 

Order is not preserved in a sql join. So results are pretty as much expected.

Contributor
Posts: 67

Re: The order of SQL Join Result

Thx.Smiley Happy

Solution
‎06-27-2017 10:40 AM
Super User
Super User
Posts: 7,942

Re: The order of SQL Join Result

TL;DR;

Anwyays from your top bit, SQL is intrinsically an unsorted procedure - i.e. data does not need to be sorted when passed into it (unlike SAS procedures).  When the SQL parser runs it creates sort blocks, merge blocks etc. in what is called the SQL plan.  This is bigger than the code you write.  If you don't explicitly state certian things, such as output order, then it is left to the SQL parser to decide on what to do.  You can see what methods are chose using the _method option and see the whole process using _tree:

proc sql _method _tree; 
...
quit;
Contributor
Posts: 67

Re: The order of SQL Join Result

Thanks, that will help!

Super User
Posts: 11,343

Re: The order of SQL Join Result

For added fun examine LEFT JOIN

 

proc sql;
	create table result2 as
	select a.*,b.score
	from table1 a left join table2 b
	on a.id=b.id;
quit;

and compare the results with inner join.

 

 

Basically if you care about the resulting order include an ORDER BY clause to get what you want.

Contributor
Posts: 67

Re: The order of SQL Join Result

Thx, the order by clause is needed.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 146 views
  • 2 likes
  • 4 in conversation