- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, guys
Recently, I've met a strange problem. I can't understand that. Hopefully, you guys can explain to me. Here is it:
Let's see an basic result:
/* 2 variables and 1 observation in table1 */
data table1;
id="001";
num=uniform(1);
run;
/* 2 variables and 5 observation in table2*/
data table2;
input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
/* Inner Join table1 and table2*/
proc sql;
create table result as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
proc print;run;
The Data Set Result is:
Notice the order of variable SCORE, is just same as the original order in table2.
But when the number of variables in table1 increases, magical thing happened. Let's see:
/* 12 variables and 1 observation in table1 */
data table1;
id="001";
array num{10} ;
do i=1 to 10;
num{i}=uniform(i)+i;
end;
run;
/* 2 variables and 5 observation in table2 */
data table2;
input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
/* Inner Join table1 and table2*/
proc sql;
create table result as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
proc print;run;
The order of variable SCORE in Data Set RESULT has changed:
So, I'm confused. Why did it change?
Next, I run a test, to see the relation between the number of the variables in data set table1 and the order of varialbe SCORE in data set RESULT. Here is the code:
%macro create_table1(count);
%local i;
data table1;
id="001";
%do i=1 %to &count;
num&i=uniform(&i);
%end;
run;
%mend create_table1;
%macro test_join;
proc sql;
drop table differ_all;
quit;
%local i;
%do i=1 %to 20;
%create_table1(&i)
proc sql;
create table result&i as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
data differ&i;
merge result(keep=score rename=(score=base)) result&i(keep=score rename=(score=new)) end=last;
length score_base score_new $ 20 differ_flag $ 5;
retain score_base score_new;
score_base=catx(',',score_base,base);
score_new=catx(',',score_new,new);
if last then do;
var_count=&i+1;/*plus the variable ID*/
differ_flag=ifc(score_base=score_new,'true','false');
output;
end;
keep var_count score_base score_new differ_flag;
run;
proc append base=differ_all data=differ&i force;
run;
%end;
proc print data=differ_all;run;
%mend test_join;
data table1;
id="001";
num=uniform(1);
run;
data table2;
input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
proc sql;
create table result as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
%test_join
And the result is:
When the number of variables of data set table1 equals 11, then order of SCORE in join table has changed. I test a few times, it's the same.
So, anyone can tell me the reason? And I can't find anything about that in SAS Help and Documents.
- Tags:
- Join Table
- order
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
TL;DR;
Anwyays from your top bit, SQL is intrinsically an unsorted procedure - i.e. data does not need to be sorted when passed into it (unlike SAS procedures). When the SQL parser runs it creates sort blocks, merge blocks etc. in what is called the SQL plan. This is bigger than the code you write. If you don't explicitly state certian things, such as output order, then it is left to the SQL parser to decide on what to do. You can see what methods are chose using the _method option and see the whole process using _tree:
proc sql _method _tree; ... quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
for your first question
The order of variable SCORE in Data Set RESULT has changed:
Order is not preserved in a sql join. So results are pretty as much expected.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thx.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
TL;DR;
Anwyays from your top bit, SQL is intrinsically an unsorted procedure - i.e. data does not need to be sorted when passed into it (unlike SAS procedures). When the SQL parser runs it creates sort blocks, merge blocks etc. in what is called the SQL plan. This is bigger than the code you write. If you don't explicitly state certian things, such as output order, then it is left to the SQL parser to decide on what to do. You can see what methods are chose using the _method option and see the whole process using _tree:
proc sql _method _tree; ... quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, that will help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For added fun examine LEFT JOIN
proc sql; create table result2 as select a.*,b.score from table1 a left join table2 b on a.id=b.id; quit;
and compare the results with inner join.
Basically if you care about the resulting order include an ORDER BY clause to get what you want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thx, the order by clause is needed.