BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Slash
Quartz | Level 8

Hi, guys

 

Recently, I've met a strange problem. I can't understand that. Hopefully, you guys can explain to me. Here is it:

 

Let's see an basic result:

/* 2 variables and 1 observation in table1 */
data table1;
	id="001";
	num=uniform(1);
run;

/* 2 variables and 5 observation in table2*/
data table2;
   	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;

/* Inner Join table1 and table2*/
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;
proc print;run;

 

The Data Set Result is:

result.png

 

Notice the order of variable SCORE, is just same as the original order in table2.

 

But when the number of variables in table1 increases, magical thing happened. Let's see:

 

/* 12 variables and 1 observation in table1 */
data table1;
	id="001";
	array num{10} ;
	do i=1 to 10;
		num{i}=uniform(i)+i;
	end;
run;

/* 2 variables and 5 observation in table2 */
data table2;
	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;

/* Inner Join table1 and table2*/
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;
proc print;run;

 

The order of variable SCORE in Data Set RESULT has changed:

result2.png

 

So, I'm confused. Why did it change?

 

Next, I run a test, to see the relation between the number of the variables in data set table1 and the order of varialbe SCORE in data set RESULT. Here is the code:

 


%macro create_table1(count);
%local i;
data table1;
	id="001";
	%do i=1 %to &count;
		num&i=uniform(&i);
	%end;
run;
%mend create_table1;


%macro test_join;

proc sql;
	drop table differ_all;
quit;

%local i;
%do i=1 %to 20;

	%create_table1(&i)

	proc sql;
		create table result&i as
		select a.*,b.score
		from table1 a inner join table2 b
		on a.id=b.id;
	quit;


	data differ&i;
		merge result(keep=score rename=(score=base)) result&i(keep=score rename=(score=new)) end=last;
		length score_base score_new $ 20 differ_flag $ 5;
		retain score_base score_new;
		score_base=catx(',',score_base,base);
		score_new=catx(',',score_new,new);
		

		if last then do;
			var_count=&i+1;/*plus the variable ID*/
			differ_flag=ifc(score_base=score_new,'true','false');
			output;
		end;
		keep var_count score_base score_new differ_flag;
	run;


	proc append base=differ_all data=differ&i force;
	run;
%end;

proc print data=differ_all;run;

%mend test_join;



data table1;
	id="001";
	num=uniform(1);
run;
data table2;
	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;


%test_join

And the result is:

result3.png

 

When the number of variables of data set table1 equals 11, then order of SCORE in join table has changed. I test a few times, it's the same.

 

So, anyone can tell me the reason? And I can't find anything about that in SAS Help and Documents.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

TL;DR;

Anwyays from your top bit, SQL is intrinsically an unsorted procedure - i.e. data does not need to be sorted when passed into it (unlike SAS procedures).  When the SQL parser runs it creates sort blocks, merge blocks etc. in what is called the SQL plan.  This is bigger than the code you write.  If you don't explicitly state certian things, such as output order, then it is left to the SQL parser to decide on what to do.  You can see what methods are chose using the _method option and see the whole process using _tree:

proc sql _method _tree; 
...
quit;

View solution in original post

6 REPLIES 6
kiranv_
Rhodochrosite | Level 12

for your first question

 

The order of variable SCORE in Data Set RESULT has changed:

 

Order is not preserved in a sql join. So results are pretty as much expected.

Slash
Quartz | Level 8

Thx.Smiley Happy

RW9
Diamond | Level 26 RW9
Diamond | Level 26

TL;DR;

Anwyays from your top bit, SQL is intrinsically an unsorted procedure - i.e. data does not need to be sorted when passed into it (unlike SAS procedures).  When the SQL parser runs it creates sort blocks, merge blocks etc. in what is called the SQL plan.  This is bigger than the code you write.  If you don't explicitly state certian things, such as output order, then it is left to the SQL parser to decide on what to do.  You can see what methods are chose using the _method option and see the whole process using _tree:

proc sql _method _tree; 
...
quit;
Slash
Quartz | Level 8

Thanks, that will help!

ballardw
Super User

For added fun examine LEFT JOIN

 

proc sql;
	create table result2 as
	select a.*,b.score
	from table1 a left join table2 b
	on a.id=b.id;
quit;

and compare the results with inner join.

 

 

Basically if you care about the resulting order include an ORDER BY clause to get what you want.

Slash
Quartz | Level 8

Thx, the order by clause is needed.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 6112 views
  • 2 likes
  • 4 in conversation