Hi, guys
Recently, I've met a strange problem. I can't understand that. Hopefully, you guys can explain to me. Here is it:
Let's see an basic result:
/* 2 variables and 1 observation in table1 */
data table1;
id="001";
num=uniform(1);
run;
/* 2 variables and 5 observation in table2*/
data table2;
input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
/* Inner Join table1 and table2*/
proc sql;
create table result as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
proc print;run;
The Data Set Result is:
Notice the order of variable SCORE, is just same as the original order in table2.
But when the number of variables in table1 increases, magical thing happened. Let's see:
/* 12 variables and 1 observation in table1 */
data table1;
id="001";
array num{10} ;
do i=1 to 10;
num{i}=uniform(i)+i;
end;
run;
/* 2 variables and 5 observation in table2 */
data table2;
input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
/* Inner Join table1 and table2*/
proc sql;
create table result as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
proc print;run;
The order of variable SCORE in Data Set RESULT has changed:
So, I'm confused. Why did it change?
Next, I run a test, to see the relation between the number of the variables in data set table1 and the order of varialbe SCORE in data set RESULT. Here is the code:
%macro create_table1(count);
%local i;
data table1;
id="001";
%do i=1 %to &count;
num&i=uniform(&i);
%end;
run;
%mend create_table1;
%macro test_join;
proc sql;
drop table differ_all;
quit;
%local i;
%do i=1 %to 20;
%create_table1(&i)
proc sql;
create table result&i as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
data differ&i;
merge result(keep=score rename=(score=base)) result&i(keep=score rename=(score=new)) end=last;
length score_base score_new $ 20 differ_flag $ 5;
retain score_base score_new;
score_base=catx(',',score_base,base);
score_new=catx(',',score_new,new);
if last then do;
var_count=&i+1;/*plus the variable ID*/
differ_flag=ifc(score_base=score_new,'true','false');
output;
end;
keep var_count score_base score_new differ_flag;
run;
proc append base=differ_all data=differ&i force;
run;
%end;
proc print data=differ_all;run;
%mend test_join;
data table1;
id="001";
num=uniform(1);
run;
data table2;
input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
proc sql;
create table result as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
%test_join
And the result is:
When the number of variables of data set table1 equals 11, then order of SCORE in join table has changed. I test a few times, it's the same.
So, anyone can tell me the reason? And I can't find anything about that in SAS Help and Documents.
See SQL Joins -- The Long and The Short of It, available at https://support.sas.com/techsup/technote/ts553.html.
If you add _method to the PROC SQL statement, you'll get notes in the log show the SQL execution methods chosen.
The notes in the log for the first 9 iterations of your test_join macro look like this:
NOTE: SQL execution methods chosen are: sqxcrta sqxjhsh sqxsrc( WORK.TABLE2(alias = B) ) sqxsrc( WORK.TABLE1(alias = A) ) NOTE: Table WORK.RESULT1 created, with 5 rows and 3 columns.
The notes in the log for the 10th through 20th iterations of your test_join macro look like this:
NOTE: SQL execution methods chosen are: sqxcrta sqxjhsh sqxsrc( WORK.TABLE1(alias = A) ) sqxsrc( WORK.TABLE2(alias = B) ) NOTE: Table WORK.RESULT10 created, with 5 rows and 12 columns.
Thanks! That is very helpful.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.