Hi, guys
Recently, I've met a strange problem. I can't understand that. Hopefully, you guys can explain to me. Here is it:
Let's see an basic result:
/* 2 variables and 1 observation in table1 */
data table1;
id="001";
num=uniform(1);
run;
/* 2 variables and 5 observation in table2*/
data table2;
input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
/* Inner Join table1 and table2*/
proc sql;
create table result as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
proc print;run;
The Data Set Result is:
Notice the order of variable SCORE, is just same as the original order in table2.
But when the number of variables in table1 increases, magical thing happened. Let's see:
/* 12 variables and 1 observation in table1 */
data table1;
id="001";
array num{10} ;
do i=1 to 10;
num{i}=uniform(i)+i;
end;
run;
/* 2 variables and 5 observation in table2 */
data table2;
input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
/* Inner Join table1 and table2*/
proc sql;
create table result as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
proc print;run;
The order of variable SCORE in Data Set RESULT has changed:
So, I'm confused. Why did it change?
Next, I run a test, to see the relation between the number of the variables in data set table1 and the order of varialbe SCORE in data set RESULT. Here is the code:
%macro create_table1(count);
%local i;
data table1;
id="001";
%do i=1 %to &count;
num&i=uniform(&i);
%end;
run;
%mend create_table1;
%macro test_join;
proc sql;
drop table differ_all;
quit;
%local i;
%do i=1 %to 20;
%create_table1(&i)
proc sql;
create table result&i as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
data differ&i;
merge result(keep=score rename=(score=base)) result&i(keep=score rename=(score=new)) end=last;
length score_base score_new $ 20 differ_flag $ 5;
retain score_base score_new;
score_base=catx(',',score_base,base);
score_new=catx(',',score_new,new);
if last then do;
var_count=&i+1;/*plus the variable ID*/
differ_flag=ifc(score_base=score_new,'true','false');
output;
end;
keep var_count score_base score_new differ_flag;
run;
proc append base=differ_all data=differ&i force;
run;
%end;
proc print data=differ_all;run;
%mend test_join;
data table1;
id="001";
num=uniform(1);
run;
data table2;
input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
proc sql;
create table result as
select a.*,b.score
from table1 a inner join table2 b
on a.id=b.id;
quit;
%test_join
And the result is:
When the number of variables of data set table1 equals 11, then order of SCORE in join table has changed. I test a few times, it's the same.
So, anyone can tell me the reason? And I can't find anything about that in SAS Help and Documents.
See SQL Joins -- The Long and The Short of It, available at https://support.sas.com/techsup/technote/ts553.html.
If you add _method to the PROC SQL statement, you'll get notes in the log show the SQL execution methods chosen.
The notes in the log for the first 9 iterations of your test_join macro look like this:
NOTE: SQL execution methods chosen are: sqxcrta sqxjhsh sqxsrc( WORK.TABLE2(alias = B) ) sqxsrc( WORK.TABLE1(alias = A) ) NOTE: Table WORK.RESULT1 created, with 5 rows and 3 columns.
The notes in the log for the 10th through 20th iterations of your test_join macro look like this:
NOTE: SQL execution methods chosen are: sqxcrta sqxjhsh sqxsrc( WORK.TABLE1(alias = A) ) sqxsrc( WORK.TABLE2(alias = B) ) NOTE: Table WORK.RESULT10 created, with 5 rows and 12 columns.
Thanks! That is very helpful.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.