BookmarkSubscribeRSS Feed
Slash
Quartz | Level 8

Hi, guys

 

Recently, I've met a strange problem. I can't understand that. Hopefully, you guys can explain to me. Here is it:

 

Let's see an basic result:

 

/* 2 variables and 1 observation in table1 */
data table1;
	id="001";
	num=uniform(1);
run;

/* 2 variables and 5 observation in table2*/
data table2;
   	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;

/* Inner Join table1 and table2*/
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;
proc print;run;

 

 

The Data Set Result is:

result.png

 

Notice the order of variable SCORE, is just same as the original order in table2.

 

But when the number of variables in table1 increases, magical thing happened. Let's see:

 

/* 12 variables and 1 observation in table1 */
data table1;
	id="001";
	array num{10} ;
	do i=1 to 10;
		num{i}=uniform(i)+i;
	end;
run;

/* 2 variables and 5 observation in table2 */
data table2;
	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;

/* Inner Join table1 and table2*/
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;
proc print;run;

 

 

The order of variable SCORE in Data Set RESULT has changed:

result2.png

 

So, I'm confused. Why did it change?

 

Next, I run a test, to see the relation between the number of the variables in data set table1 and the order of varialbe SCORE in data set RESULT. Here is the code:

 


%macro create_table1(count);
%local i;
data table1;
	id="001";
	%do i=1 %to &count;
		num&i=uniform(&i);
	%end;
run;
%mend create_table1;


%macro test_join;

proc sql;
	drop table differ_all;
quit;

%local i;
%do i=1 %to 20;

	%create_table1(&i)

	proc sql;
		create table result&i as
		select a.*,b.score
		from table1 a inner join table2 b
		on a.id=b.id;
	quit;


	data differ&i;
		merge result(keep=score rename=(score=base)) result&i(keep=score rename=(score=new)) end=last;
		length score_base score_new $ 20 differ_flag $ 5;
		retain score_base score_new;
		score_base=catx(',',score_base,base);
		score_new=catx(',',score_new,new);
		

		if last then do;
			var_count=&i+1;/*plus the variable ID*/
			differ_flag=ifc(score_base=score_new,'true','false');
			output;
		end;
		keep var_count score_base score_new differ_flag;
	run;


	proc append base=differ_all data=differ&i force;
	run;
%end;

proc print data=differ_all;run;

%mend test_join;



data table1;
	id="001";
	num=uniform(1);
run;
data table2;
	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;


%test_join

And the result is:

result3.png

 

When the number of variables of data set table1 equals 11, then order of SCORE in join table has changed. I test a few times, it's the same.

 

So, anyone can tell me the reason? And I can't find anything about that in SAS Help and Documents.

 

 

2 REPLIES 2
SuzanneDorinski
Lapis Lazuli | Level 10

See SQL Joins -- The Long and The Short of It, available at https://support.sas.com/techsup/technote/ts553.html.  

 

If you add _method to the PROC SQL statement, you'll get notes in the log show the SQL execution methods chosen.

 

The notes in the log for the first 9 iterations of your test_join macro look like this:

 

 NOTE: SQL execution methods chosen are:
 
       sqxcrta
           sqxjhsh
               sqxsrc( WORK.TABLE2(alias = B) )
               sqxsrc( WORK.TABLE1(alias = A) )
 NOTE: Table WORK.RESULT1 created, with 5 rows and 3 columns.

The notes in the log for the 10th through 20th iterations of your test_join macro look like this:

 

 NOTE: SQL execution methods chosen are:
 
       sqxcrta
           sqxjhsh
               sqxsrc( WORK.TABLE1(alias = A) )
               sqxsrc( WORK.TABLE2(alias = B) )
 NOTE: Table WORK.RESULT10 created, with 5 rows and 12 columns.
Slash
Quartz | Level 8

Thanks! That is very helpful.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1055 views
  • 1 like
  • 2 in conversation