BookmarkSubscribeRSS Feed
Slash
Quartz | Level 8

Hi, guys

 

Recently, I've met a strange problem. I can't understand that. Hopefully, you guys can explain to me. Here is it:

 

Let's see an basic result:

 

/* 2 variables and 1 observation in table1 */
data table1;
	id="001";
	num=uniform(1);
run;

/* 2 variables and 5 observation in table2*/
data table2;
   	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;

/* Inner Join table1 and table2*/
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;
proc print;run;

 

 

The Data Set Result is:

result.png

 

Notice the order of variable SCORE, is just same as the original order in table2.

 

But when the number of variables in table1 increases, magical thing happened. Let's see:

 

/* 12 variables and 1 observation in table1 */
data table1;
	id="001";
	array num{10} ;
	do i=1 to 10;
		num{i}=uniform(i)+i;
	end;
run;

/* 2 variables and 5 observation in table2 */
data table2;
	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;

/* Inner Join table1 and table2*/
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;
proc print;run;

 

 

The order of variable SCORE in Data Set RESULT has changed:

result2.png

 

So, I'm confused. Why did it change?

 

Next, I run a test, to see the relation between the number of the variables in data set table1 and the order of varialbe SCORE in data set RESULT. Here is the code:

 


%macro create_table1(count);
%local i;
data table1;
	id="001";
	%do i=1 %to &count;
		num&i=uniform(&i);
	%end;
run;
%mend create_table1;


%macro test_join;

proc sql;
	drop table differ_all;
quit;

%local i;
%do i=1 %to 20;

	%create_table1(&i)

	proc sql;
		create table result&i as
		select a.*,b.score
		from table1 a inner join table2 b
		on a.id=b.id;
	quit;


	data differ&i;
		merge result(keep=score rename=(score=base)) result&i(keep=score rename=(score=new)) end=last;
		length score_base score_new $ 20 differ_flag $ 5;
		retain score_base score_new;
		score_base=catx(',',score_base,base);
		score_new=catx(',',score_new,new);
		

		if last then do;
			var_count=&i+1;/*plus the variable ID*/
			differ_flag=ifc(score_base=score_new,'true','false');
			output;
		end;
		keep var_count score_base score_new differ_flag;
	run;


	proc append base=differ_all data=differ&i force;
	run;
%end;

proc print data=differ_all;run;

%mend test_join;



data table1;
	id="001";
	num=uniform(1);
run;
data table2;
	input id $ score;
cards;
001 80
001 80
001 90
001 60
001 85
;
run;
proc sql;
	create table result as
	select a.*,b.score
	from table1 a inner join table2 b
	on a.id=b.id;
quit;


%test_join

And the result is:

result3.png

 

When the number of variables of data set table1 equals 11, then order of SCORE in join table has changed. I test a few times, it's the same.

 

So, anyone can tell me the reason? And I can't find anything about that in SAS Help and Documents.

 

 

2 REPLIES 2
SuzanneDorinski
Lapis Lazuli | Level 10

See SQL Joins -- The Long and The Short of It, available at https://support.sas.com/techsup/technote/ts553.html.  

 

If you add _method to the PROC SQL statement, you'll get notes in the log show the SQL execution methods chosen.

 

The notes in the log for the first 9 iterations of your test_join macro look like this:

 

 NOTE: SQL execution methods chosen are:
 
       sqxcrta
           sqxjhsh
               sqxsrc( WORK.TABLE2(alias = B) )
               sqxsrc( WORK.TABLE1(alias = A) )
 NOTE: Table WORK.RESULT1 created, with 5 rows and 3 columns.

The notes in the log for the 10th through 20th iterations of your test_join macro look like this:

 

 NOTE: SQL execution methods chosen are:
 
       sqxcrta
           sqxjhsh
               sqxsrc( WORK.TABLE1(alias = A) )
               sqxsrc( WORK.TABLE2(alias = B) )
 NOTE: Table WORK.RESULT10 created, with 5 rows and 12 columns.
Slash
Quartz | Level 8

Thanks! That is very helpful.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 818 views
  • 1 like
  • 2 in conversation