BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Junyong
Pyrite | Level 9

Is it valid to use MONOTONIC() during JOIN in SQL? Suppose two following data sets.

data a;
input a @@;
b=rannor(1);
cards;
1 2 3 4
;
run;
data b;
input a @@;
c=rannor(1);
cards;
5 3 2 1
;
run;

Here the data set a is ascending in a but the data set b is descending in a. The following code performs full join in SQL.

proc sql;
	create table c as select coalesce(a.a,b.a) as a,b,c from a full join b on a.a=b.a order by monotonic();
	create table d as select coalesce(a.a,b.a) as a,b,c from b full join a on a.a=b.a order by monotonic();
quit;

The only difference is the order of FROM FULL JOIN. I thought that SQL will keep the order of the data set b in the second case as located first, but both produces data sets c and d, respectively, and both data sets are sorted in an ascending order. Which order does MONOTONIC() use when accompanied with JOIN then? Must this be avoided? Thanks.

P.S. I also found that SQL keeps the descending order of b when the second SQL statement uses just JOIN rather than FULL JOIN.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The monotonic() function (or any other function really) has nothing to do with how SAS assigns values to the _N_ automatic variable in a data step.  Also the _N_ automatic variable has nothing to do with observations. It is a just count of the number of times the data step has iterated.  Check the values output to the log for _N_  from these two ways to make a copy of SASHELP.CLASS.

data test1;
  put _n_=;
  set sashelp.class;
run;
data test2;
  do while (1=1);
    put _n_=;
    set sashelp.class;
    output;
  end;
run;

You cannot have any confidence in what values MONOTONIC() will return in an SQL query because you don't have any confidence in what order SAS will do in processing the query.  It is free to optimize the query in any way that it wants.   Since monotonic is undocumented you are not even sure it will count form 1 instead of starting somewhere else.

When I ran these queries the last result gererates 6 to 10 instead of 1 to 5.

proc sql ;
title 'No Order - a full join b';
select monotonic() as Ord,* from a full join b on a.a=b.a;
title 'No Order - b full join a';
select monotonic() as Ord,* from b full join a on a.a=b.a;
title 'Order by A_ROW,B_ROW';
select monotonic() as Ord,* from a full join b on a.a=b.a
order by a_row,b_row
;
title 'Order by B_ROW,A_ROW';
select monotonic() as Ord,* from b full join a on a.a=b.a
order by b_row,a_row
;
quit;
title;

 

 

View solution in original post

13 REPLIES 13
Tom
Super User Tom
Super User

What are you trying to do? Why would you think that ordering by the sequential values 1,2,3,4 would change the order in any way?

You basically told SAS to run this query:

proc sql;
title 'c';
select monotonic() as col1,* 
  from (select coalesce(a.a,b.a) as a,b,c from a full join b on a.a=b.a)
  order by col1
;
title 'd';
select monotonic() as col1,* 
  from (select coalesce(a.a,b.a) as a,b,c from b full join a on a.a=b.a)
  order by col1
;
quit;
title;

 

Junyong
Pyrite | Level 9

What is MONOTONIC() doing here then? When it comes to JOIN rather than FULL JOIN, SQL displays the opposite order as follows.

proc sql;
	create table c as select coalesce(a.a,b.a) as a,b,c from a join b on a.a=b.a order by monotonic();
	create table d as select coalesce(a.a,b.a) as a,b,c from b join a on a.a=b.a order by monotonic();
quit;

The order of the data set c is a=1,2,3, but that of the data set d is 3,2,1 because the data set b is in a descending order. Since MONOTONIC() assigns _n_ to each observation, shouldn't the data set d keep the original order a=5,3,2,1 of the data set b?

Tom
Super User Tom
Super User

The SQL language was purposely built to treat data as logical SETS. There is no concept of an order of the elements of a set. The process that is executing the SQL query is free to return the results in any order. That is why you can only add an ORDER BY clause to the outer query.  It is ordering the observations in the result set AFTER the set has been generated.  So using MONOTONIC in an ORDER BY clause will have no impact on changing the order since the first observation it looks at will get 1 the second on 2 etc. 

 

If you care about the order that the observations appear in the input dataset then either use data step code which will process the observations one by one or add you own sequential variable to your dataset so that you can use that variable in your ORDER BY clause.

Junyong
Pyrite | Level 9

I know SQL does random access. I only wondered what MONOTONIC() does when a query incorporates multiple data sets. It seems MONOTONIC() is unstable in these situations and only reasonable when a query with only one data set. Thanks.

Tom
Super User Tom
Super User
All is does is return a value one larger than what it returned the last time it executed. It is pretty deterministic in the queries you posted since it returns 1 for the first observation and 2 for the second, etc. So sorting by the value it returns means that the order is not changed at all.
The reason it might seem pseudo random in more complex queries is because you don't know how SAS is going convert your SQL set logic into actual executable code. So where the function really gets called you cannot control.
Junyong
Pyrite | Level 9

Here are the data sets a, b, c, and d in a clockwise order.

monotonic.png

And here are the queries.

proc sql;
	create table c as
	select coalesce(a.a,b.a) as a,b,c
	FROM A FULL JOIN B
	on a.a=b.a
	order by monotonic();
	create table d as
	select coalesce(a.a,b.a) as a,b,c
	FROM B FULL JOIN A
	on a.a=b.a
	order by monotonic();
quit;

There is no problem in the first query. The first observation for this query is the first observation of the data set a—i.e. (1,1.8048229506) because the FULL JOIN uses A first and then B.

The problem is the second query. The FULL JOIN in the second query uses B first and then A, so the first observation for this second query should be the first observation of the data set b—i.e. (5,1.8048229506). However, SQL put the last observation of the data set b (1,-1.083317655) as the first observation despite the ORDER BY MONOTONIC() at the end.

With the data set b only, the MONOTONIC() assigns 1 for (5,1.8048), 2 for (3,-0.0799), 3 for (2,0.3966), and 4 for (1,-1.0833), and the outer ORDER BY sorts the observations in this order. Why did SAS flip this order when the data set a is incorporated through FULL JOIN? This doesn't happen when not FULL JOIN but just JOIN—in this case, MONOTONIC() assigns 1 for (3,-0.0799), 2 for (2,0.3966), and 3 for (1,-1.0833) orderly, so the resulting data set respects their order as well.

Tom
Super User Tom
Super User

You do you NOT KNOW how SAS is going to process the data.  The only thing that is required to of an SQL implementation is that it returns the LOGICAL combination of the data the set logic in your SQL query describes.  SAS can take any route that it wants to satisgy that query as long as the result matches what you requested. Changing the order that you reference the datasets in the FROM clause does not impose any restrictions on how SAS decides to implement your query.  If it wants to it can read the dataset from back to front. Or read every even observation first and then read the odd observations.

 

If you want to control the order out based on the order in the original dataset then add a variable into the original dataset that can be used to order the data coming out of the query.

data a;
  a_row +1;
  input a @@;
  b=rannor(1);
cards;
1 2 3 4
;
data b;
  b_row+1;
  input a @@;
  c=rannor(1);
cards;
5 3 2 1
;

proc sql ;
title 'Order by A_ROW';
select * from a full join b on a.a=b.a 
order by a_row
;
title 'Order by B_ROW';
select * from a full join b on a.a=b.a 
order by b_row
;
quit;
Order by A_ROW 

   a_row         a         b     b_row         a         c
----------------------------------------------------------
       .         .         .         1         5  1.804823
       1         1  1.804823         4         1  -1.08332
       2         2  -0.07992         3         2  0.396577
       3         3  0.396577         2         3  -0.07992
       4         4  -1.08332         .         .         .

Order by B_ROW

   a_row         a         b     b_row         a         c
----------------------------------------------------------
       4         4  -1.08332         .         .         .
       .         .         .         1         5  1.804823
       3         3  0.396577         2         3  -0.07992
       2         2  -0.07992         3         2  0.396577
       1         1  1.804823         4         1  -1.08332

 

Junyong
Pyrite | Level 9

"Changing the order that you reference the datasets in the FROM clause does not impose any restrictions on how SAS decides to implement your query." Changing the order affects the results if not FULL JOIN but just JOIN. Let me reuse your code.

data a;
/*  a_row +1;*/
  input a @@;
  b=rannor(1);
cards;
1 2 3 4
;
data b;
/*  b_row+1;*/
  input a @@;
  c=rannor(1);
cards;
5 3 2 1
;

proc sql ;
/*title 'Order by A_ROW';*/ title 'A JOIN B';
select * from a join b on a.a=b.a 
order by monotonic()
;
/*title 'Order by B_ROW';*/ title 'B JOIN A';
select * from b join a on a.a=b.a 
order by monotonic()
;
quit;

Everything but the order is identical. Here are the results. They show the opposite sorts. What I want to know is why MONOTONIC() with FULL JOIN and MONOTONIC() with just JOIN behave in a different way. I know I can achieve the results by adding two DATA steps and going back and forth, but I want to understand what MONOTONIC() does exactly and save code lines if possible.

                                            A JOIN B

                                    a         b         a         c
                             
                                    1  1.804823         1  -1.08332
                                    2  -0.07992         2  0.396577
                                    3  0.396577         3  -0.07992

                                            B JOIN A

                                    a         c         a         b
                             
                                    3  -0.07992         3  0.396577
                                    2  0.396577         2  -0.07992
                                    1  -1.08332         1  1.804823

As stated here, MONOTONIC() is an undocumented function and the way it assigns each number to each observation is not stated when it comes to aggregating multiple data sets—"If the MONOTONIC function is used in an SQL procedure that aggregates data then the function may return non-sequential or missing results."

Tom
Super User Tom
Super User

You don't seem to be hearing what I am saying. Why would you think that asking SAS to calculate a sequence number while it is in the final step of delivering the data cause the order to change?

 

Run your same queries with and without the ORDER BY MONONOTIC() clause and you will notice that the results are not changed by adding the order by clause.

 

SAS's implementation TODAY of your two different ways of asking for the same information is returning the results for in different order.  But they could make a change to how they implement the query in some future release (or in SAS running on a different operating system) and the results could come out in same the same order for both or each could be in some other order and it would not mean that SAS was "broken".  There is no contract between you can the SQL implementation to return the values in any particular order.

Junyong
Pyrite | Level 9

I didn't say SAS was "broken." I was more asking the function of MONOTONIC() rather than ORDER BY MONOTONIC(), but it seems I need to more pinpoint my curiosity down. According to your replies,

1. MONOTONIC() is the function that gives 1 to the first observation it processes, 2 to the second observation it processes, and so forth.

2. The order that SQL follows to read observations is completely unknown.

3. Therefore, the resulting value that MONOTONIC() assigns to each observation is unpredictable in SQL.

If these three are correct then,

a. In a DATA step, MONOTONIC() returns orderly values to the observations.

b. In a SQL query, MONOTONIC() doesn't necessarily return orderly values to the observations—even in a query with only one data, the orderly values for the observations are not guaranteed.

c. In a DATA step, just adding an _N_ column is identical to just adding a MONOTONIC() column.

d. Just adding a MONOTONIC() column in a DATA step and the same thing in a SQL query are not guaranteed to be identical.

Am I correct?

fdsaaaa
Obsidian | Level 7

When you are saying ORDER BY MONOTONIC() it is essentially saying "sort the data in the output dataset by the order that it is already in" . 

 

I think you are getting confused , and assuming that it would sort the data by the order that the input dataset is in . 

 

 

Junyong
Pyrite | Level 9

Yes. I should have asked MONOTONIC() rather than ORDER BY MONOTONIC(). And essentially I was asking the order that PROC SQL follows to read the observations, which is random.

I was confused because I thought that MONOTONIC() is something readily available before SQL joins multiple data sets hopefully—such as a.MONOTONIC() and b.MONOTONIC()—but actually SQL assigns the values for MONOTONIC() after joining processes, which also differs from how _N_ behaves.

data a;
do i=1 to 100;
x=rannor(1);
output;
end;
run;
data bc;
set a;
if i>50;
b=_n_;
c=monotonic();
run;

I considered both _N_ and MONOTONIC() as something similar to ROW() in Excel—and my understanding was (1) SQL processes observations in a random order and (2) ORDER BY MONOTONIC() enables one to preserve the original order, which was why I put the nonsensical statement—but it seems I need to be careful when the order matters.

Tom
Super User Tom
Super User

The monotonic() function (or any other function really) has nothing to do with how SAS assigns values to the _N_ automatic variable in a data step.  Also the _N_ automatic variable has nothing to do with observations. It is a just count of the number of times the data step has iterated.  Check the values output to the log for _N_  from these two ways to make a copy of SASHELP.CLASS.

data test1;
  put _n_=;
  set sashelp.class;
run;
data test2;
  do while (1=1);
    put _n_=;
    set sashelp.class;
    output;
  end;
run;

You cannot have any confidence in what values MONOTONIC() will return in an SQL query because you don't have any confidence in what order SAS will do in processing the query.  It is free to optimize the query in any way that it wants.   Since monotonic is undocumented you are not even sure it will count form 1 instead of starting somewhere else.

When I ran these queries the last result gererates 6 to 10 instead of 1 to 5.

proc sql ;
title 'No Order - a full join b';
select monotonic() as Ord,* from a full join b on a.a=b.a;
title 'No Order - b full join a';
select monotonic() as Ord,* from b full join a on a.a=b.a;
title 'Order by A_ROW,B_ROW';
select monotonic() as Ord,* from a full join b on a.a=b.a
order by a_row,b_row
;
title 'Order by B_ROW,A_ROW';
select monotonic() as Ord,* from b full join a on a.a=b.a
order by b_row,a_row
;
quit;
title;

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 9574 views
  • 1 like
  • 3 in conversation