BookmarkSubscribeRSS Feed
mbertol4
Calcite | Level 5

Hi,

   we are dealing with an infrastructure change that is migrating our SAS environment from a Windows environment to a Linux environment while migrating procedures we handed up in having troubles in properly validate migration activities due to datasets that look to be different from a PROC COMPARE while being generated starting with the same code.  From the documentation below, it looks like sorting of the datasets could become nondeterministic in some circumstances, I'm wondering how this is usually handled. 

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/n0gwogdxntzooun1azrzwrwrqvzq.htm

 

Thanks.

10 REPLIES 10
Kurt_Bremser
Super User

Wherever a specific sort order is needed, properly written code has to explicitly force that order when a dataset is created.

Since PROC COMPARE depends on the order, you need sorting and use ID when comparing datasets.

s_lassen
Meteorite | Level 14

The problem typically comes when you have duplicate keys in the tables. ORDER BY in SQL is indeterministic when there are duplicates of the ORDER variables.

 

The fast and dirty way to get around this is to use PROC SORT instead (or after the SQL extract), and sort by all variables, e.g.:

proc sort data=lib1.table2;
  by key date _ALL_;
run;

If the variables are in the same order on the tables that you are trying to compare, that ought to do the trick. 

 

 

mbertol4
Calcite | Level 5

Hi,  I see detail on PROC SQL and not deterministic behavior is described in https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/n0gwogdxntzooun1azrzwrwrqvzq.htm  my doubts come from the fact essentially that in the SAS log there is no warning message generated when order by in the proc SQL is not deterministic.  I'm wondering why this potential issue is not properly highlighted in the log.

 

Tom
Super User Tom
Super User

@mbertol4 wrote:

Hi,  I see detail on PROC SQL and not deterministic behavior is described in https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/n0gwogdxntzooun1azrzwrwrqvzq.htm  my doubts come from the fact essentially that in the SAS log there is no warning message generated when order by in the proc SQL is not deterministic.  I'm wondering why this potential issue is not properly highlighted in the log.

 


It is not flagged as it is part of the basic definition of the SQL language.  It is a relational algebra for working on SETS.  A set does not have any order, just members.

mbertol4
Calcite | Level 5

I see it's "basic definition of the SQL language" but that has impacts on code portability.  Isn't it good programming practice to avoid not deterministic code ?

Tom
Super User Tom
Super User

@mbertol4 wrote:

I see it's "basic definition of the SQL language" but that has impacts on code portability.  Isn't it good programming practice to avoid not deterministic code ?


You picked that word to describe the behavior.  It is deterministic.  It returns the set of observations that match the program you gave. 

SASKiwi
PROC Star

When we migrated from SAS 9.3 to SAS 9.4 we found that the PROC SQL result set order was sometimes different between the two versions, in the absence of an ORDER BY clause. Apparently enhancements to the SQL interpreter in 9.4 caused this. The simple addition of an explicit ORDER BY cured the problem, and ensured that our SAS 9.4 PROC SQL gave identical results to 9.3, including the result set order.

 

 

Kurt_Bremser
Super User

The not deterministic code is the code you fed to the interpreter. If you expect a certain order, your code has to force it, period. If a follow-up step (ie your PROC COMPARE) requires a specific order, set it.

mbertol4
Calcite | Level 5

It depends on the data, not on the code itself; that's the reason why I was expecting some sort of warning message.   As explained in  https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/n0gwogdxntzooun1azrzwrwrqvzq.htm   "However, if the ORDER BY clause does not reference a joint combination of unique values, then the order of rows is not deterministic if ORDER BY is unstable."

Kurt_Bremser
Super User

@mbertol4 wrote:

"However, if the ORDER BY clause does not reference a joint combination of unique values, then the order of rows is not deterministic if ORDER BY is unstable."


Exactly. That's why it is YOUR duty to care for the correct order. And since the SQL step CAN do what you want (= what you tell it to do through your code) perfectly well, there's no need for a message. A WARNING or other message is only needed when the interpreter cannot exactly do what you tell it through the code (your expectations are NOT part of any code).

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 710 views
  • 9 likes
  • 5 in conversation