Hi,
we are dealing with an infrastructure change that is migrating our SAS environment from a Windows environment to a Linux environment while migrating procedures we handed up in having troubles in properly validate migration activities due to datasets that look to be different from a PROC COMPARE while being generated starting with the same code. From the documentation below, it looks like sorting of the datasets could become nondeterministic in some circumstances, I'm wondering how this is usually handled.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/n0gwogdxntzooun1azrzwrwrqvzq.htm
Thanks.
Wherever a specific sort order is needed, properly written code has to explicitly force that order when a dataset is created.
Since PROC COMPARE depends on the order, you need sorting and use ID when comparing datasets.
The problem typically comes when you have duplicate keys in the tables. ORDER BY in SQL is indeterministic when there are duplicates of the ORDER variables.
The fast and dirty way to get around this is to use PROC SORT instead (or after the SQL extract), and sort by all variables, e.g.:
proc sort data=lib1.table2;
by key date _ALL_;
run;
If the variables are in the same order on the tables that you are trying to compare, that ought to do the trick.
Hi, I see detail on PROC SQL and not deterministic behavior is described in https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/n0gwogdxntzooun1azrzwrwrqvzq.htm my doubts come from the fact essentially that in the SAS log there is no warning message generated when order by in the proc SQL is not deterministic. I'm wondering why this potential issue is not properly highlighted in the log.
@mbertol4 wrote:
Hi, I see detail on PROC SQL and not deterministic behavior is described in https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/n0gwogdxntzooun1azrzwrwrqvzq.htm my doubts come from the fact essentially that in the SAS log there is no warning message generated when order by in the proc SQL is not deterministic. I'm wondering why this potential issue is not properly highlighted in the log.
It is not flagged as it is part of the basic definition of the SQL language. It is a relational algebra for working on SETS. A set does not have any order, just members.
I see it's "basic definition of the SQL language" but that has impacts on code portability. Isn't it good programming practice to avoid not deterministic code ?
@mbertol4 wrote:
I see it's "basic definition of the SQL language" but that has impacts on code portability. Isn't it good programming practice to avoid not deterministic code ?
You picked that word to describe the behavior. It is deterministic. It returns the set of observations that match the program you gave.
When we migrated from SAS 9.3 to SAS 9.4 we found that the PROC SQL result set order was sometimes different between the two versions, in the absence of an ORDER BY clause. Apparently enhancements to the SQL interpreter in 9.4 caused this. The simple addition of an explicit ORDER BY cured the problem, and ensured that our SAS 9.4 PROC SQL gave identical results to 9.3, including the result set order.
The not deterministic code is the code you fed to the interpreter. If you expect a certain order, your code has to force it, period. If a follow-up step (ie your PROC COMPARE) requires a specific order, set it.
It depends on the data, not on the code itself; that's the reason why I was expecting some sort of warning message. As explained in https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/n0gwogdxntzooun1azrzwrwrqvzq.htm "However, if the ORDER BY clause does not reference a joint combination of unique values, then the order of rows is not deterministic if ORDER BY is unstable."
@mbertol4 wrote:
"However, if the ORDER BY clause does not reference a joint combination of unique values, then the order of rows is not deterministic if ORDER BY is unstable."
Exactly. That's why it is YOUR duty to care for the correct order. And since the SQL step CAN do what you want (= what you tell it to do through your code) perfectly well, there's no need for a message. A WARNING or other message is only needed when the interpreter cannot exactly do what you tell it through the code (your expectations are NOT part of any code).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.