Hi,
We are migrating from SAS9.2 to SAS9.4.
We discovered that in one PROC SQL steps show that the method that the SQL optimizer includes a sort(sqxsort) in SAS 9.2. but the same code using the same data, for SAS 9.4, it removes the sort(dont know the reason).This has huge implications because the data output is differ
Please help me in this
Here am attaching code and log files
Would you like to post the code? If the sort is necessary then you should Always explicitly set it in the statement. SQL by its very nature is Not an ordered system, order matters not to SQL processed data. For example, a system using multithreading may not produce the same output as a single thread process. Different data (or more data) may develop more requirements than originally etc. If it needs a particular order, then the SQL processor will create an object for it, if it doesn't then it wont. If you haven't explicitly stated and order, then there is no guarentee that the SQL processor will need that sort, you have just been lucky until this point. For an exact reason why the SQLoptimiser behind the scenes has decided not to use a sort, would be something the support desk maybe able to provide if you give them the code, but at the end of the day if you haven't specified it then your at the mercy of the optimiser/processor. Not much else to say really - in all your code always be very explicit about what it is your doing, don't allow systems to guess.
please find the attachments for log and code files
This is very odd. The SQL optimizer decisions may change (i.e. improve) over the releases but the result sets should not. So something is clearly wrong but we need more info (logs, sample data). Try to reduce your case (rows and columns) as much as possible. You may find that something else, not related to what you see in the _method output, is wrong.
Also do not exclude the possibility that in case of differences between 9.2 and 9.4 actually the older one may be wrong and what you see is improvement :-).
Regards,
- Jan.
Sorry, this: "but the result sets should not" isn't exactly true. As I said, SQL has many influences - it is one reason why the use of monotonic() is cautioned - it can give different results each run. If the programming isn't specifically stating an order, then there is no guarentee that the returned dataset will be the same. Is the compiler using a clustered read or non-clustered read for instance will change the read type.
I have to add some nuance to this. The OP mentioned SQXSORT so this is about the execution plan the optimizer has decided upon. It is what one sees when running SQL with the _TREE and _METHOD options. This sorting will occur if the decision was made to sort intermediate sets for best performance, probably to do a match/merge. It is not something that is under the control of the user. Only few options exist that allow for influencing the optimizer using hints in the code. Eg. noindex=yes. A query is like algebra, no matter what approach you take in solving a problem, there can only be one correct outcom. The optimizer does what it's ment to do, optimize the path taken to resolve the query, and should not influcence the result set. If it does that that's a bug.
The use of unsuported or undocumented features like MONOTONIC() may indeed be the cause of the different result, but would disqualify any complaints.
Anyway, unless we get to see some code, logs and what have you we can only be guessing.
Regards,
- Jan.
I think this could well be the same problem we experienced with SAS SQL when migrating from 9.3 to 9.4.
We found that some of our queries resulted in a different order of output data where no ORDER BY was explicitly specified when comparing 9.3 with 9.4 (using identical code and input data). We fixed the problem by putting in ORDER BY statements to retain the same data order as in 9.3.
This issue was only apparent for some more complicated queries and we never really nailed the exact circumstances under which it would happen. We suspect this issue was caused by SQL optimiser changes in 9.4.
@LinusH - well I wouldn't necessarily describe it as misbehaving, more a case of being consistently different - that SAS 9.4 often gives you a different default output data order than 9.2/9.3. It has convinced me to be more diligent with ORDER BY's though...
BTW, I also found SQL queries that worked in 9.3 that gave the wrong results in 9.4. I tracked the problem to SAS and as far as I'm aware they are looking to resolve this in a later release.
please find the attachments for logs and code files
A small but important change betweem 9.2 and 9.4 is visible:
9.2:
sqxsort sqxsrc( BCW_DTL.ETL_WORK_ITEM_NUM ) sqxsort sqxsrc( WORK.W667SN06_RULE4_3 )
9.4:
sqxsort sqxsrc( BCW_DTL.ETL_WORK_ITEM_NUM ) sqxsrc( WORK.W667SN06_RULE4_3 )
If I examine the operations on WORK.W667SN06_RULE4_3 leading up to the SQL query, I see that this dataset is sorted shortly before the SQL query. This fact is recorded in the descriptive header of the dataset and can be used by subsequent operations to prevent unnecessary sorts. It is plausible that the 9.4 optimizer takes this into account and eliminates the sort (sqxsort) but just sequentially reads the the source rows from the table (sqxsrc). This can siginificantly speed up queries. A clear improvement of the 9.2 query plan.
Other than that I see no issues with this job. You will have to explain where the differences between the versions really matter. As implied before the output of _METHOD / _TREE is no a problem in itself. carefully inspect and compare work.W667TARG_RULE5_1 in both cases. If it is the order in which the rows are retrieved are different I refer to the response from @LinusH that explains the order of the rows cannot be relied upon unless you use a ORDER BY clause or simply run a proc sort afterwards. If the input data is the same but the output is not either 9.2 or 9.4 is at fault. If the generated SQL is different (I didn't see it but no guarantee here) DI Studio may be at fault. Either way I.M.O. SAS Support should be contacted. Be prepared to send them your data.
Kind regards,
- Jan.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.