Swethajava23
Calcite | Level 5

Hi,

 

We are migrating from SAS 9.2 to SAS 9.4.

We discovered that for one PROC SQL step, the _METHOD output shows that the SQL optimizer includes a sort (sqxsort) in SAS 9.2, but for the same code using the same data, SAS 9.4 removes the sort (we don't know the reason). This has huge implications because the data output is different.

 

Please help me with this.

 

 

 

12 REPLIES
LinusH
Tourmaline | Level 20
Without any log it's hard to assist.
Are you 100% sure everything else is the same between the two environments?
Data never sleeps
Swethajava23
Calcite | Level 5

I am attaching the code and log files here.

RW9
Diamond | Level 26

Would you like to post the code? If the sort is necessary, then you should always set it explicitly in the statement. SQL by its very nature is not an ordered system; order does not matter to SQL-processed data. For example, a system using multithreading may not produce the same output order as a single-threaded process, and different data (or more data) may bring requirements that were not there originally. If the SQL processor needs a particular order, it will create an object for it; if it doesn't, it won't. If you haven't explicitly stated an order, then there is no guarantee that the SQL processor will need that sort; you have just been lucky until this point. The exact reason why the SQL optimiser has decided behind the scenes not to use a sort is something the support desk may be able to provide if you give them the code, but at the end of the day, if you haven't specified it then you're at the mercy of the optimiser/processor. Not much else to say really: in all your code, always be very explicit about what it is you're doing, and don't allow systems to guess.

Swethajava23
Calcite | Level 5

Please find the attached log and code files.

jklaverstijn
Rhodochrosite | Level 12

This is very odd. The SQL optimizer decisions may change (i.e. improve) over the releases but the result sets should not. So something is clearly wrong but we need more info (logs, sample data). Try to reduce your case (rows and columns) as much as possible. You may find that something else, not related to what you see in the _method output, is wrong.

 

Also, do not exclude the possibility that, where 9.2 and 9.4 differ, the older release is actually the one that is wrong and what you see is an improvement :-).

 

Regards,

- Jan.

RW9
Diamond | Level 26

Sorry, this: "but the result sets should not" isn't exactly true. As I said, SQL has many influences. It is one reason why the use of monotonic() is cautioned against: it can give different results on each run. If the program doesn't specifically state an order, then there is no guarantee that the returned dataset will be the same. Whether the engine uses a clustered or a non-clustered read, for instance, can change the order in which rows are returned.

jklaverstijn
Rhodochrosite | Level 12

I have to add some nuance to this. The OP mentioned SQXSORT, so this is about the execution plan the optimizer has decided upon. It is what one sees when running SQL with the _TREE and _METHOD options. This sorting will occur if the decision was made to sort intermediate sets for best performance, probably to do a match/merge. It is not something that is under the control of the user. Only a few options exist that allow for influencing the optimizer using hints in the code, e.g. noindex=yes. A query is like algebra: no matter what approach you take in solving a problem, there can only be one correct outcome. The optimizer does what it's meant to do, which is to optimize the path taken to resolve the query, and it should not influence the result set. If it does, that's a bug.
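
For readers who want to see the plan described above for their own query, here is a minimal sketch. The tables work.a and work.b and their columns are made up for illustration; only the _METHOD option is what this thread refers to. Which join method is chosen, and whether an sqxsort step appears, depends on the release and the data, so no particular log output is assumed.

```sas
/* Hypothetical input tables sharing the key column id. */
data work.a;
    do id = 1 to 5;
        x = id * 10;
        output;
    end;
run;

data work.b;
    do id = 5 to 1 by -1;
        y = id * 100;
        output;
    end;
run;

/* _METHOD writes the execution plan chosen by the optimizer to the log. */
proc sql _method;
    create table work.joined as
    select a.id, a.x, b.y
    from work.a as a
         inner join work.b as b
         on a.id = b.id;
quit;
```

Running the same step on both releases and comparing the plan in the two logs is how a difference like the one the OP describes would become visible.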

 

The use of unsupported or undocumented features like MONOTONIC() may indeed be the cause of the different result, but would disqualify any complaints.

 

Anyway, unless we get to see some code, logs and what have you, we can only guess.

 

Regards,

- Jan.

SASKiwi
PROC Star

I think this could well be the same problem we experienced with SAS SQL when migrating from 9.3 to 9.4.

 

We found that some of our queries resulted in a different order of output data where no ORDER BY was explicitly specified when comparing 9.3 with 9.4 (using identical code and input data). We fixed the problem by putting in ORDER BY statements to retain the same data order as in 9.3.

 

This issue was only apparent for some more complicated queries and we never really nailed the exact circumstances under which it would happen. We suspect this issue was caused by SQL optimiser changes in 9.4. 

LinusH
Tourmaline | Level 20
You can never foresee a specific sort order from an SQL query, so I can hardly see this as an example of misbehaving. If you need your result set in a specific sort order, you MUST use ORDER BY.
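
A minimal sketch of that advice, using made-up tables work.a and work.b (not the OP's data): the ORDER BY clause makes the row order part of the query itself, so it no longer depends on whichever plan the optimizer picks.

```sas
/* Hypothetical input tables; work.b is deliberately in descending key order. */
data work.a;
    do id = 1 to 5;
        x = id * 10;
        output;
    end;
run;

data work.b;
    do id = 5 to 1 by -1;
        y = id * 100;
        output;
    end;
run;

proc sql;
    create table work.joined_ordered as
    select a.id, a.x, b.y
    from work.a as a
         inner join work.b as b
         on a.id = b.id
    /* Explicit ordering: the only way to request a row order from SQL. */
    order by a.id;
quit;
```

If several rows can share the same ORDER BY value, all columns needed to make the order unique should be listed, otherwise ties can still come back in any order.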
Data never sleeps
SASKiwi
PROC Star

@LinusH - well, I wouldn't necessarily describe it as misbehaving, more a case of being consistently different: SAS 9.4 often gives you a different default output data order than 9.2/9.3. It has convinced me to be more diligent with ORDER BY's though... :-)

 

BTW, I also found SQL queries that worked in 9.3 that gave the wrong results in 9.4. I tracked the problem to SAS and as far as I'm aware they are looking to resolve this in a later release.  

Swethajava23
Calcite | Level 5

Please find the attached log and code files.

jklaverstijn
Rhodochrosite | Level 12

A small but important change between 9.2 and 9.4 is visible:

 

9.2:

 

                          sqxsort
                              sqxsrc( BCW_DTL.ETL_WORK_ITEM_NUM )
                          sqxsort
                              sqxsrc( WORK.W667SN06_RULE4_3 )

 

9.4:

                          sqxsort
                              sqxsrc( BCW_DTL.ETL_WORK_ITEM_NUM )
                          sqxsrc( WORK.W667SN06_RULE4_3 )

 

If I examine the operations on WORK.W667SN06_RULE4_3 leading up to the SQL query, I see that this dataset is sorted shortly before the SQL query. This fact is recorded in the descriptive header of the dataset and can be used by subsequent operations to prevent unnecessary sorts. It is plausible that the 9.4 optimizer takes this into account, eliminates the sort (sqxsort) and just reads the source rows sequentially from the table (sqxsrc). This can significantly speed up queries. A clear improvement over the 9.2 query plan.
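
The sort indicator mentioned above can be inspected directly. A minimal sketch with a made-up table work.demo: after PROC SORT, the PROC CONTENTS output reports the dataset as sorted and lists the BY variables in its sort information. Whether the 9.4 optimizer really used this indicator for the OP's query remains the assumption stated above; this code does not prove it.

```sas
/* Hypothetical table created in unsorted key order. */
data work.demo;
    do id = 3, 1, 2;
        value = id * 10;
        output;
    end;
run;

/* Sorting in place records the sort order in the dataset header. */
proc sort data=work.demo;
    by id;
run;

/* The attribute listing shows whether the dataset is flagged as sorted. */
proc contents data=work.demo;
run;
```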

 

Other than that I see no issues with this job. You will have to explain where the differences between the versions really matter. As implied before, the output of _METHOD / _TREE is not a problem in itself. Carefully inspect and compare work.W667TARG_RULE5_1 in both cases. If it is only the order in which the rows are retrieved that differs, I refer to the response from @LinusH, which explains that the order of the rows cannot be relied upon unless you use an ORDER BY clause or simply run a PROC SORT afterwards. If the input data is the same but the output content is not, either 9.2 or 9.4 is at fault. If the generated SQL is different (I didn't see that, but no guarantee here), DI Studio may be at fault. Either way, IMO SAS Support should be contacted. Be prepared to send them your data.
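
One way to do that comparison is sketched below. The datasets work.out92 and work.out94 are small stand-ins for the table as produced under each release, with hypothetical columns id and value. Sorting both by all columns first removes row order from the picture, so PROC COMPARE then reports only genuine content differences.

```sas
/* Stand-in for the 9.2 output. */
data work.out92;
    input id value;
    datalines;
1 10
2 20
3 30
;
run;

/* Stand-in for the 9.4 output: same rows, different order. */
data work.out94;
    input id value;
    datalines;
3 30
1 10
2 20
;
run;

/* Put both tables into the same, fully determined order. */
proc sort data=work.out92 out=work.s92;
    by id value;
run;

proc sort data=work.out94 out=work.s94;
    by id value;
run;

/* Compare the sorted copies; LISTALL also lists unmatched rows and variables. */
proc compare base=work.s92 compare=work.s94 listall;
run;
```

If the sorted copies compare as equal, the difference between the releases is only row order and an ORDER BY fixes it. If they do not, that is the case worth taking to SAS Support.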

 

Kind regards,

- Jan.


Discussion stats
  • 12 replies
  • 2243 views
  • 7 likes
  • 5 in conversation