SAS 9.3 and SAS 9.4 left join sort order

Mohan_Rang · Posted 03-19-2021 06:49 PM

Hi, I'm working on a SAS migration project, migrating code from SAS 9.3 to SAS 9.4 (SAS Studio). In SAS 9.3 the final SAS datasets produced by the code are used for data warehousing, whereas in SAS 9.4 the data warehouse is Redshift.

We noticed that the result sets produced by a left join (SAS dataset to SAS dataset, and SAS dataset to Redshift table) come back in a different order in SAS 9.3 and SAS 9.4 when there is no condition or key that fully determines the row order. When we then remove duplicates by key with PROC SORT after the left join, different records are eliminated, because the left join returns the rows in a different order within duplicate keys. As a result, PROC SORT keeps different records in 9.3 and 9.4, and we see data mismatches between the two versions in the final datasets.

The client doesn't want to make any changes to the 9.3 legacy code to control the record order. So we tried to find a pattern in the left join order in 9.3 and 9.4, and it looks like the rows are ordered arbitrarily within duplicate keys. Has anyone encountered this issue in a migration project? Is this expected? I never thought a left join would behave differently in 9.3 and 9.4.

ballardw · Posted 03-19-2021 07:05 PM

One might say that if you can tell the difference between records "deleted" then the approach was suboptimal to begin with as you were using a process that is not always repeatable.

From the Proc SORT documentation for NODUPKEY:

Tip: Use the EQUALS option with the NODUPKEY option for consistent results in your output data sets.

might help. But if the previous code wasn't using it then adding it now is likely to change the order anyway.
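For reference, a minimal sketch of that tip. The dataset work.joined and the key cust_id are hypothetical names used only for illustration, not taken from the original code:

```sas
/* Hypothetical names: work.joined is the left join output, cust_id the dedup key. */
/* EQUALS keeps observations with the same BY value in their input order,         */
/* so NODUPKEY keeps the first observation of each cust_id as it arrived.         */
proc sort data=work.joined out=work.dedup nodupkey equals;
  by cust_id;
run;
```

Note that this only makes the result repeatable for a given input order. If the join itself delivers the rows in a different order, the record that is kept still changes.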

Proc SQL never guarantees any specific order of records within an "order by" group. Set operations do not care about order, and SQL is based on set operations. If you need something done in a specific order, then perhaps a data step is called for.
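One way such a data step could look, as a sketch only. It assumes a hypothetical dataset work.joined with key cust_id and a column load_dt that decides which duplicate to keep (here, the most recent one); none of these names come from the original code:

```sas
/* Hypothetical names throughout. Sort by the key plus an explicit tie-breaker */
/* so that the order within each cust_id is fully determined.                  */
proc sort data=work.joined out=work.joined_sorted;
  by cust_id descending load_dt;
run;

/* Keep exactly one row per cust_id: the first one after the explicit sort. */
data work.dedup;
  set work.joined_sorted;
  by cust_id descending load_dt;
  if first.cust_id;
run;
```

If two rows can still tie on cust_id and load_dt, more columns would have to be added to both BY statements until the choice is unambiguous.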

Sharing specific code used might be a good idea.

SASKiwi · Posted 03-19-2021 09:26 PM

We encountered this exact issue when migrating from SAS 9.3 to 9.4. When there is no ORDER BY on the final result set the data order was sometimes different between the two versions. From what I understand, SAS made significant improvements in the SQL interpreter between releases. The fix is easy - just add an ORDER BY to get the required data order.
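A rough sketch of what that can look like, with made-up table and column names (work.orders, work.segments, cust_id, order_id, segment and load_dt are assumptions for illustration):

```sas
/* Hypothetical tables and columns, for illustration only. */
proc sql;
  create table work.joined as
  select a.cust_id,
         a.order_id,
         b.segment,
         b.load_dt
  from work.orders as a
  left join work.segments as b
    on a.cust_id = b.cust_id
  /* List enough columns to break every tie within cust_id; */
  /* rows that tie on all ORDER BY columns have no defined order. */
  order by a.cust_id, a.order_id, b.load_dt, b.segment;
quit;
```

The ORDER BY has to go on the step whose output order matters, and it needs to cover every column that distinguishes the duplicates, otherwise a later NODUPKEY can still pick different rows.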

I suggest you advise your client that it is unrealistic to expect that migrating to a new SAS version will not require any coding changes. In my experience there are always some changes required, often caused by algorithm improvements, better error trapping or newly implemented options. Making such changes should in no way be considered "wrong" or contrary to best practice.

Kurt_Bremser · Posted 03-20-2021 01:48 AM

If the order of a result dataset from the SQL procedure is important for you, you MUST (MUST) set it explicitly.

See Maxim 31.

Maxims of Maximally Efficient SAS Programmers

ChrisNZ · Posted 03-20-2021 04:04 AM

Is this an issue, or just a question?

If it is an issue, I agree with @Kurt_Bremser that any required order should be explicitly set. If it is left to chance, your client is unjustified in expecting chance to be consistent.

Also note that typically, only SAS can store sorted data; SQL databases cannot do that. This can be very costly as tables must be re-sorted each time ordered data is required, such as when joining.

However, I see that Redshift can use sort keys. So it seems my knowledge needs updating. Can you tell me more?
