BookmarkSubscribeRSS Feed
Mohan_Rang
Calcite | Level 5

Hi, I'm working on a SAS migration project - migrating codes from SAS 9.3 to SAS 9.4 (SAS studio). In SAS 9.3 the final SAS datasets from the SAS codes are used for data warehousing where as in SAS 9.4 the data warehouse is Redshift. We noticed the result sets produced by left join (sas to sas datasets, sas to redshift tables) has different sort order in SAS 9.3 and SAS 9.4 when we don't have a condition or keys in place to control the entire row values. When we remove duplicates by keys in PROC SORT after left join, it eliminates different records because the left join sorts the result set implicitly in different order within duplicate keys. As a result, PROC SORT picks different records in both 9.3 and 9.4 and we see data mismatch between 9.3 and 9.4 in the final datasets. The client doesn't want to make any changes in 9.3 legacy codes to control the record order. So we tried finding patterns in the left join sort order in 9.3 and 9.4 and it looks like they are sorting them randomly within duplicate keys. Did anyone encounter this issue in your migration project? Is this something expected because I never thought left join in 9.3 and 9.4 works differently?

4 REPLIES 4
ballardw
Super User

One might say that if you can tell the difference between records "deleted" then the approach was suboptimal to begin with as you were using a process that is not always repeatable.

 

From the Proc SORT documentation for NODUPEKEY:

Tips Use the EQUALS option with the NODUPKEY option for consistent results in your output data sets.

might help. But if the previous code wasn't using it then adding it now is likely to change the order anyway.

 

Proc SQL never guarantees any specific order of records within an "order by" group. Set operations do care about order and SQL is based on set operations. If you need something done in a specific order then perhaps a data step is in the works.

 

Sharing specific code used might be a good idea.

SASKiwi
PROC Star

We encountered this exact issue when migrating from SAS 9.3 to 9.4. When there is no ORDER BY on the final result set the data order was sometimes different between the two versions. From what I understand, SAS made significant improvements in the SQL interpreter between releases. The fix is easy - just add an ORDER BY to get the required data order.

 

I suggest you advise your client that it is unrealistic to expect that migrating to a new SAS version will not require any coding changes. In my experience there are always some changes required often caused by algorithm improvements, better error trapping or newly-implemented options. This in no way should be considered "wrong" or not best practice.

 

ChrisNZ
Tourmaline | Level 20

Is this an issue, or just a question?

If it is an issue, I agree with @Kurt_Bremser that any required order should be explicitly set. If it is left to chance, your client in unjustified expecting chance to be consistent.

Also note that typically, only SAS can store sorted data; SQL databases cannot do that. This can be very costly as tables must be re-sorted each time ordered data is required, such as when joining.

However, I see that Redshift can use sort keys. So it seems my knowledge needs updating. Can you tell me more?

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1342 views
  • 2 likes
  • 5 in conversation