BookmarkSubscribeRSS Feed
Mohan_Rang
Calcite | Level 5

When I load a SAS dataset sorted by a key(contains duplicate keys) using PROC SORT into Redshift, the same sorting order is not preserved in Redshift. For eg. When I compare the SAS dataset with Redshift table, they are not in the same sorting order, the data is really jumbled within the duplicates keys and very hard to find a pattern of how data stores in Redshift columns or why that's happening. Please find attached the sample example for reference.

 

Any suggestions or guidance on maintaining the same sorting after data load into Redshift from SAS would be very helpful. Thanks!

7 REPLIES 7
ballardw
Super User

Please share exactly how you moved the data to Redshift.

Were you creating a new file/table/ or what ever Redshift might call it? Or appending/inserting to an existing bit?

 

 

Putting "results" into Excel adds yet another place to corrupt data. Plus many of us won't open XLSX files from unknown sources because of organization policies or concerns over viruses and such.

Mohan_Rang
Calcite | Level 5
PROC APPEND BASE = Target_Table (bulkload=yes bl_compress=yes bl_bucket='xxx' bl_default_dir = 'xxx' bl_use_escape =YES) 
DATA = Source_Table force; 
RUN;

This is the code used to load the data from SAS to Redshift

ballardw
Super User

You did not "load" data, it appears that you appended it to an existing table.

So any data already in the table had its existing sort order and you can only expect the records added at the end of the table to maybe retain the order of the SAS data set.

I would suggest if you require a specific order to either use Redshift tools, what ever they may be, to resort the data after you append or to create a new table.

 

This is basically the equivalent of pasting lines from one document at the end of another.

 

 

Mohan_Rang
Calcite | Level 5

Thank you for the response. I would like to clarify. First, I delete all the data from the Redshift Target table and load the new SAS dataset into empty Redshift data structure. Actually the row count of both SAS table and Redshift data are the same but the sort order within the duplicate keys are random, not exactly how I would see in the SAS datatset before the data load into Redshift. It looks like SAS follows a certain sort sequence and Redshift follows another or maybe I'm wrong. If I can understand how the sort sequence works in Redshift, that would be really helpful in manipulating the sort order in Redshift to preserve it exactly like SAS. 

Tom
Super User Tom
Super User

Does Redshift actually have a concept of order?  Most database systems do not return results in the order the observations were loaded. They return them in the order they are fetched from the storage which is generally totally independent of the order they were loaded. This is especially true of systems like Redshift that support multiple processors.  If you want the observations returned in a specific order then use the ORDER BY clause in your SQL query.  If you want to preserve the order they observations had in the SAS dataset then generated a new variable that you can use in the ORDER BY clause.

Mohan_Rang
Calcite | Level 5

Thanks for your response. I'm not sure if they have a concept of order but from my understanding Redshift uses sort keys to determine the order in which rows in a table are stored. I have no problems with Redshift returning the results from storage as well. They do return them in SAS in the same order they were loaded. The problem occurs during the data load from SAS to Redshift. Please see the sample screenshot below. It randomly stores some of the records in random order within the keys. I would like to mimic the same SAS sort order within the duplicate keys in Redshift too.

sasvred.PNG

 

 

 

Kurt_Bremser
Super User

If you want to see a specific order from a database system, you have to force it in the query.

SAS does in fact do a query to the DBMS table when you open it in a viewtable window, and it will not specify an order during this. I suggest you define a view in Redshift with an ORDER BY clause, and use that to view your table.

As others said, DB systems will store rows in any order they think is optimal, and return them in that order. The only order they will assume automatically would be that of defined primary keys.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1451 views
  • 1 like
  • 4 in conversation