BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jagadeesh2907
Obsidian | Level 7

Hi All,

 

I have 2 datasets A and B with a key variable "Customer_id".

Dataset A does not have duplicates, however, Dataset B have duplicate key value but the complete record is not duplicate. consider this as a transaction dataset with mulitple amount for each "customer_id". 

 

when i try to do left join of dataset A with Dataset B, in the result dataset C, I would be getting multiple records for "customer_id" which has duplicate values. This is clear.

 

What is my problem statement is that, will the order of the duplicate customer_id i.e the records of the Customer_id with duplicates differ when we run it each time ? 

 

As far as i know, SAS reads the data record by record and hence this should not change, however, wanted to have an expert opinion or check if my understanding is not correct. 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You need a variable (or several variables) with which you can force the order in the ORDER BY clause of PROC SQL.

Note that this will be less performant than establishing the order first with PROC SORT and then join in a DATA step.

View solution in original post

7 REPLIES 7
ballardw
Super User

Unless you go to some length to force things do not expect SQL to process anything in a given order. If sequence of operation is important then a data step merge may be a better approach. Or if the data is truly a transaction and the desired result is one output record per match then a data step UPDATE may be more in order.

Jagadeesh2907
Obsidian | Level 7
this explains my difference. thank you for this point. how can i force SQL to follow a certain order ? is it by sorting the datasets which are needed to be joined ?
Kurt_Bremser
Super User

You need a variable (or several variables) with which you can force the order in the ORDER BY clause of PROC SQL.

Note that this will be less performant than establishing the order first with PROC SORT and then join in a DATA step.

Reeza
Super User
SQL doesn't respect order of rows at all. You cannot assume it will generate the same order unless you explicitly control it.
Data steps will by design to some degree but it also assumes that the data being provided is always in the exact same order.
Jagadeesh2907
Obsidian | Level 7
Thank you Rezza, how can i explicitly control this through SQL ? is it by sorting the datasets before join or there are any other ways ?
ChrisNZ
Tourmaline | Level 20

SQL never guarantees a processing order. That's a "feature" of the language.

You should generate a sequential read, if reading from a SAS V9 dataset, if you specify the   proc sql nothreads;   option. 

 

When reading multiple tables, sorting them by the join key should prevent that proc sql generate a sort, and it should read both tables sequentially if the nothreads option is used. Ultimately, the SQL optimiser makes the decision.

Patrick
Opal | Level 21

With SQL the only way to guarantee the sort order is via an Order clause AFTER the join.

Even though due to some SAS specifics there might be some cases where you get the desired sort order also without an order by clause, you never should not attempt to implement this way because it's inherently not how SQL works.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1022 views
  • 4 likes
  • 6 in conversation