Hi All,
I have 2 datasets A and B with a key variable "Customer_id".
Dataset A does not have duplicates, however, Dataset B have duplicate key value but the complete record is not duplicate. consider this as a transaction dataset with mulitple amount for each "customer_id".
when i try to do left join of dataset A with Dataset B, in the result dataset C, I would be getting multiple records for "customer_id" which has duplicate values. This is clear.
What is my problem statement is that, will the order of the duplicate customer_id i.e the records of the Customer_id with duplicates differ when we run it each time ?
As far as i know, SAS reads the data record by record and hence this should not change, however, wanted to have an expert opinion or check if my understanding is not correct.
You need a variable (or several variables) with which you can force the order in the ORDER BY clause of PROC SQL.
Note that this will be less performant than establishing the order first with PROC SORT and then join in a DATA step.
Unless you go to some length to force things do not expect SQL to process anything in a given order. If sequence of operation is important then a data step merge may be a better approach. Or if the data is truly a transaction and the desired result is one output record per match then a data step UPDATE may be more in order.
You need a variable (or several variables) with which you can force the order in the ORDER BY clause of PROC SQL.
Note that this will be less performant than establishing the order first with PROC SORT and then join in a DATA step.
SQL never guarantees a processing order. That's a "feature" of the language.
You should generate a sequential read, if reading from a SAS V9 dataset, if you specify the proc sql nothreads; option.
When reading multiple tables, sorting them by the join key should prevent that proc sql generate a sort, and it should read both tables sequentially if the nothreads option is used. Ultimately, the SQL optimiser makes the decision.
With SQL the only way to guarantee the sort order is via an Order clause AFTER the join.
Even though due to some SAS specifics there might be some cases where you get the desired sort order also without an order by clause, you never should not attempt to implement this way because it's inherently not how SQL works.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.