BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Pre-app
Calcite | Level 5

Hi,

 

I am using SAS EG 4.3.

 

"For a merge with a BY statement, your data needs to be sorted, either by using PROC SORT or having data that is in sorted order already, or indexed using the BY variables."

 

But, when I use PROC MERGE to merge 2 tables from Netezza, I can skip over the step of PROC SORT. The output is correct and sorted. Why? Are the tables from Netezza pre-defined by something?

 

 

For example, (Netez is a libref for a Netezza database. 1st, create 2 tables in Netezza; 2nd, use PROC MERGE)

 

Data Netez.A1;

Input ID Mth Name$ Height;

cards;

7 5 D 2

1 1 A 1

3 2 B 2

9 5 E 2

5 3 C 2

;

run;

 

Data Netez.A2;

Input ID Mth Name$ Weight;

cards;

5 3 C 4

2 1 A 2

7 5 D 5

4 2 B 3

;

run;

 

data dummy;

merge Netez.A1 (in=x) Netez.A2 (in=y);

by id;

if x;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
TomKari
Onyx | Level 15

SQL and SAS were in development around the same time, in the 1970s. One of the foundational concepts of SQL is that the sequence of the records in a table is generally irrelevant; that's why the original language didn't have any concepts to process one row after the other.

 

As part of this, when you do a join, issues about table sequence are left "behind the scenes", for the SQL compiler to figure out.

 

One fallacy that circulates is that a SQL join is more "efficient" than a SAS merge because the data doesn't need to be sorted; I assure you, there is a sort being done behind the scenes. Computationally, doing a SAS merge on unsorted data is similar to doing a SQL join on the same data.

 

One last comment; don't assume that the output of a SQL join is in any particular order, unless you've used an ORDER BY clause. Theoretically, it could change from one run to the next.

 

Tom

View solution in original post

4 REPLIES 4
Reeza
Super User

The rules you're referencing are for a data step merge. 

 

There's no PROC MERGE that I'm aware of, but if you use a SQL merge, either in SAS BASE, EG Query Builder (which is a SQL query builder) or a SQL server it sorts it automatically for the merge, you don't need to first sort. This is a feature of SQL in general. However, there's no guarantee on the sort order unless you also specify the ORDER by statement. 

 


@Pre-app wrote:

Hi,

 

I am using SAS EG 4.3.

 

"For a merge with a BY statement, your data needs to be sorted, either by using PROC SORT or having data that is in sorted order already, or indexed using the BY variables."

 

But, when I use PROC MERGE to merge 2 tables from Netezza, I can skip over the step of PROC SORT. The output is correct and sorted. Why? Are the tables from Netezza pre-defined by something?

 

 

For example, (Netez is a libref for a Netezza database. 1st, create 2 tables in Netezza; 2nd, use PROC MERGE)

 

Data Netez.A1;

Input ID Mth Name$ Height;

cards;

7 5 D 2

1 1 A 1

3 2 B 2

9 5 E 2

5 3 C 2

;

run;

 

Data Netez.A2;

Input ID Mth Name$ Weight;

cards;

5 3 C 4

2 1 A 2

7 5 D 5

4 2 B 3

;

run;

 

data dummy;

merge Netez.A1 (in=x) Netez.A2 (in=y);

by id;

if x;

run;


 

TomKari
Onyx | Level 15

SQL and SAS were in development around the same time, in the 1970s. One of the foundational concepts of SQL is that the sequence of the records in a table is generally irrelevant; that's why the original language didn't have any concepts to process one row after the other.

 

As part of this, when you do a join, issues about table sequence are left "behind the scenes", for the SQL compiler to figure out.

 

One fallacy that circulates is that a SQL join is more "efficient" than a SAS merge because the data doesn't need to be sorted; I assure you, there is a sort being done behind the scenes. Computationally, doing a SAS merge on unsorted data is similar to doing a SQL join on the same data.

 

One last comment; don't assume that the output of a SQL join is in any particular order, unless you've used an ORDER BY clause. Theoretically, it could change from one run to the next.

 

Tom

Tom
Super User Tom
Super User

It is a feature of SAS/Access to .....

Basically SAS tells the database to give it the records in the right order.  So if you are using a BY statement in your DATA or PROC step then SAS will automatically add an ORDER BY clause to the query that it pushes into the database to get the records.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1914 views
  • 1 like
  • 4 in conversation