Desktop productivity for business analysts and programmers

When use PROC MERGE in SAS EG to merge 2 tables from Netezza, no need to sort the data. Why?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

When use PROC MERGE in SAS EG to merge 2 tables from Netezza, no need to sort the data. Why?

Hi,

 

I am using SAS EG 4.3.

 

"For a merge with a BY statement, your data needs to be sorted, either by using PROC SORT or having data that is in sorted order already, or indexed using the BY variables."

 

But, when I use PROC MERGE to merge 2 tables from Netezza, I can skip over the step of PROC SORT. The output is correct and sorted. Why? Are the tables from Netezza pre-defined by something?

 

 

For example, (Netez is a libref for a Netezza database. 1st, create 2 tables in Netezza; 2nd, use PROC MERGE)

 

Data Netez.A1;

Input ID Mth Name$ Height;

cards;

7 5 D 2

1 1 A 1

3 2 B 2

9 5 E 2

5 3 C 2

;

run;

 

Data Netez.A2;

Input ID Mth Name$ Weight;

cards;

5 3 C 4

2 1 A 2

7 5 D 5

4 2 B 3

;

run;

 

data dummy;

merge Netez.A1 (in=x) Netez.A2 (in=y);

by id;

if x;

run;


Accepted Solutions
Solution
‎11-17-2017 04:38 PM
PROC Star
Posts: 1,265

Re: When use PROC MERGE in SAS EG to merge 2 tables from Netezza, no need to sort the data. Why?

SQL and SAS were in development around the same time, in the 1970s. One of the foundational concepts of SQL is that the sequence of the records in a table is generally irrelevant; that's why the original language didn't have any concepts to process one row after the other.

 

As part of this, when you do a join, issues about table sequence are left "behind the scenes", for the SQL compiler to figure out.

 

One fallacy that circulates is that a SQL join is more "efficient" than a SAS merge because the data doesn't need to be sorted; I assure you, there is a sort being done behind the scenes. Computationally, doing a SAS merge on unsorted data is similar to doing a SQL join on the same data.

 

One last comment; don't assume that the output of a SQL join is in any particular order, unless you've used an ORDER BY clause. Theoretically, it could change from one run to the next.

 

Tom

View solution in original post


All Replies
Super User
Posts: 22,874

Re: When use PROC MERGE in SAS EG to merge 2 tables from Netezza, no need to sort the data. Why?

The rules you're referencing are for a data step merge. 

 

There's no PROC MERGE that I'm aware of, but if you use a SQL merge, either in SAS BASE, EG Query Builder (which is a SQL query builder) or a SQL server it sorts it automatically for the merge, you don't need to first sort. This is a feature of SQL in general. However, there's no guarantee on the sort order unless you also specify the ORDER by statement. 

 


Pre-app wrote:

Hi,

 

I am using SAS EG 4.3.

 

"For a merge with a BY statement, your data needs to be sorted, either by using PROC SORT or having data that is in sorted order already, or indexed using the BY variables."

 

But, when I use PROC MERGE to merge 2 tables from Netezza, I can skip over the step of PROC SORT. The output is correct and sorted. Why? Are the tables from Netezza pre-defined by something?

 

 

For example, (Netez is a libref for a Netezza database. 1st, create 2 tables in Netezza; 2nd, use PROC MERGE)

 

Data Netez.A1;

Input ID Mth Name$ Height;

cards;

7 5 D 2

1 1 A 1

3 2 B 2

9 5 E 2

5 3 C 2

;

run;

 

Data Netez.A2;

Input ID Mth Name$ Weight;

cards;

5 3 C 4

2 1 A 2

7 5 D 5

4 2 B 3

;

run;

 

data dummy;

merge Netez.A1 (in=x) Netez.A2 (in=y);

by id;

if x;

run;


 

Solution
‎11-17-2017 04:38 PM
PROC Star
Posts: 1,265

Re: When use PROC MERGE in SAS EG to merge 2 tables from Netezza, no need to sort the data. Why?

SQL and SAS were in development around the same time, in the 1970s. One of the foundational concepts of SQL is that the sequence of the records in a table is generally irrelevant; that's why the original language didn't have any concepts to process one row after the other.

 

As part of this, when you do a join, issues about table sequence are left "behind the scenes", for the SQL compiler to figure out.

 

One fallacy that circulates is that a SQL join is more "efficient" than a SAS merge because the data doesn't need to be sorted; I assure you, there is a sort being done behind the scenes. Computationally, doing a SAS merge on unsorted data is similar to doing a SQL join on the same data.

 

One last comment; don't assume that the output of a SQL join is in any particular order, unless you've used an ORDER BY clause. Theoretically, it could change from one run to the next.

 

Tom

Occasional Contributor
Posts: 5

Re: When use PROC MERGE in SAS EG to merge 2 tables from Netezza, no need to sort the data. Why?

Thank all.
Super User
Super User
Posts: 7,860

Re: When use PROC MERGE in SAS EG to merge 2 tables from Netezza, no need to sort the data. Why?

[ Edited ]

It is a feature of SAS/Access to .....

Basically SAS tells the database to give it the records in the right order.  So if you are using a BY statement in your DATA or PROC step then SAS will automatically add an ORDER BY clause to the query that it pushes into the database to get the records.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 252 views
  • 1 like
  • 4 in conversation