turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Need a sorted data before MERGING????

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 10:21 AM

Hi All,

Can anybody tell me, why do we need a sorted data before merging while it can be sort after merging as well?

Regards

Anand

Accepted Solutions

Solution

10-29-2013
10:48 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 10:48 AM

While there are many ways to do the 'Merging' without presorting the data, such as Proc SQL, Hash(), Proc Format among others, I suppose you are asking use of Merge statement in Data Step.

Data Step is sequential utility, meaning it goes through obs from top down, one by one till the end of the table. So when doing the merge by, it only makes sense that both tables are presorted by 'BY' variable, where data step is able to check and exhaust the same values of 'BY' variable from both table at the same time. I can't imagine any other way to do the "merge" given the way how it works.

HTH,

Haikuo

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 10:34 AM

Because, unlike a SQL join, a merge is essentially a row-by-row operation, and if the keys are not in the same order on both data sets, the merge can't proceed.

Steve Denham

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 10:46 AM

"Merging" data sets together can be done in many ways, one of which is the data step MERGE statement. This does require a sorted dataset as Steve has stated, but a SQL JOIN can also merge datasets and does not require a sort in advance.

Solution

10-29-2013
10:48 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 10:48 AM

While there are many ways to do the 'Merging' without presorting the data, such as Proc SQL, Hash(), Proc Format among others, I suppose you are asking use of Merge statement in Data Step.

Data Step is sequential utility, meaning it goes through obs from top down, one by one till the end of the table. So when doing the merge by, it only makes sense that both tables are presorted by 'BY' variable, where data step is able to check and exhaust the same values of 'BY' variable from both table at the same time. I can't imagine any other way to do the "merge" given the way how it works.

HTH,

Haikuo

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 11:22 AM

my question is specific about Merging using Merge statement.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 11:36 AM

What hasn't been mentioned is that while the observations need to be "presented" to the MERGE in order. The contributing data sets don't have to be physically sorted. Can you think of a way to logically "sort" a data set?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-29-2013 11:20 PM

Are you referring TAGSORT option in Proc Sort where only the 'by' variable get sorted Or are you referring Sortedby= dataset option where a table is only allegedly sorted?

Haikuo

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-30-2013 03:44 AM

I was thinking of INDEX.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-30-2013 04:26 AM

Yep I was thinking the same thing. Another option to logically "sort" a dataset is to use a SPDE dataset, where the engine returns the data in sorted order according to the BY variables.

And most merges via SQL do a sort under the covers. So, while it may be "syntactically elegant" not to pre-sort your data, it's still sorting behind the scenes. Use the _method option on the SQL statement to get more details on how SQL is doing the merge.

It's probably outside the scope of this thread to go into all the scenarios when SQL sorts/doesn't sort. But further details are in this excellent Usage Note by Paul Kent. IMO all users of SQL should read this

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

a week ago

I was trying to find the answer to the same question and below explanation gives you better understanding i hope.

The merge join works by simultaneously reading and comparing the two sorted inputs one row at a time. At each step, we compare the next row from each input. If the rows are equal, we output a joined row and continue. If the rows are not equal, we discard the lesser of the two inputs and continue. Since the inputs are sorted, we know that we are discarding a row that is less than any of the remaining rows in either input and, thus, can never join.

We can express the algorithm in pseudo-code as:

get first row R1 from input 1

get first row R2 from input 2

while not at the end of either input

begin

if R1 joins with R2

begin

return (R1, R2)

get next row R2 from input 2

end

else if R1 < R2

get next row R1 from input 1

else

get next row R2 from input 2

end

Source :https://blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join/