Need a sorted data before MERGING????

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 81
Accepted Solution

Need a sorted data before MERGING????

Hi All,

Can anybody tell me, why do we need a sorted data before merging while it can be sort after merging as well?

Regards

Anand


Accepted Solutions
Solution
‎10-29-2013 10:48 AM
Respected Advisor
Posts: 3,124

Re: Need a sorted data before MERGING????

While there are many ways to do the 'Merging' without presorting the data, such as Proc SQL, Hash(), Proc Format among others, I suppose you are asking use of Merge statement in Data Step.

Data Step is sequential utility, meaning it goes through obs from top down, one by one till the end of the table. So when doing the merge by, it only makes sense that both tables are presorted by 'BY' variable, where data step is able to check and exhaust the same values of  'BY' variable from both table at the same time. I can't imagine any other way to do the "merge" given the way how it works.

HTH,

Haikuo

View solution in original post


All Replies
Respected Advisor
Posts: 2,655

Re: Need a sorted data before MERGING????

Because, unlike a SQL join, a merge is essentially a row-by-row operation, and if the keys are not in the same order on both data sets, the merge can't proceed.

Steve Denham

Super User
Posts: 17,942

Re: Need a sorted data before MERGING????

"Merging" data sets together can be done in many ways, one of which is the data step MERGE statement. This does require a sorted dataset as Steve has stated, but a SQL JOIN can also merge datasets and does not require a sort in advance.

Solution
‎10-29-2013 10:48 AM
Respected Advisor
Posts: 3,124

Re: Need a sorted data before MERGING????

While there are many ways to do the 'Merging' without presorting the data, such as Proc SQL, Hash(), Proc Format among others, I suppose you are asking use of Merge statement in Data Step.

Data Step is sequential utility, meaning it goes through obs from top down, one by one till the end of the table. So when doing the merge by, it only makes sense that both tables are presorted by 'BY' variable, where data step is able to check and exhaust the same values of  'BY' variable from both table at the same time. I can't imagine any other way to do the "merge" given the way how it works.

HTH,

Haikuo

Frequent Contributor
Posts: 81

Re: Need a sorted data before MERGING????

my question is specific about Merging using Merge statement.

Respected Advisor
Posts: 3,777

Re: Need a sorted data before MERGING????

What hasn't been mentioned is that while the observations need to be "presented" to the MERGE in order.  The contributing data sets don't have to be physically sorted.  Can you think of a way to logically "sort" a data set?

Respected Advisor
Posts: 3,124

Re: Need a sorted data before MERGING????

Are you referring TAGSORT option in Proc Sort where only the 'by' variable get sorted Or are you referring Sortedby= dataset option where a table is only allegedly sorted?

Haikuo

Respected Advisor
Posts: 3,777

Re: Need a sorted data before MERGING????

I was thinking of INDEX.

Super Contributor
Posts: 377

Re: Need a sorted data before MERGING????

Yep I was thinking the same thing.  Another option to logically "sort" a dataset is to use a SPDE dataset, where the engine returns the data in sorted order according to the BY variables.

And most merges via SQL do a sort under the covers.  So, while it may be "syntactically elegant" not to pre-sort your data, it's still sorting behind the scenes.  Use the _method option on the SQL statement to get more details on how SQL is doing the merge.

It's probably outside the scope of this thread to go into all the scenarios when SQL sorts/doesn't sort.  But further details are in this excellent Usage Note by Paul Kent.  IMO all users of SQL should read this Smiley Happy

TS-DOC: TS-553 - SQL Joins -- The Long and The Short of It

Frequent Learner
Posts: 1

Re: Need a sorted data before MERGING????

 

I was trying to find the answer to the same question and below explanation gives you better understanding i hope.

 

The merge join works by simultaneously reading and comparing the two sorted inputs one row at a time.  At each step, we compare the next row from each input.  If the rows are equal, we output a joined row and continue.  If the rows are not equal, we discard the lesser of the two inputs and continue.  Since the inputs are sorted, we know that we are discarding a row that is less than any of the remaining rows in either input and, thus, can never join.

 

We can express the algorithm in pseudo-code as:

 

get first row R1 from input 1
get first row R2 from input 2
while not at the end of either input
    begin
        if R1 joins with R2
            begin
                return (R1, R2)
                get next row R2 from input 2
            end
        else if R1 < R2
            get next row R1 from input 1
        else
            get next row R2 from input 2
    end

 

Source :https://blogs.msdn.microsoft.com/craigfr/2006/08/03/merge-join/

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 1282 views
  • 6 likes
  • 7 in conversation