Hello,
Please refer to the attached question.
I am not sure what the suggested answer means by "SORT BY provides reducer level sorting instead of job level sorting".
I know what ORDER BY means in the context of PROC SQL and DBMSs in general.
Thanks.
Odesh.
@odesh wrote:
Just making sure that I understand the difference between ORDER BY and SORT
BY in HiveQL:
1. ORDER BY sorts the entire result set (which can be very resource
intensive with a large result set)
2. SORT BY sorts within each reducer, which should be more efficient in
terms of processing time.
Am I correct?
Thanks.
Odesh.
Yes, that's how I understand what's explained under the links I've posted earlier.
This is not SAS but Hive SQL syntax, so you would need to ask such a question in a Hadoop/Hive forum. But after Googling a bit, here is what's documented:
Difference between Sort By and Order By
Hive supports SORT BY which sorts the data per reducer. The difference between "order by" and "sort by" is that the former guarantees total order in the output while the latter only guarantees ordering of the rows within a reducer. If there are more than one reducer, "sort by" may give partially ordered final results.
As far as I understand things, Hive SQL gets translated into MapReduce for execution. It appears that Hive Sort By and Order By will result in different MapReduce code logic.
Not sure how the answer could be narrower. Hive Sort By results in a sort of rows within a reducer. If you've got more than one reducer then the data isn't sorted over the whole file but only within the chunks per reducer.
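To make the "sorted within the chunks per reducer" point concrete, here is a small Python sketch that only simulates the idea. It is not Hive code: the function names are made up for illustration, and dealing rows to reducers round-robin is an assumption, since Hive decides for itself how rows reach each reducer.

```python
def order_by(rows):
    """Simulate ORDER BY: one sort over the whole result set."""
    return sorted(rows)


def sort_by(rows, num_reducers):
    """Simulate SORT BY: each reducer sorts only the rows it received."""
    # Assumption for illustration: rows are dealt round-robin to reducers.
    chunks = [rows[i::num_reducers] for i in range(num_reducers)]
    return [sorted(chunk) for chunk in chunks]


rows = [7, 3, 9, 1, 8, 2]

total = order_by(rows)
per_reducer = sort_by(rows, 2)
# The final output is just the reducer outputs placed one after another.
combined = [value for chunk in per_reducer for value in chunk]

print(total)        # [1, 2, 3, 7, 8, 9]
print(per_reducer)  # [[7, 8, 9], [1, 2, 3]]
print(combined)     # [7, 8, 9, 1, 2, 3]
```

Each reducer's chunk is in order, but the combined output is not, which is the "partially ordered final results" the documentation mentions. With a single reducer the two approaches give the same result.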