09-13-2017 04:39 AM
In the HPA environment, we want to assign a sequence number to rows according to a rule: the count restarts for each ID.
We can do this easily in the old environment. But when we try it in the HPA environment, it does not work in HUE. Our guess is that the code below is not passed down to Hive and runs on the old machine instead. This creates jobs that run far too long on tables with billions of rows and eventually end in error.
We tried to do this assignment with both rank and sort.
The rank approach fails with an error.
The sort approach gives wrong results: the data is split between the servers, so each machine sorts only its own portion before the portions are merged. The data on each machine is ordered, but the merged table is not in a valid overall order, so the final joined table is wrong.
IF FIRST.ID THEN SIRA=0;
SIRA + 1;
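For context, the two statements above only work inside a DATA step that has a BY statement, because FIRST.ID exists only under BY-group processing. A minimal sketch of the complete step follows, assuming a hypothetical input table WORK.HAVE and output table WORK.WANT (neither name comes from our actual job):

```sas
/* WORK.HAVE and WORK.WANT are hypothetical names used for illustration. */

/* BY-group processing requires the input to be sorted by ID. */
proc sort data=work.have;
  by id;
run;

data work.want;
  set work.have;
  by id;                      /* creates the FIRST.ID / LAST.ID flags        */
  if first.id then sira = 0;  /* restart the counter at the start of each ID */
  sira + 1;                   /* sum statement: SIRA is retained across rows */
run;
```

This is the form that runs fine on a single machine. It depends on the rows arriving in ID order, which is exactly what we cannot rely on once the data is split across servers.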
09-17-2017 06:03 PM
I don't know much about HPA, but if it is anything like other highly parallelised environments, ordered tables are not really a feature. They defeat the purpose of the parallelisation, which is to split the table into small random chunks for processing. If you force a known row order on the whole table, you are back to one node only, and performance dies. That's why deriving the median is not generally supported.
So go back a step: why do you need to number the rows?
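If the answer is that you really do need a counter that restarts for each ID, note that this does not require a global row order, only that rows with the same ID are processed together. One possible approach is to let Hive do the numbering with a window function through explicit pass-through. This is only a sketch under assumptions: it presumes SAS/ACCESS Interface to Hadoop is licensed, the connection values are placeholders, HAVE and WANT are hypothetical Hive tables, and EVENT_DT is a hypothetical column that defines the order within each ID.

```sas
/* All connection values and table/column names below are placeholders. */
proc sql;
  connect to hadoop (server="hive-host" port=10000
                     user=myuser password=mypwd schema=default);

  /* The statement inside EXECUTE is sent to Hive as written.            */
  /* ROW_NUMBER() restarts at 1 for every ID, like the FIRST.ID logic.   */
  execute (
    create table want as
    select t.*,
           row_number() over (partition by id order by event_dt) as sira
    from have t
  ) by hadoop;

  disconnect from hadoop;
quit;
```

Because the numbering is partitioned by ID, each ID can be handled on whichever node holds it, so the work stays parallel. Whether this performs acceptably on your tables is something you would have to test.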