DATA Step, Macro, Functions and more

Assigning sequence to records in the HPA environment

Occasional Contributor
Posts: 12


Hi all,

 

In the HPA environment, we want to assign a sequence number to records according to a rule: a counter that restarts at 1 for each ID.

 

In the old environment this is easy. In the HPA environment, however, it does not work when we check in HUE. Our guess is that the code below is not pushed down to HIVE and instead runs on the old machine. On tables with billions of rows, the jobs run far too long and eventually fail with an error.

 

We tried to do this assignment with RANK and with SORT.

The RANK version fails with an error.

The SORT version produces a wrong final table, because the data is split across the servers: each machine sorts its own portion and the portions are then merged, so the rows are ordered within each machine but not across the whole table.


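/* Number the records within each ID: SIRA = 1, 2, 3, ...        */
/* A must already be sorted (or indexed) by ID for the BY statement. */
DATA B;
     SET A;
     BY ID;
     IF FIRST.ID THEN SIRA=0;   /* reset the counter at the first row of each ID */
     SIRA + 1;                  /* sum statement: SIRA is retained across rows  */
RUN;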
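A possible alternative, sketched under assumptions: instead of the DATA step, let Hive compute the counter itself with the ROW_NUMBER() window function, sent from SAS through explicit pass-through (CONNECT TO HADOOP and EXECUTE ... BY HADOOP in SAS/ACCESS Interface to Hadoop), so the work runs on the cluster. Because the numbering only restarts within each ID, PARTITION BY id needs rows ordered within an ID, not across the whole table, so different IDs can still be processed in parallel. The server mycluster, the schema mydb and the ordering column event_ts are hypothetical placeholders. The DATA step above has no explicit ordering column within an ID, so one has to be chosen; without it, the order of the numbers within an ID is arbitrary and may change between runs.

```sas
/* Sketch only: push the per-ID numbering down to Hive.                  */
/* Hypothetical names: mycluster (server), mydb (schema), a and b (Hive  */
/* tables), event_ts (column that defines row order within each ID).     */
proc sql;
   connect to hadoop (server="mycluster" schema=mydb); /* add PORT=, USER= etc. as your site requires */

   /* ROW_NUMBER() starts again at 1 for every ID, like FIRST.ID / SIRA + 1 */
   execute (
      create table b as
      select a.*,
             row_number() over (partition by id order by event_ts) as sira
      from a
   ) by hadoop;

   disconnect from hadoop;
quit;
```

Creating table b inside Hive with EXECUTE keeps the rows on the cluster; a SELECT ... FROM CONNECTION TO HADOOP query would instead bring the rows back to the SAS server, which is the bottleneck described above.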

Many thanks,

Best Regards

 

Onur

PROC Star
Posts: 2,231

Re: Assigning sequence to records in the HPA environment

I don't know much about HPA, but if it is anything like other highly parallelised environments, ordered tables are not really a feature. They defeat the purpose of the parallelisation, which is to split the table into small, arbitrary chunks that are processed independently. If you force a known row order on the whole table, you are back to a single node, and performance dies. That is also why deriving an exact median is not generally supported: it needs the whole table in one global order.

So go back a step: why do you need to number the rows?
