Solved: Greenplum SQL related question

gyambqt · Posted 05-28-2020 06:58 AM

Hi Experts,

I have a SQL query as shown below:

select stg.* from(
select stg.ID, stg.revision_dttm, stg.no, stg.date, stg.value,
row_number() over(partition by stg.ID order by stg.value desc,
stg.date desc) as tmp_row
from table1 stg
) stg

inner join table2 tref1
on stg.ID = tref1.ID and
stg.revision_dttm = tref1.revision_dttm
and tref1.id2='1'

Each time I run the script above, I get a different result for tmp_row. Why is this happening?

run1:

ID  revision_dttm        no  value  date                 tmp_row
1   2020-02-14 08:12:49  a   0      2020-01-20 00:00:00  1
1   2020-02-14 08:12:49  b   0      2020-01-20 00:00:00  2

run2:

ID  revision_dttm        no  value  date                 tmp_row
1   2020-02-14 08:12:49  a   0      2020-01-20 00:00:00  2
1   2020-02-14 08:12:49  b   0      2020-01-20 00:00:00  1

JBailey · Posted 05-28-2020 01:45 PM

Hi @gyambqt

Just a guess.

The ORDER BY clause in the window sorts on stg.value and stg.date. Both rows in your output have the same values for these two columns, so they tie, and either ordering of the two rows is a correct sort. Greenplum is a highly parallel database, and its nodes (segments) could be producing intermediate result sets at different times, so the tied rows can reach the window function in a different order on each run. The result is an arbitrary assignment of tmp_row among the tied rows.
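If a stable numbering is needed, one common approach is to extend the ORDER BY with a column that breaks the tie. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Greenplum (window functions need SQLite 3.25 or later), and it assumes that "no" is unique among rows that tie on value and date, which the thread does not confirm. The helper name number_rows is made up for this example.

```python
import sqlite3

# The two tied rows from the question (stand-in for table1).
ROWS = [
    (1, "2020-02-14 08:12:49", "a", 0, "2020-01-20 00:00:00"),
    (1, "2020-02-14 08:12:49", "b", 0, "2020-01-20 00:00:00"),
]

# Same window as in the question, plus "no" as a final tie-breaker.
# Assumption: "no" is unique among rows that tie on value and date.
# "no" is quoted because NO is a keyword in SQLite.
QUERY = """
select stg.ID, stg."no", stg.value, stg.date,
       row_number() over (partition by stg.ID
                          order by stg.value desc, stg.date desc, stg."no") as tmp_row
from table1 stg
order by tmp_row
"""

def number_rows(rows):
    """Load rows into an in-memory table and return (no, tmp_row) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'create table table1 '
        '(ID integer, revision_dttm text, "no" text, value integer, date text)'
    )
    conn.executemany("insert into table1 values (?, ?, ?, ?, ?)", rows)
    result = [(r[1], r[4]) for r in conn.execute(QUERY)]
    conn.close()
    return result

if __name__ == "__main__":
    # The numbering no longer depends on the order the rows arrive in.
    print(number_rows(ROWS))
    print(number_rows(list(reversed(ROWS))))
```

With the tie-breaker in place, both calls return the same numbering regardless of the order in which the rows were loaded. In Greenplum itself the change would be the extra stg.no term at the end of the window's ORDER BY; any column (or combination of columns) that is unique within the tie would serve equally well.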

Best wishes,

@JBailey

