Solved: Help with Ranking

hhchenfx · Posted 11-24-2020 07:17 PM

Hi Everyone,

I have a time series data and I would like to perform the following rank:

for each row, look back 199 rows (so total 200 rows).

Rank value within 200 rows (so the rank will be 1 to 200)

Identify the top 150,180, 190, 196 values.

I use the percentile concept but it I think I did something wrong.

For illustration purpose, I create a list of value of 1 to 200.

In the first code, when data is sorted in ascending order, the code identifies correctly these 150,180, 190, 196 values for the last row.

However, when I switch position for number 1 and 200, the code Incorrectly identify the percentile value (151 181 191 197) for the last row.

My point is that the change I make is just the order and therefore at the last row, the code still look at the full 200 values as before, and it should return the same percentile value.

Can anyone please help me to fix my code?

Thank you,

HHC

*NUMBERS ARE IN ORDER --> GOOD;
data have;
do value=1 to 200;
output;
end;
run;

data want; set have;
	drop aa1-aa200;
	array a{0:199} aa1-aa200;
	retain aa:;
		pct_low=pctl(75,of a{*}); 
		pct_mid=pctl(90,of a{*});	
		pct_high=pctl(95,of a{*});	
		pct_top=pctl(98,of a{*});

	a{mod(_n_,200)}=value;
run;



*SWICH POSITION OF NUMBER 1 and 200 --> NOT GOOD;
data have2; set have;
if _N_=200 then value=1;
if _N_=1 then value=200;run;


data want2; set have2;
	drop aa1-aa200;
	array a{0:199} aa1-aa200;
	retain aa:;
		pct_low=pctl(75,of a{*}); 
		pct_mid=pctl(90,of a{*});	
		pct_high=pctl(95,of a{*});	
		pct_top=pctl(98,of a{*});

	a{mod(_n_,200)}=value;
run;

mkeintz · Posted 11-24-2020 08:50 PM

If you only moved the assignment statement in the DATA WANT2 step but not the DATA WANT step, then you should not expect the same results. Because the DATA WANT2 step is evaluating 200 obs (200, then 2 through 199, then 1), while the DATA WANT step is still only evaluating 199 values (1 ascending to 199).

In short, data order IS NOT affecting the calculation of percentiles.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 11-24-2020 08:07 PM

The problem is that you are calculating the percentiles PRIOR to assigning value to an element of array a. Therefore the last observation of the first instance is generating percentiles based on 199 values ranging from 1 to 199. It ignores the value=200.

But the second instance is generating percentiles based on 199 values of 200, followed by 2 through 199. It ignores the 1.

Why are you assigning values to the array elements prior to percentile calclulation?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hhchenfx · Posted 11-24-2020 08:30 PM

So I move the array definition above the pct and the result are the same now.

However, the numbers are 150.5 180.5 190.5 and 196.5, still not the exact value yet of (151 181 191 197)


data want2; set have2;
	*drop aa1-aa200;

	array a{0:199} aa1-aa200;

	a{mod(_n_,200)}=value;

	retain aa:;
		pct_low=pctl(75,of a{*}); 
		pct_mid=pctl(90,of a{*});	
		pct_high=pctl(95,of a{*});	
		pct_top=pctl(98,of a{*});

run;

mkeintz · Posted 11-24-2020 08:50 PM

If you only moved the assignment statement in the DATA WANT2 step but not the DATA WANT step, then you should not expect the same results. Because the DATA WANT2 step is evaluating 200 obs (200, then 2 through 199, then 1), while the DATA WANT step is still only evaluating 199 values (1 ascending to 199).

In short, data order IS NOT affecting the calculation of percentiles.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ChrisNZ · Posted 11-24-2020 08:12 PM

This gives the same result for both tables:

data WANT; 
  set HAVE end=LASTOBS;
  array A{200} _temporary_;
  A[_N_]=VALUE;
  if LASTOBS;
  PCT_75=pctl(75,of A{*}); 
  PCT_90=pctl(90,of A{*});	
  PCT_95=pctl(95,of A{*});	
  PCT_98=pctl(98,of A{*});
run;

It is also much more efficient.

Please do not use tabs, the code you pasted is not formatted properly.

High-Performance SAS Coding - Third Edition

Help with Ranking

Re: Help with Ranking

Re: Help with Ranking

Re: Help with Ranking

Re: Help with Ranking

Re: Help with Ranking

Help with Ranking

Re: Help with Ranking

Re: Help with Ranking

Re: Help with Ranking

Re: Help with Ranking

Re: Help with Ranking

Registration is open

SAS Training: Just a Click Away