BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Rhodochrosite | Level 12

Hi Everyone,

I have a time series data and I would like to perform the following rank:

for each row, look back 199 rows (so total 200 rows).

Rank value within 200 rows (so the rank will be 1 to 200)

Identify the top 150,180, 190, 196 values.

 

I use the percentile concept but it I think I did something wrong.

 

For illustration purpose, I create a list of value of 1 to 200.

In the first code, when data is sorted in ascending order, the code identifies correctly these 150,180, 190, 196 values for the last row.

 

However, when I switch position for number 1 and 200, the code Incorrectly identify the percentile value (151 181 191 197) for the last row.

My point is that the change I make is just the order and therefore at the last row, the code still look at the full 200 values as before, and it should return the same percentile value.

 

Can anyone please help me to fix my code?

 

Thank you,

 

HHC

*NUMBERS ARE IN ORDER --> GOOD;
data have;
do value=1 to 200;
output;
end;
run;

data want; set have;
	drop aa1-aa200;
	array a{0:199} aa1-aa200;
	retain aa:;
		pct_low=pctl(75,of a{*}); 
		pct_mid=pctl(90,of a{*});	
		pct_high=pctl(95,of a{*});	
		pct_top=pctl(98,of a{*});

	a{mod(_n_,200)}=value;
run;



*SWICH POSITION OF NUMBER 1 and 200 --> NOT GOOD;
data have2; set have;
if _N_=200 then value=1;
if _N_=1 then value=200;run;


data want2; set have2;
	drop aa1-aa200;
	array a{0:199} aa1-aa200;
	retain aa:;
		pct_low=pctl(75,of a{*}); 
		pct_mid=pctl(90,of a{*});	
		pct_high=pctl(95,of a{*});	
		pct_top=pctl(98,of a{*});

	a{mod(_n_,200)}=value;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If you only moved the assignment statement in the DATA WANT2 step but not the DATA WANT step, then you should not expect the same results.  Because the DATA WANT2 step is evaluating 200 obs (200, then 2 through 199, then 1), while the DATA WANT step is still only evaluating 199 values (1 ascending to 199).

 

In short, data order IS NOT affecting the calculation of percentiles. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

The problem is that you are calculating the percentiles PRIOR to assigning value to an element of array a.  Therefore the last observation of the first instance is generating percentiles based on 199 values ranging from 1 to 199.  It ignores the value=200.

 

But the second instance is generating percentiles based on 199 values of 200, followed by 2 through 199.  It ignores the 1.

 

Why are you assigning values to the array elements prior to percentile calclulation?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hhchenfx
Rhodochrosite | Level 12

So I move the array definition above the pct and the result are the same now.

However, the numbers are 150.5 180.5 190.5 and 196.5, still not the exact value yet of (151 181 191 197)

 


data want2; set have2;
	*drop aa1-aa200;

	array a{0:199} aa1-aa200;

	a{mod(_n_,200)}=value;

	retain aa:;
		pct_low=pctl(75,of a{*}); 
		pct_mid=pctl(90,of a{*});	
		pct_high=pctl(95,of a{*});	
		pct_top=pctl(98,of a{*});

run;

 

mkeintz
PROC Star

If you only moved the assignment statement in the DATA WANT2 step but not the DATA WANT step, then you should not expect the same results.  Because the DATA WANT2 step is evaluating 200 obs (200, then 2 through 199, then 1), while the DATA WANT step is still only evaluating 199 values (1 ascending to 199).

 

In short, data order IS NOT affecting the calculation of percentiles. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

This gives the same result for both tables:

 

data WANT; 
  set HAVE end=LASTOBS;
  array A{200} _temporary_;
  A[_N_]=VALUE;
  if LASTOBS;
  PCT_75=pctl(75,of A{*}); 
  PCT_90=pctl(90,of A{*});	
  PCT_95=pctl(95,of A{*});	
  PCT_98=pctl(98,of A{*});
run;

It is also much more efficient.

Please do not use tabs, the code you pasted is not formatted properly.

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1123 views
  • 0 likes
  • 3 in conversation