Hi Everyone,
I have a time series data and I would like to perform the following rank:
for each row, look back 199 rows (so total 200 rows).
Rank value within 200 rows, so the rank runs from 1 to 200.
Identify the top 150,180, 190, 196 values.
I use the percentile concept but it doesn't work 100% as expected.
My code is below.
Can anyone please help?
Thank you,
HHC
*calculate Moving percentile for each window based on this record and previous 199 record;
data PRC; set PRC;
drop aa1-aa200;
array a{0:199} aa1-aa200;
retain aa:;
pct_low=pctl(75,of a{*});
pct_mid=pctl(90,of a{*});
pct_high=pctl(95,of a{*});
pct_top=pctl(98,of a{*});
a{mod(_n_,200)}=ROC1;
run;
For one thing, you should insert the current value into the array before calculating percentiles
/* Calculate moving percentiles for each window based on this record and previous 199 records */
data PRC;
set PRC;
array a{0:199} _temporary_;
a{mod(_n_,200)}=ROC1;
if _n_ >= 200 then do;
pct_low = pctl(75, of a{*});
pct_mid = pctl(90, of a{*});
pct_high = pctl(95, of a{*});
pct_top = pctl(98, of a{*});
end;
run;
(untested)
As for what you expect... you should tell us.
What do you expect to do with ties? The PCTL function has 5 variations with the default being 5. Look up the effect of QNTLDEF to see potential effect.
Please describe exactly what was wrong with the percentile approach?
Hi @hhchenfx,
Since you know the ranks in advance, the ORDINAL function or the SMALLEST function -- depending on how you want to handle missing values -- will be easier to use. The SMALLEST function could also be replaced by the LARGEST function. For example, the 196th smallest of 200 non-missing values is the 5th largest. Moreover, you can save the DROP and RETAIN statements as well as the variable names aa1-aa200 by using a temporary array (EDIT: unless aa1-aa200 are already contained in dataset PRC):
array a{0:199} _temporary_;
For one thing, you should insert the current value into the array before calculating percentiles
/* Calculate moving percentiles for each window based on this record and previous 199 records */
data PRC;
set PRC;
array a{0:199} _temporary_;
a{mod(_n_,200)}=ROC1;
if _n_ >= 200 then do;
pct_low = pctl(75, of a{*});
pct_mid = pctl(90, of a{*});
pct_high = pctl(95, of a{*});
pct_top = pctl(98, of a{*});
end;
run;
(untested)
As for what you expect... you should tell us.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.