Hi Everyone,
I have a time series data and I would like to perform the following rank:
for each row, look back 199 rows (so total 200 rows).
Rank value within 200 rows, so the rank runs from 1 to 200.
Identify the top 150,180, 190, 196 values.
I use the percentile concept but it doesn't work 100% as expected.
My code is below.
Can anyone please help?
Thank you,
HHC
*calculate Moving percentile for each window based on this record and previous 199 record;
data PRC; set PRC;
drop aa1-aa200;
array a{0:199} aa1-aa200;
retain aa:;
pct_low=pctl(75,of a{*});
pct_mid=pctl(90,of a{*});
pct_high=pctl(95,of a{*});
pct_top=pctl(98,of a{*});
a{mod(_n_,200)}=ROC1;
run;
For one thing, you should insert the current value into the array before calculating percentiles
/* Calculate moving percentiles for each window based on this record and previous 199 records */
data PRC;
set PRC;
array a{0:199} _temporary_;
a{mod(_n_,200)}=ROC1;
if _n_ >= 200 then do;
pct_low = pctl(75, of a{*});
pct_mid = pctl(90, of a{*});
pct_high = pctl(95, of a{*});
pct_top = pctl(98, of a{*});
end;
run;
(untested)
As for what you expect... you should tell us.
What do you expect to do with ties? The PCTL function has 5 variations with the default being 5. Look up the effect of QNTLDEF to see potential effect.
Please describe exactly what was wrong with the percentile approach?
Hi @hhchenfx,
Since you know the ranks in advance, the ORDINAL function or the SMALLEST function -- depending on how you want to handle missing values -- will be easier to use. The SMALLEST function could also be replaced by the LARGEST function. For example, the 196th smallest of 200 non-missing values is the 5th largest. Moreover, you can save the DROP and RETAIN statements as well as the variable names aa1-aa200 by using a temporary array (EDIT: unless aa1-aa200 are already contained in dataset PRC):
array a{0:199} _temporary_;
For one thing, you should insert the current value into the array before calculating percentiles
/* Calculate moving percentiles for each window based on this record and previous 199 records */
data PRC;
set PRC;
array a{0:199} _temporary_;
a{mod(_n_,200)}=ROC1;
if _n_ >= 200 then do;
pct_low = pctl(75, of a{*});
pct_mid = pctl(90, of a{*});
pct_high = pctl(95, of a{*});
pct_top = pctl(98, of a{*});
end;
run;
(untested)
As for what you expect... you should tell us.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.