Hi Everyone,
My code below calculates moving percentile for each id based on previous 5 day. This is done for every row in the table.
Since I am interested in only the last few date (say date=5), I don't want to calculate for the whole table as it takes time to create the array for all rows in my big data.
I try to use IF function but it doesn’t work. Look like the array is based on 1 row date=5 instead of previous dates.
Can you please help to fix?
Thanks
HHC
data have;
input date id $ value;
datalines;
1 a 1
1 b 3
1 c 6
2 a 9
2 b 6
2 c 8
3 a 99
3 b 9
3 c 5
3 d 5
4 a 10
4 b 8
4 c 7
5 a 9
5 b 15
5 c 20
6 a 3
6 b 8
6 c 55
;run;
proc sort data=have; by id date;run;
*Calculate percentile for each id FULL TABLE;
data want; set have;
*drop aa:;
array a{0: 4} aa1-aa5;
a{mod(_n_,5)}=value;
retain aa:;
pct_95=pctl(95,of a{*});
run;
*****Calculate percentile for each id For JUST few DATE*******;
data want_short; set have;
if date = 5 then do;
array a{0: 4} aa1-aa5;
a{mod(_n_,5)}=value;
retain aa:;
pct_95=pctl(95,of a{*});
end;
run;
I think code should like :
proc sort data=have; by id date;run;
*****Calculate percentile for each id For JUST few DATE*******;
data want_short; set have;
by id ;
if first.id then n=0;
n+1;
array a{0: 4} _temporary_;
a{mod(n,5)}=value;
if date = 5 then pct_95=pctl(95,of a{*});
run;
Hi Ksharp,
Thanks a lot for helping.
Your code cut the time of SAS running by immense amount.
I am not quite understand your code and it would be helpful if you could explain it abit.
For me, you code still create array for every row, right?
If so the time saving coming only from the pct statement since it is conditional on date=5.
HHC
My array is TEMPORARY array unlike yours VECTOR array which waste too many resource of PC ,but they have the same usage for your case.
"For me, you code still create array for every row, right?"
Yes . It is just a temp array out of somewhere . and for every row it is the same for all over time.
"If so the time saving coming only from the pct statement since it is conditional on date=5."
No. time saving is coming from my temporary array .
Thanks for the explanation.
HHC
Another alternative is to sort by id/date as per @Ksharp , but instead of an array, use value, lag(value),...,lag4(value) as the arguments of the PCTL function:
data have;
input date id $ value;
datalines;
1 a 1
1 b 3
1 c 6
2 a 9
2 b 6
2 c 8
3 a 99
3 b 9
3 c 5
3 d 5
4 a 10
4 b 8
4 c 7
5 a 9
5 b 15
5 c 20
6 a 3
6 b 8
6 c 55
;run;
proc sort data=have out=need;
by id date;
run;
data want;
set need;
by id;
pctl_95=pctl(95,value,lag(value),lag2(value),lag3(value),lag4(value));
if lag4(id)^=id then call missing(pctl_95);
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.