BookmarkSubscribeRSS Feed
hhchenfx
Barite | Level 11

Hi Everyone,

 

My code below calculates moving percentile for each id based on previous 5 day. This is done for every row in the table. 

 

Since I am interested in only the last few date (say date=5), I don't want to calculate for the whole table as it takes time to create the array for all rows in my big data.

 

I try to use IF function but it doesn’t work. Look like the array is based on 1 row date=5 instead of previous dates.

 

Can you please help to fix?

 

Thanks

 

HHC

 

data have; 
input date  id $ value;
datalines;
1 a 1
1 b 3
1 c 6
2 a 9
2 b 6
2 c 8
3 a 99
3 b 9
3 c 5
3 d 5
4 a 10
4 b 8
4 c 7
5 a 9
5 b 15
5 c 20
6 a 3
6 b 8
6 c 55
;run;

proc sort data=have; by id date;run;

*Calculate percentile for each id FULL TABLE;
data want; set have;
	*drop aa:;
	array a{0: 4} aa1-aa5;
	a{mod(_n_,5)}=value;
	retain aa:;
		pct_95=pctl(95,of a{*});
run;


*****Calculate percentile for each id For JUST few DATE*******;
data want_short; set have;
if date = 5 then do;
	array a{0: 4} aa1-aa5;
	a{mod(_n_,5)}=value;
	retain aa:;
		pct_95=pctl(95,of a{*});
end;
run;
5 REPLIES 5
Ksharp
Super User

I think code should like :

proc sort data=have; by id date;run;

*****Calculate percentile for each id For JUST few DATE*******;
data want_short; set have;
by id ;
if first.id then n=0;
n+1;

	array a{0: 4} _temporary_;
	a{mod(n,5)}=value;

if date = 5 then pct_95=pctl(95,of a{*});

run;
hhchenfx
Barite | Level 11

Hi Ksharp,

Thanks a lot for helping.

Your code cut the time of SAS running by immense amount.

I am not quite understand your code and it would be helpful if you could explain it abit.

For me, you code still create array for every row, right?

If so the time saving coming only from the pct statement since it is conditional on date=5.

HHC

Ksharp
Super User

My array is TEMPORARY array unlike yours VECTOR array which waste too many resource of PC ,but they have the same usage for your case.

 

"For me, you code still create array for every row, right?"

Yes . It is just a temp array out of somewhere . and for every row it is the same for all over time.

 

 

 

"If so the time saving coming only from the pct statement since it is conditional on date=5."

No. time saving is coming from my temporary array .

hhchenfx
Barite | Level 11

Thanks for the explanation.

HHC

mkeintz
PROC Star

Another alternative is to sort by id/date as per @Ksharp , but instead of an array, use  value, lag(value),...,lag4(value) as the arguments of the PCTL function:

 

data have; 
input date  id $ value;
datalines;
1 a 1
1 b 3
1 c 6
2 a 9
2 b 6
2 c 8
3 a 99
3 b 9
3 c 5
3 d 5
4 a 10
4 b 8
4 c 7
5 a 9
5 b 15
5 c 20
6 a 3
6 b 8
6 c 55
;run;

proc sort data=have out=need;
  by id date;
run;

data want;
  set need;
  by id;
  pctl_95=pctl(95,value,lag(value),lag2(value),lag3(value),lag4(value));
  if lag4(id)^=id then call missing(pctl_95);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 561 views
  • 3 likes
  • 3 in conversation