Hi,
how to calculate the classic window aggregations (like cumulative sum, moving average, moving stddev, ....) in CAS on CAS tables with CAS actions only?
Datastep or fedsql are forbidden as answers 😁, I don't want to reinvent the wheel and write the code for this, I need a configurable action to execute these standard calculations (like the old proc expand did).
EDIT:
Perhaps with the action "aggregation.aggregate"? Not sure what it does, the documentation is very unclear and there are no useful examples (https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=v_009&docsetId=casanpg&docsetTarget=cas-ag...)
Thank you very much
Regards
Here is a little bit of code to get you started:
cas mySession;
libname casuser cas caslib="casuser" sessref="mySession";
data casuser.test;
do i=1 to 12;
do y=2020 to 2021;
date=mdy(i, 1, y);
total=round(rand("uniform", 1, 10));
output;
end;
end;
format date date9.;
drop i y;
run;
proc cas;
tbl={name="test", caslib="casuser"};
table.fetch / table=tbl, index=FALSE;
aggregation.aggregate /
table=tbl || {groupBy="Date"}
,varSpecs={{name="Total", subset="MEAN"} }
,ID="Date"
,Interval="MONTH"
,windowInt="QTR"
,casOut={name="summary", caslib="casuser", replace=TRUE}
;
table.fetch /
table={name="summary", caslib="casuser", orderBy={"date"}}
,index=FALSE
;
quit;
proc expand data=casuser.test
out=Stats;
id Date;
convert Total=_total_Summary_Mean_ / transout=(movave 3 trimleft 2);
run;
proc compare base=stats compare=casuser.summary brief transpose;
id date;
run;
PROC COMPARE results show minute, insignificant differences in the calculated values for a few rows, but otherwise results are identical:
The COMPARE Procedure Comparison of WORK.STATS with CASUSER.SUMMARY (Method=EXACT) Comparison Results for Observations date=01OCT2020: Variable Base Value Compare Diff. % Diff _total_Summary_Mean_ 5.000000 5 -8.88178E-16 -1.77636E-14 date=01MAR2021: Variable Base Value Compare Diff. % Diff _total_Summary_Mean_ 4.333333 4.3333333333 8.881784E-16 2.049643E-14 date=01MAY2021: Variable Base Value Compare Diff. % Diff _total_Summary_Mean_ 4.333333 4.3333333333 -8.88178E-16 -2.04964E-14 date=01DEC2021: Variable Base Value Compare Diff. % Diff _total_Summary_Mean_ 6.000000 6 2.664535E-15 4.440892E-14 NOTE: Values of the following 1 variables compare unequal: _total_Summary_Mean_
I have no idea of whether the CAS approach is more or less efficient than proc expand for sorted data from a sas data set.
But it's certainly more flexible. A major shortcoming (IMO) of PROC EXPAND, is that it only provides univariate statistics for windows. An equivalent ordinary data step (and I presume an equivalent PROC CAS) would be much superior to proc expand in generating rolling window bivariate/multivariate stats - for example generating rolling windows sums-of-squares-and-cross-products (SSCP). A rolling window SSCP is ready-made for rolling-window regressions. PROC EXPAND can't do that unless you pre-process the data to generate observation-by-observation individual cross-products.
So @SASJedi , may I shift @Edoedoedo 's goal posts a bit. Might we see CAS actions to generate rolling SSCP?
I'm going to admit right here that I am not statistically inclined, and CSSP is not one of the statistics I'm familiar with - nor is it supported by the aggregation.aggregate CAS action. But your post suggests you may know how to accomplish what you need with DATA step code. If so, give the dataStep.runCode action a whirl 🙂
Since there is a set of really good cumulative functions in VA I guess there must be a use of the aggregate function. What I really want is the cumulative month to date average for all days in a month and a YTD cumulative for all the days in a year.
Traditionally I've always done these sort of tasks with a datastep. That need the data sorted and retain. But I would love to sse how this have been achieved in VA
Pål N
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.