- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
how to calculate the classic window aggregations (like cumulative sum, moving average, moving stddev, ....) in CAS on CAS tables with CAS actions only?
Datastep or fedsql are forbidden as answers 😁, I don't want to reinvent the wheel and write the code for this, I need a configurable action to execute these standard calculations (like the old proc expand did).
EDIT:
Perhaps with the action "aggregation.aggregate"? Not sure what it does, the documentation is very unclear and there are no useful examples (https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=v_009&docsetId=casanpg&docsetTarget=cas-ag...)
Thank you very much
Regards
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a little bit of code to get you started:
cas mySession;
libname casuser cas caslib="casuser" sessref="mySession";
data casuser.test;
do i=1 to 12;
do y=2020 to 2021;
date=mdy(i, 1, y);
total=round(rand("uniform", 1, 10));
output;
end;
end;
format date date9.;
drop i y;
run;
proc cas;
tbl={name="test", caslib="casuser"};
table.fetch / table=tbl, index=FALSE;
aggregation.aggregate /
table=tbl || {groupBy="Date"}
,varSpecs={{name="Total", subset="MEAN"} }
,ID="Date"
,Interval="MONTH"
,windowInt="QTR"
,casOut={name="summary", caslib="casuser", replace=TRUE}
;
table.fetch /
table={name="summary", caslib="casuser", orderBy={"date"}}
,index=FALSE
;
quit;
proc expand data=casuser.test
out=Stats;
id Date;
convert Total=_total_Summary_Mean_ / transout=(movave 3 trimleft 2);
run;
proc compare base=stats compare=casuser.summary brief transpose;
id date;
run;
PROC COMPARE results show minute, insignificant differences in the calculated values for a few rows, but otherwise results are identical:
The COMPARE Procedure Comparison of WORK.STATS with CASUSER.SUMMARY (Method=EXACT) Comparison Results for Observations date=01OCT2020: Variable Base Value Compare Diff. % Diff _total_Summary_Mean_ 5.000000 5 -8.88178E-16 -1.77636E-14 date=01MAR2021: Variable Base Value Compare Diff. % Diff _total_Summary_Mean_ 4.333333 4.3333333333 8.881784E-16 2.049643E-14 date=01MAY2021: Variable Base Value Compare Diff. % Diff _total_Summary_Mean_ 4.333333 4.3333333333 -8.88178E-16 -2.04964E-14 date=01DEC2021: Variable Base Value Compare Diff. % Diff _total_Summary_Mean_ 6.000000 6 2.664535E-15 4.440892E-14 NOTE: Values of the following 1 variables compare unequal: _total_Summary_Mean_
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have no idea of whether the CAS approach is more or less efficient than proc expand for sorted data from a sas data set.
But it's certainly more flexible. A major shortcoming (IMO) of PROC EXPAND, is that it only provides univariate statistics for windows. An equivalent ordinary data step (and I presume an equivalent PROC CAS) would be much superior to proc expand in generating rolling window bivariate/multivariate stats - for example generating rolling windows sums-of-squares-and-cross-products (SSCP). A rolling window SSCP is ready-made for rolling-window regressions. PROC EXPAND can't do that unless you pre-process the data to generate observation-by-observation individual cross-products.
So @SASJedi , may I shift @Edoedoedo 's goal posts a bit. Might we see CAS actions to generate rolling SSCP?
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm going to admit right here that I am not statistically inclined, and CSSP is not one of the statistics I'm familiar with - nor is it supported by the aggregation.aggregate CAS action. But your post suggests you may know how to accomplish what you need with DATA step code. If so, give the dataStep.runCode action a whirl 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Since there is a set of really good cumulative functions in VA I guess there must be a use of the aggregate function. What I really want is the cumulative month to date average for all days in a month and a YTD cumulative for all the days in a year.
Traditionally I've always done these sort of tasks with a datastep. That need the data sorted and retain. But I would love to sse how this have been achieved in VA
Pål N