
# Aggregate statistical proc

Hello SAS/STAT Community,

It is a basic rule that, in a two-dimensional matrix of time-series data whose columns are time intervals, a multiple regression model helps derive relevant conclusions. For instance, if four columns represent quarters 1 through 4 and the rows represent years 1 through 10, I am wondering which statistic is appropriate to populate each cell of this 2D matrix.

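| Year | Quarter1 | Quarter2 | Quarter3 | Quarter4 |
|------|----------|----------|----------|----------|
| 1    | x1       | x2       | x3       | x4       |
| 2    | y1       | y2       | y3       | y4       |
| 3    | z1       | z2       | z3       | z4       |
| 4    | x1       | x2       | x3       | x4       |
| 5    | y1       | y2       | y3       | y4       |
| 6    | z1       | z2       | z3       | z4       |
| 7    | x1       | x2       | x3       | x4       |
| 8    | y1       | y2       | y3       | y4       |
| 9    | z1       | z2       | z3       | z4       |
| 10   | x1       | x2       | x3       | x4       |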

The original matrix was 10 x 365, holding data for every single day of 10 years, so I can safely use PROC MEANS to populate each cell, which essentially boils down to reducing roughly 90 daily observations to one. Note that the cell values in the matrix above represent financial data.
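A minimal sketch of that reduction, assuming a daily data set named `input` with numeric variables `year`, `quarter` and `price` (the names used later in this thread); the output names `qstats`, `price_mean` and `matrix` are hypothetical. PROC MEANS collapses each year-quarter group to one value, and PROC TRANSPOSE turns the long result into the year-by-quarter layout shown above.

```sas
/* Collapse the roughly 90 daily prices in each year-quarter into one mean */
proc means data=input noprint nway;
   class year quarter;
   var price;
   output out=qstats (drop=_type_ _freq_) mean=price_mean;
run;

/* NWAY output is already ordered by YEAR then QUARTER, as BY requires. */
/* ID QUARTER with PREFIX=Quarter names the columns Quarter1-Quarter4.  */
proc transpose data=qstats out=matrix (drop=_name_) prefix=Quarter;
   by year;
   id quarter;
   var price_mean;
run;
```

Swapping `mean=price_mean` for `median=price_median` (and changing the VAR statement of PROC TRANSPOSE to match) gives a median-based grid instead.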

Is there a rule of thumb for choosing a statistic other than MEAN that might be more relevant for time-series analysis? Please suggest procedures in the SAS statistical package that aggregate data.

Thank you.


## Re: Aggregate statistical proc

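Since this is financial data, which is often skewed and prone to extreme values, you may wish to use MEDIAN in PROC MEANS as the data reduction statistic. It is less sensitive to outliers than the mean, so there is less chance of bias.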

Steve Denham
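To illustrate why the median is less prone to bias, here is a tiny hypothetical example (the five prices are made up purely for illustration): a single spike pulls the mean well away from the typical level, while the median barely moves.

```sas
/* Hypothetical prices: four typical days and one extreme spike */
data demo;
   input price @@;
   datalines;
10 11 12 13 100
;
run;

/* Mean = 146/5 = 29.2, median = 12: the one spike drags the mean upward */
proc means data=demo mean median;
   var price;
run;
```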


## Re: Aggregate statistical proc

That sounds like an elegant way to compare the MEAN and MEDIAN options, and to test hypotheses about trends in variables that span a time interval as opposed to those that are minimally biased over time.

SKEWNESS, VARIANCE and KURTOSIS are on my list as well; however, I am completely lost as to how to apply these concepts mathematically in my data model. I guess transformation regression is quite challenging compared to multiple regression.
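One way to put those statistics to work, as a sketch assuming the same daily `input` data set: PROC MEANS can compute VAR, SKEWNESS and KURTOSIS for every year-quarter cell in one pass, so each statistic yields its own year-by-quarter grid alongside the mean and median. The names `shape`, `price_var`, `price_skew` and `price_kurt` are hypothetical.

```sas
/* One row per year-quarter cell (NWAY) with location, spread and shape statistics */
/* SKEWNESS and KURTOSIS use the default divisor, VARDEF=DF                        */
proc means data=input noprint nway;
   class year quarter;
   var price;
   output out=shape (drop=_type_ _freq_)
          mean=price_mean median=price_median
          var=price_var skewness=price_skew kurtosis=price_kurt;
run;
```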

Will including two variables (in this case, year and quarter) in the CLASS statement of PROC MEANS produce an output data set that contains the median? I am trying to decide which of these is more appropriate:

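```sas
proc means data=input mean median skewness;
   var price;
   class year quarter;
   /* Name the statistics here; a bare OUTPUT OUT= writes only N, MIN, MAX, MEAN and STD */
   output out=outdataset mean= median= skewness= / autoname;
run;
```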

OR

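```sas
/* BY-group processing requires INPUT to be sorted by YEAR */
proc sort data=input;
   by year;
run;

proc means data=input mean median skewness;
   var price;
   class quarter;
   by year;
   output out=outdataset mean= median= skewness= / autoname;
run;
```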

given that the input data set has one row per day with the following variables:

`day year quarter price`
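To try either version without the real data, a hypothetical `input` data set with the same four variables can be simulated. The prices below are random draws, not financial data, and quarters are approximated from the day number (leap years ignored), so this is only a test harness under those assumptions.

```sas
/* Simulated daily prices: 10 years x 365 days (hypothetical, for testing only) */
data input;
   call streaminit(2024);
   do year = 1 to 10;
      do day = 1 to 365;
         /* Approximate quarter from day of year: 91, 91, 91 and 92 days */
         quarter = min(4, ceil(day / 91.25));
         /* Random prices around 100; RAND('NORMAL', mean, sd) */
         price = 100 + rand('normal', 0, 5);
         output;
      end;
   end;
run;
```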


## Re: Aggregate statistical proc

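Given a choice, I prefer the CLASS statement to the BY statement in PROC MEANS. The results are more complete (unless you specify NWAY, the output data set also holds the overall, year-only and quarter-only summaries), and each level can be identified by the _TYPE_ variable. But it is a preference: BY-group processing requires the data to be sorted, and for huge, already-sorted data sets it is probably faster.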

Steve Denham
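A short sketch of how _TYPE_ identifies those levels, assuming the daily `input` data set; `allstats`, `cells` and `by_quarter` are hypothetical names. With CLASS year quarter and no NWAY, _TYPE_ = 0 is the grand total, 1 is by quarter only, 2 is by year only, and 3 is each year-quarter cell.

```sas
/* Without NWAY, every combination of the CLASS variables is written */
proc means data=input noprint;
   class year quarter;
   var price;
   output out=allstats median=price_median;
run;

/* Keep only the year-by-quarter cells (same rows the NWAY option produces) */
data cells;
   set allstats;
   where _type_ = 3;
run;

/* Quarter-only summaries, pooled over all years */
data by_quarter;
   set allstats;
   where _type_ = 1;
run;
```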
