- Aggregate statistical proc


06-06-2013 09:58 AM

Hello SAS/STAT Community

As a basic rule, when a two-dimensional matrix represents time-series data and its columns are time intervals, a multiple regression model helps in drawing relevant conclusions. For instance, suppose the four columns represent quarters 1 through 4 and the rows represent years 1 through 10. I am trying to work out which statistic is appropriate for populating each cell of this 2D matrix.

Quarter1 | Quarter2 | Quarter3 | Quarter4 |
---|---|---|---|
x1 | x2 | x3 | x4 |
y1 | y2 | y3 | y4 |
z1 | z2 | z3 | z4 |
x1 | x2 | x3 | x4 |
y1 | y2 | y3 | y4 |
z1 | z2 | z3 | z4 |
x1 | x2 | x3 | x4 |
y1 | y2 | y3 | y4 |
z1 | z2 | z3 | z4 |
x1 | x2 | x3 | x4 |

The original matrix had dimension (10 x 365), representing daily data for 10 years, so I can safely use PROC MEANS to populate each cell, which essentially boils down to reducing about 90 daily observations into one. Note that the cell values in the above matrix represent financial data.

Is there a rule of thumb for choosing a statistic other than the MEAN that might be more relevant for time-series analysis? Please suggest PROCs in the SAS statistical package that aggregate data.

Thank you.

Gadkari


Posted in reply to noobs

06-06-2013 01:44 PM

Since this is financial data, you may wish to use MEDIAN in PROC MEANS as a data reduction tool. The median is less sensitive to extreme values than the mean, so there is less chance of bias.

Steve Denham
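The MEDIAN suggestion can be sketched against the year-by-quarter layout described in the question. This is a minimal illustration, not the poster's actual setup: the dataset `daily_prices`, the simulated prices, and the output names `cell_stats`, `matrix`, and `median_price` are all assumptions made up for the example.

```sas
/* Hypothetical daily data: 10 years, one simulated price per calendar day */
data daily_prices;
    call streaminit(2013);
    do day = '01JAN2003'd to '31DEC2012'd;
        year    = year(day);
        quarter = qtr(day);
        price   = 100 + rand('normal', 0, 5);
        output;
    end;
    format day date9.;
run;

/* One median per year-by-quarter cell; NWAY keeps only the full crossing */
proc means data=daily_prices nway noprint;
    class year quarter;
    var price;
    output out=cell_stats median=median_price;
run;

/* Pivot to one row per year and one column per quarter (Quarter1-Quarter4) */
proc transpose data=cell_stats out=matrix(drop=_name_) prefix=Quarter;
    by year;
    id quarter;
    var median_price;
run;
```

Swapping `median=` for `mean=` in the OUTPUT statement gives the mean-based version of the same matrix, which makes it easy to compare the two side by side.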


Posted in reply to SteveDenham

06-06-2013 04:26 PM

That sounds like an elegant way to compare the MEAN and MEDIAN options and to test hypotheses about trends in variables that span a time interval against those that are minimally biased temporally.

SKEWNESS, VARIANCE and KURTOSIS are on my list for consideration as well; however, I am completely lost as to how to apply these concepts mathematically in my data model. I guess transformation regression is quite challenging in comparison to multiple regression.

Will including two variables in the CLASS statement of PROC MEANS, in this case year and quarter, put the median into the output dataset? I am deciding which of these is more appropriate:

proc means data=input mean median skewness;

var price;

class year quarter;

output out=outdataset;

run;

OR

proc means data=input mean median skewness;

var price;

class quarter;

by year;

output out=outdataset;

run;

given that the input dataset has the following structure:

day year quarter price
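One detail worth checking in both versions: as far as I know, the statistic keywords on the PROC MEANS statement only control the printed report. An OUTPUT statement with nothing but OUT= writes the default statistics (N, MIN, MAX, MEAN, STD) as separate `_STAT_` rows, so the median would not appear. A hedged sketch of both variants with the statistics requested explicitly follows. It assumes the `input` dataset described above, and the names `by_class`, `by_year`, `input_sorted`, `mean_price`, `median_price`, and `skew_price` are hypothetical.

```sas
/* CLASS version: no sorting required */
proc means data=input noprint;
    class year quarter;
    var price;
    output out=by_class
           mean=mean_price median=median_price skewness=skew_price;
run;

/* BY version: the input must be sorted by the BY variable first */
proc sort data=input out=input_sorted;
    by year;
run;

proc means data=input_sorted noprint;
    by year;
    class quarter;
    var price;
    output out=by_year
           mean=mean_price median=median_price skewness=skew_price;
run;
```

Both output datasets then hold one row per summarization level, with the three statistics as separate columns.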


Posted in reply to noobs

06-10-2013 10:54 AM

Given a choice, I prefer the CLASS statement to the BY statement in PROC MEANS. The results are more complete, and each level of summarization can be identified by the _TYPE_ variable. But it is a preference, and for huge, sorted datasets, BY-group processing is probably faster.

Steve Denham
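To illustrate the `_TYPE_` point, here is a small sketch, again assuming the `input` dataset from the question. The names `all_levels`, `cells`, and `median_price` are made up for the example. With `class year quarter`, the output holds every level of summarization, and `_TYPE_` tells them apart.

```sas
/* Without NWAY, the output contains all combinations of the CLASS variables */
proc means data=input noprint;
    class year quarter;
    var price;
    output out=all_levels median=median_price;
run;

/* _TYPE_ = 0: overall
   _TYPE_ = 1: by quarter only
   _TYPE_ = 2: by year only
   _TYPE_ = 3: year-by-quarter cells */
data cells;
    set all_levels;
    where _type_ = 3;
run;
```

If only the year-by-quarter cells are needed, adding the NWAY option to the PROC MEANS statement should give the same rows without the extra filtering step.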