Dear community,
As you can read above, I am working with a huge dataset of annual earnings figures for more than 20,000 companies. In particular, I am analyzing earnings persistence, which is measured (among other ways) by the slope (beta) coefficient of an ARIMA time-series regression.
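For reference, this is the model the persistence coefficient comes from, written in my own notation ($E_{i,t}$ is firm $i$'s earnings in fiscal year $t$; the symbols are not SAS names). An ARIMA(1,0,0), i.e. AR(1), model with a firm-specific mean $\mu_i$ is equivalent to a regression of earnings on lagged earnings, and $\beta_i$ is the persistence coefficient: values near 1 mean earnings shocks persist, values near 0 mean they die out within a year.

$$
E_{i,t} - \mu_i = \phi_i \,(E_{i,t-1} - \mu_i) + \varepsilon_{i,t}
\quad\Longleftrightarrow\quad
E_{i,t} = \alpha_i + \beta_i E_{i,t-1} + \varepsilon_{i,t},
\qquad \alpha_i = \mu_i (1 - \phi_i),\quad \beta_i = \phi_i
$$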
I have already cleaned the dataset, e.g. by deleting observations for firms that reported earnings for fewer than 7 consecutive years.
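A minimal sketch of that filtering step, in Python rather than SAS purely for illustration. It assumes the data are `(firm, year, earnings)` tuples and reads the rule as "keep every observation of a firm whose longest run of consecutive fiscal years is at least 7"; the function names are hypothetical.

```python
from collections import defaultdict


def longest_consecutive_run(years):
    """Length of the longest run of consecutive calendar years (0 if empty)."""
    ys = sorted(set(years))
    best = run = 1 if ys else 0
    for prev, cur in zip(ys, ys[1:]):
        # Extend the run if this year directly follows the previous one.
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)
    return best


def keep_firms_with_min_run(rows, min_years=7):
    """Keep all rows of firms whose longest consecutive run is >= min_years.

    rows: list of (firm_id, year, earnings) tuples (hypothetical layout).
    """
    years_by_firm = defaultdict(list)
    for firm, year, _ in rows:
        years_by_firm[firm].append(year)
    keep = {
        firm
        for firm, years in years_by_firm.items()
        if longest_consecutive_run(years) >= min_years
    }
    return [row for row in rows if row[0] in keep]
```

A firm reporting 2000 to 2005 and then 2008 has a longest run of 6 years and is dropped, even though it has 7 observations in total.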
Setting aside the question of ARIMA model specification (assume I use ARIMA(1,0,0)), my issue is the following:
Because there are so many companies to group the ARIMA procedure by, a grouped analysis takes an extremely long time to run.
However, if I don't group it (using a BY statement), SAS would not recognize the panel structure of my dataset, would it?
I am a fairly inexperienced SAS user, which is why I'm not sure about this.
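To illustrate why the grouping matters, here is a sketch in Python (hypothetical names, same `(firm, year, earnings)` layout). A lag of earnings is only valid inside one firm: without grouping, the first year of firm B would be paired with the last year of firm A. The code forms lag pairs only within a firm and only for adjacent years, then estimates each firm's AR(1) slope by least squares with an intercept, all after a single sort. That slope is a close stand-in for the AR(1) coefficient a per-firm ARIMA(1,0,0) would report, but PROC ARIMA's estimation methods (CLS, ML, ULS) treat the first observation differently, so the numbers need not match exactly.

```python
from collections import defaultdict


def lag_pairs_by_firm(rows):
    """Build (E_{t-1}, E_t) pairs within each firm only.

    A pair is formed only when the previous row (after sorting by firm, year)
    belongs to the same firm and to the immediately preceding year, so a lag
    never crosses a firm boundary or a gap in the years.
    """
    pairs = defaultdict(list)
    prev = None
    for firm, year, e in sorted(rows):
        if prev is not None and prev[0] == firm and prev[1] == year - 1:
            pairs[firm].append((prev[2], e))
        prev = (firm, year, e)
    return pairs


def ols_slope(pairs):
    """OLS slope of E_t on E_{t-1} with an intercept; None if E_{t-1} is constant."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return None if sxx == 0 else sxy / sxx


def persistence_by_firm(rows, min_pairs=6):
    """Per-firm AR(1) slope; 7 consecutive years give 6 lag pairs."""
    betas = {}
    for firm, p in lag_pairs_by_firm(rows).items():
        if len(p) >= min_pairs:
            beta = ols_slope(p)
            if beta is not None:
                betas[firm] = beta
    return betas
```

For example, a firm whose earnings follow $E_t = 1 + 0.5\,E_{t-1}$ exactly gets a slope of 0.5, and a firm with a missing year contributes no pair across the gap.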
Alternatively, I ran a PROC PANEL regression, which I assume accounts for the panel structure of my data, and it gave me results for one overall regression. However, it obviously does not give descriptive statistics (percentiles, mean, …) of the regression results for each individual company.
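The per-company descriptive statistics that a single overall regression does not provide can be computed from the per-firm slopes. A sketch in Python, assuming a dict that maps each firm to its `(E_{t-1}, E_t)` pairs with lags built within each firm (names hypothetical). It reports the mean and quartiles of the per-firm slopes next to a one-way fixed-effects (within) slope, to show that these are different summaries. Percentile definitions differ slightly between tools, so SAS may report slightly different quartiles.

```python
import statistics


def summarize(pairs_by_firm):
    """Descriptive stats of per-firm AR(1) slopes plus the within (FE) slope.

    pairs_by_firm: {firm: [(E_{t-1}, E_t), ...]}, lags built within each firm.
    """
    slopes = []
    num = den = 0.0
    for pairs in pairs_by_firm.values():
        n = len(pairs)
        mx = sum(x for x, _ in pairs) / n
        my = sum(y for _, y in pairs) / n
        sxx = sum((x - mx) ** 2 for x, _ in pairs)
        sxy = sum((x - mx) * (y - my) for x, y in pairs)
        if sxx == 0:
            continue  # slope undefined when lagged earnings are constant
        slopes.append(sxy / sxx)
        # Within estimator: sum firm-demeaned cross products over all firms.
        num += sxy
        den += sxx
    p25, median, p75 = statistics.quantiles(slopes, n=4, method="inclusive")
    return {
        "n_firms": len(slopes),
        "mean": statistics.fmean(slopes),
        "p25": p25,
        "median": median,
        "p75": p75,
        "within_pooled": num / den,
    }
```

With one firm at slope 0.5 and another at 0.8, the mean of the slopes is 0.65, while the within slope lands wherever the firms' lagged-earnings variation puts it.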
Do you have any advice on how to conduct my analysis most efficiently in my case?
Does it even make sense to use PROC ARIMA for panel data?
Any help will be much appreciated, thanks a lot in advance!
If you want to run a regular panel model on a really big dataset, you can try PROC CPANEL in SAS Viya. It distributes the data and processes it in parallel, which makes the computation much faster. But regular SAS can still handle large data: I once ran PROC PANEL on 10 GB of data and it ran fine. Have you tried PROC PANEL with your data?
Thanks for your feedback.
Yes, I ran PROC PANEL and it didn't take too long.
I got seemingly satisfactory results, but I'm not sure which estimation method that procedure uses, or rather whether the results are the same as if I had run a grouped ARIMA time-series analysis.
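One way to see why the two generally differ: if the PROC PANEL model is a one-way fixed-effects (within) regression of $E_{i,t}$ on $E_{i,t-1}$ (an assumption; the post does not say which model option was used), its slope is a weighted average of the per-firm least-squares slopes, whereas the mean of grouped results weights every firm equally. Here $E_{i,t}$ is firm $i$'s earnings in year $t$, $N$ is the number of firms, and $\bar x_i$, $\bar y_i$ are firm $i$'s means of $E_{i,t-1}$ and $E_{i,t}$ over the years with a lag available (notation mine):

$$
\hat\beta_i = \frac{S_{xy,i}}{S_{xx,i}},\qquad
S_{xx,i} = \sum_t \left(E_{i,t-1} - \bar x_i\right)^2,\qquad
S_{xy,i} = \sum_t \left(E_{i,t-1} - \bar x_i\right)\left(E_{i,t} - \bar y_i\right)
$$

$$
\hat\beta_{FE} = \frac{\sum_i S_{xy,i}}{\sum_i S_{xx,i}} = \sum_i w_i \,\hat\beta_i,
\qquad w_i = \frac{S_{xx,i}}{\sum_j S_{xx,j}},
\qquad\text{versus}\qquad
\bar\beta = \frac{1}{N}\sum_i \hat\beta_i
$$

The two coincide when all firms have the same slope or the same $S_{xx,i}$, and generally differ otherwise; firms with more volatile earnings get more weight in $\hat\beta_{FE}$. A pooled regression with one common intercept additionally mixes in differences between firms' average earnings levels.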