Hi!
We're taking Big Data Analytics as an elective and have been stuck for a long time, so please bear with us while we explain where we are stuck.
Last year we took applied econometrics, where we covered OLS, heteroskedasticity, time series, stationarity, etc., and finally ARIMA models. However, my understanding is not broad enough to build this forecasting model, so what I write may not be correct.
We have built our own input Excel sheet listing the Apple stock price, the date, indexed Google Trends search volumes for iphone, mac, etc., and the NASDAQ price. We then created logged values and differenced logs (returns) in SAS. We have tried to build a simple ARIMA model and are still working on it (trends, stationarity issues, finding the best lags, etc.). I am not asking you to do our assignment, only to explain, because we have been stuck on this problem for a long time.
So, we have tried to make a forecast (last year we learned to build ARIMA models, but not the forecasting code).
We tried to hold out the latest 10% of the dates/observations from the training set to form the "test" set. The problem is that we don't understand our teacher's code (she has explained it well and she is a good teacher). Here is some of the forecasting code that we can't get to run correctly in SAS (or we don't understand where the code gets its numbers from, or what we have to calculate before making the forecast).
* Step 1 – we make a copy of the variable we want to forecast out of sample. The result is saved in the temporary data set work.a;
data a;
set data.assignment1;
if date < '01dec2014'd then return_f = return;
else return_f = .;
run;
* Step 2 – we run a regression and create forecasts P_, and upper and lower confidence limits ucl_ and lcl_;
ods noproctitle;
ods graphics / imagemap=on;
proc reg data=WORK.A alpha=0.05 plots(only)=(diagnostics residuals rstudentbypredicted observedbypredicted);
model return_f=like1 omx /;
output out=WORK.Reg_stats p=p_ lcl=lcl_ ucl=ucl_ r=r_ student=student_ rstudent=rstudent_;
run;quit;
* Step 3: creating the graph. You can drag and drop to get the first line, but you need to change the code to obtain more than one line: one series statement for each;
ods graphics / reset imagemap;
/*--SGPLOT proc statement--*/
proc sgplot data=WORK.REG_STATS; /*--Scatter plot settings--*/
series x=Date y=return / lineattrs=(color=blue pattern=solid)transparency=0.0 name='actual';
series x=date y=P_ / lineattrs=(color=red pattern=solid) transparency=0.0 name='predicted';
series x=date y=lcl_ / lineattrs=(color=black pattern=shortdash)transparency=0.0 name='lower';
series x=date y=ucl_ / lineattrs=(color=black pattern=shortdash)transparency=0.0 name='upper';
refline '01Dec2014'd / axis=x;
/*--X Axis--*/ xaxis grid;
/*--Y Axis--*/ yaxis grid;
run;ods graphics / reset;
Is it possible that we don't understand the basics, or is it the coding? Where/how is the out sheet made and what is needed for making this forecast? Either we don't get the results or we don't understand the basics. Do we use ARIMA, do we lag and make a simple (or naive) forecast, or something else? We're on the newest SAS (9.4), and an old version of our dataset is attached. Help!
Hello -
Not sure if my response will be useful, but you may want to double check your PROC REG code.
None of the variables used in your "MODEL" statement: model return_f=like1 omx /; seem to be part of your sample data set as far as I can tell. This is probably the reason why your PROC REG code is failing.
You wrote: "Where/how is the out sheet made and what is needed for making this forecast?"
On a very high level the flow should be as such:
a) in step 1 you create a table called "a"
b) in step 2 you will want to use this table in PROC REG - make sure that your model statement only contains variables that table "a" actually contains
c) in step 2 your OUTPUT statement of PROC REG creates a table called WORK.Reg_stats
d) in step 3 you are using table WORK.Reg_stats in PROC SGPLOT to create a plot
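To see where the forecast numbers in step 2 come from, here is a minimal sketch on made-up data (the table work.toy, its variables and its values are purely illustrative, not your data). As far as I know, PROC REG leaves observations with a missing dependent variable out of the fit but still computes P_, LCL_ and UCL_ for them when the regressors are present, which is what turns the blanked-out rows into out-of-sample forecasts.

```sas
/* Hypothetical toy data: t = time index, x = regressor, y = target. */
data work.toy;
  input t x y;
  /* Copy y, but blank out the last two rows so they are not used in the fit. */
  if t <= 8 then y_f = y;
  else y_f = .;
  datalines;
1 1.0 2.1
2 2.0 3.9
3 3.0 6.2
4 4.0 7.8
5 5.0 10.1
6 6.0 12.2
7 7.0 13.8
8 8.0 16.1
9 9.0 18.0
10 10.0 19.9
;
run;

/* Fit on rows 1-8 only; the OUTPUT data set holds predictions for all 10 rows. */
proc reg data=work.toy;
  model y_f = x;
  output out=work.toy_pred p=p_ lcl=lcl_ ucl=ucl_;
run;
quit;

/* Rows 9 and 10 have y_f missing but p_, lcl_ and ucl_ filled in. */
proc print data=work.toy_pred;
  var t x y y_f p_ lcl_ ucl_;
run;
```

The same idea applies to your table "a": running proc contents data=work.a; run; first lists the variables that are available for the MODEL statement.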
If you get this code to run, you may want to think about if a linear regression model is indeed the best model to run for time series data.
Here is a simple forecast using ESM, which you may want to use as a reference. Instead of importing your Excel sheet, I'm replicating your data in a data step and then using PROC ESM to create a forecast for the next 7 days. Note that I'm using the BACK option to compare these forecasts to data which was not used for modeling.
Hope this gives you a jump start.
Thanks,
Udo
data have ;
FORMAT Close BEST12.
Date DATE9.
;
INFORMAT Close BEST11.
Date ANYDTDTE9.
;
INPUT Close
Date
;
cards;
107.955733 8/22/2016
108.293993 8/23/2016
107.478182 8/24/2016
107.020532 8/25/2016
106.393753 8/26/2016
106.274363 8/29/2016
105.458552 8/30/2016
105.558040 8/31/2016
106.184827 9/1/2016
107.179719 9/2/2016
107.149865 9/6/2016
107.806498 9/7/2016
104.981001 9/8/2016
102.603209 9/9/2016
104.901415 9/12/2016
107.398588 9/13/2016
111.199076 9/14/2016
114.979668 9/15/2016
114.332987 9/16/2016
112.999835 9/19/2016
112.989884 9/20/2016
112.969990 9/21/2016
114.034524 9/22/2016
112.134277 9/23/2016
112.303406 9/26/2016
112.512333 9/27/2016
113.367940 9/28/2016
111.606985 9/29/2016
112.472544 9/30/2016
111.945245 10/3/2016
112.422796 10/4/2016
112.472544 10/5/2016
113.308249 10/6/2016
113.477379 10/7/2016
115.457220 10/10/2016
115.705943 10/11/2016
116.740624 10/12/2016
116.382470 10/13/2016
117.029143 10/14/2016
116.949558 10/17/2016
116.869965 10/18/2016
116.521754 10/19/2016
116.462055 10/20/2016
116.004406 10/21/2016
117.049045 10/24/2016
117.645979 10/25/2016
114.999563 10/26/2016
113.895240 10/27/2016
113.139120 10/28/2016
112.960039 10/31/2016
110.920507 11/1/2016
111.019995 11/2/2016
109.830002 11/3/2016
108.839996 11/4/2016
110.410004 11/7/2016
111.059998 11/8/2016
110.879997 11/9/2016
107.790001 11/10/2016
108.430000 11/11/2016
105.709999 11/14/2016
;
run;
proc esm data=have plot=forecasts back=7 lead=7;
id date interval=weekday accumulate=total;
forecast close / model=damptrend;
run;
Hi,
As I'm also learning forecasting, I have learnt that there isn't a simple "follow these steps" type method to doing these analyses. It takes a lot of understanding your data and a solid grasp on the different concepts and methods.
Having said that, here are some papers and presentations that I have found helpful.
Introducing SAS Forecasting Server - may not be directly related but I found it helpful in understanding interpretation
Introduction to Forecasting Methods
I also recommend the following books, if you can find them at your school library:
An Introduction to Time Series Analysis and Forecasting
Practical Time Series Analysis using SAS
SAS for Forecasting Time Series
I apologise I can't be more help but this is something I'm still learning myself. Good luck!
Chris
@udo_sas that was very helpful, and even though i wasn't the original poster, thanks 🙂
have a great one
Chris
Hi,
I don't know if you got my "quick reply".
Thank you, that helped a lot! Regarding your PROC ESM code instead of linear regression, if we use our imported Excel sheet instead of replicating the data:
1. What is the logic with BACK option? How do you compare the forecasts to data which was not used for modeling? Is this how you use one "training set" and one "test set"?
2. If we have weekly dates and use the log price (return), and differenced and lagged values of these, how will your:
proc esm data=have plot=forecasts back=7 lead=7;
id date interval=weekday accumulate=total;
forecast close / model=damptrend;
run;
look like? I've tried changing data=have to our dataset and interval=weekday to week, but it is hard to understand and we can't get it right.
Lastly, thank you again for your time. If you have any links/articles/sites you recommend regarding our assignment (I guess you have an idea what we are trying to do), I would be extremely thankful if you shared them :).
We have already read a lot in the SAS book and other resources provided by our professor.
All the best,
Mikkel
Hello -
Please excuse the delay in responding.
1. What is the logic with BACK option?
Please check out: https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/out-of-sample-range-and-holdout-samp... which should give you some hints.
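As a rough sketch of the hold-out idea, assuming the data set "have" from the earlier reply is still in WORK: BACK=7 stops the fit 7 observations before the end of the series, and LEAD=7 then forecasts those 7 periods, so the last 7 days act as the "test set". The OUTFOR= table name work.esm_fc and the error measure are my own choices for illustration. I would expect ACTUAL to be populated for the held-out dates in the OUTFOR= table, but please check the output on your side.

```sas
/* Assumes the data set "have" from the earlier reply exists in WORK. */
proc esm data=have outfor=work.esm_fc back=7 lead=7;
  id date interval=weekday accumulate=total;
  forecast close / model=damptrend;
run;

/* Keep only the 7 held-out days (04Nov2016 to 14Nov2016 in that sample)
   and compute the absolute percentage error of each forecast. */
data work.holdout;
  set work.esm_fc;
  where date >= '04nov2016'd;
  abs_pct_err = abs(actual - predict) / abs(actual);
run;

/* The mean of abs_pct_err is the MAPE over the hold-out period. */
proc means data=work.holdout n mean;
  var abs_pct_err;
run;
```

With your own data the cutoff date in the WHERE statement has to match the first held-out observation.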
2. Currently ESM does not allow you to incorporate inputs like logprice. My suggestion would be to look at UCM instead. Check out: http://support.sas.com/documentation/cdl/en/etsug/68148/HTML/default/viewer.htm#etsug_ucm_examples04... to get started.
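A minimal sketch of what a weekly UCM with inputs could look like is below. The data set work.weekly and the variables logprice, search_iphone and nasdaq_ret are hypothetical names standing in for your imported sheet, and the choice of components (irregular, level, slope) and of a 10-week hold-out is an assumption for illustration, not a recommendation for your data.

```sas
/* Hypothetical names: work.weekly, logprice, search_iphone, nasdaq_ret. */
proc ucm data=work.weekly;
  id date interval=week;                         /* weekly, equally spaced dates   */
  model logprice = search_iphone nasdaq_ret;     /* dependent = input variables    */
  irregular;                                     /* noise component                */
  level;                                         /* local level                    */
  slope;                                         /* local trend                    */
  estimate back=10;                              /* fit without the last 10 weeks  */
  forecast back=10 lead=10 outfor=work.ucm_fc;   /* forecast the held-out 10 weeks */
run;
```

Because LEAD= equals BACK= here, the input variables are already known for every forecast period; forecasting past the end of the data would need future values of the inputs as well.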
3. You will find some additional books here:
I would also recommend checking out: https://www.otexts.org/fpp
Thanks,
Udo