Mnaes
Calcite | Level 5

Hi! 

 

We're taking Big Data Analytics as an elective and have been stuck on this for a long time, so please bear with us while we explain where we are stuck.

Last year we took applied econometrics, which covered OLS, heteroskedasticity, time series, stationarity and, finally, ARIMA models. However, my understanding is not broad enough to build this forecasting model, so some of what I write may be incorrect.

We have built our own input Excel sheet listing the Apple stock price, the date, indexed Google Trends search volumes for iPhone, Mac etc., and the NASDAQ price. In SAS we then created logged values and differenced logs (returns). We have tried to build a simple ARIMA model and are still working on it (trends, stationarity issues, choosing the best lags etc.). I am not asking you to do our assignment, only to explain, because we have been stuck on this problem for a long time.

We have tried to make a forecast (last year we learned how to build an ARIMA model, but not the code for forecasting).

We tried to hold out the most recent 10% of the dates/observations from the training set to form the "test" set. The problem is that we don't understand our teacher's code (she has explained it well and she is a good teacher). Below is some of the forecasting code, which we can't get to run correctly in SAS (or we don't understand where the code gets its numbers from, or what we have to calculate before making the forecast).

 

* Step 1 – we make a copy of the variable we want to forecast out of sample. The result is saved in the temporary data set work.a;
data a;
set data.assignment1;
if date < '01dec2014'd then return_f = return; else return_f = .;
run;

 

* Step 2 – we run a regression and create forecasts P_, and upper and lower confidence limits ucl_ and lcl_;
ods noproctitle;
ods graphics / imagemap=on;
proc reg data=WORK.A alpha=0.05 plots(only)=(diagnostics residuals rstudentbypredicted observedbypredicted);
model return_f=like1 omx;
output out=WORK.Reg_stats p=p_ lcl=lcl_ ucl=ucl_ r=r_ student=student_ rstudent=rstudent_;
run;
quit;

 

* Step 3: creating the graph. You can drag and drop to get the first line, but you need to change the code to obtain more than one line (one SERIES statement for each);
ods graphics / reset imagemap;
/*--SGPLOT proc statement--*/
proc sgplot data=WORK.REG_STATS;
/*--Series plot settings--*/
series x=date y=return / lineattrs=(color=blue pattern=solid) transparency=0.0 name='actual';
series x=date y=P_ / lineattrs=(color=red pattern=solid) transparency=0.0 name='predicted';
series x=date y=lcl_ / lineattrs=(color=black pattern=shortdash) transparency=0.0 name='lower';
series x=date y=ucl_ / lineattrs=(color=black pattern=shortdash) transparency=0.0 name='upper';
refline '01Dec2014'd / axis=x;
/*--X Axis--*/ xaxis grid;
/*--Y Axis--*/ yaxis grid;
run;
ods graphics / reset;

 

Is it the underlying method or the coding that we don't understand? Where/how is the out sheet made and what is needed for making this forecast? Either we don't get any results, or we don't understand the basis for them. Should we use ARIMA, or lag the variables and make a simple (naive) forecast, or something else? We're on the newest SAS (9.4), and an old version of our dataset is attached. Help!

 

 

Accepted Solution
udo_sas
SAS Employee

Hello -

Not sure if my response will be useful, but you may want to double check your PROC REG code.

None of the variables used in your "MODEL" statement: model return_f=like1 omx /; seem to be part of your sample data set as far as I can tell. This is probably the reason why your PROC REG code is failing.

 

You wrote: "Where/how is the out sheet made and what is needed for making this forecast?"

On a very high level the flow should be as such:

a) in step 1 you create a table called "a"

b) in step 2 you will want to use this table in PROC REG - make sure that your MODEL statement only refers to variables that table "a" actually contains

c) in step 2 your OUTPUT statement of PROC REG creates a table called WORK.Reg_stats 

d) in step 3 you are using table WORK.Reg_stats in PROC SGPLOT to create a plot

Once you get this code to run, you may want to think about whether a linear regression model is indeed the best model for time series data.
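
The four steps above can be sketched on toy data. This is only an illustration under stated assumptions: the data set TOY and the variables t, x, y and y_f are made up for the example and are not from the original assignment. The point it shows is that PROC REG leaves rows with a missing response out of the fit, but still writes predicted values and confidence limits for them to the OUTPUT data set as long as the regressors are present.

```sas
/* Hypothetical toy data: y depends on x (names are illustrative only) */
data toy;
  call streaminit(1);
  do t = 1 to 50;
    x = rand('normal');
    y = 1 + 2*x + rand('normal');
    output;
  end;
run;

/* Step 1: copy the response and blank it out in the holdout (last 5 rows) */
data a;
  set toy;
  if t <= 45 then y_f = y;
  else y_f = .;
run;

/* Step 2: rows with missing y_f are not used for estimation, but they   */
/* still receive p_, lcl_ and ucl_ because x is not missing              */
proc reg data=a;
  model y_f = x;
  output out=reg_stats p=p_ lcl=lcl_ ucl=ucl_;
run;
quit;

/* Step 3: compare holdout predictions with the actual values of y */
proc print data=reg_stats;
  where t > 45;
  var t y p_ lcl_ ucl_;
run;
```

The same pattern applies to the original code once the MODEL statement uses regressors that exist in the input table.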

 

Here is a simple forecast using ESM, which you may want to use as a reference. Instead of importing your Excel sheet, I'm replicating your data in a data step and then using PROC ESM to create a 7-day forecast. Note that I'm using the BACK option so that these forecasts can be compared to data which was not used for modeling.

 

Hope this gives you a jump start.

Thanks,

Udo

 

data have ;
FORMAT Close BEST12. Date DATE9. ;
INFORMAT Close BEST11. Date ANYDTDTE9. ;
INPUT Close Date ;
cards;
107.955733 8/22/2016
108.293993 8/23/2016
107.478182 8/24/2016
107.020532 8/25/2016
106.393753 8/26/2016
106.274363 8/29/2016
105.458552 8/30/2016
105.558040 8/31/2016
106.184827 9/1/2016
107.179719 9/2/2016
107.149865 9/6/2016
107.806498 9/7/2016
104.981001 9/8/2016
102.603209 9/9/2016
104.901415 9/12/2016
107.398588 9/13/2016
111.199076 9/14/2016
114.979668 9/15/2016
114.332987 9/16/2016
112.999835 9/19/2016
112.989884 9/20/2016
112.969990 9/21/2016
114.034524 9/22/2016
112.134277 9/23/2016
112.303406 9/26/2016
112.512333 9/27/2016
113.367940 9/28/2016
111.606985 9/29/2016
112.472544 9/30/2016
111.945245 10/3/2016
112.422796 10/4/2016
112.472544 10/5/2016
113.308249 10/6/2016
113.477379 10/7/2016
115.457220 10/10/2016
115.705943 10/11/2016
116.740624 10/12/2016
116.382470 10/13/2016
117.029143 10/14/2016
116.949558 10/17/2016
116.869965 10/18/2016
116.521754 10/19/2016
116.462055 10/20/2016
116.004406 10/21/2016
117.049045 10/24/2016
117.645979 10/25/2016
114.999563 10/26/2016
113.895240 10/27/2016
113.139120 10/28/2016
112.960039 10/31/2016
110.920507 11/1/2016
111.019995 11/2/2016
109.830002 11/3/2016
108.839996 11/4/2016
110.410004 11/7/2016
111.059998 11/8/2016
110.879997 11/9/2016
107.790001 11/10/2016
108.430000 11/11/2016
105.709999 11/14/2016
;
run;

proc esm data=have plot=forecasts back=7 lead=7;
id date interval=weekday accumulate=total;
forecast close / model=damptrend;
run;
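
To look at the holdout comparison as numbers rather than only in the plot, one option is to save the forecasts with the OUTFOR= option and summarize the errors. The sketch below reuses the HAVE data set from the data step above. The data set names ESM_FC and HOLDOUT and the variable ABS_ERR are made up for the example. It assumes that the OUTFOR= data set carries ACTUAL and PREDICT for the held-back periods, and that BACK=7 with INTERVAL=WEEKDAY holds back 04NOV2016 through 14NOV2016; both assumptions are worth checking against your own output.

```sas
/* Same model as above, but also save the forecasts to a data set */
proc esm data=have outfor=work.esm_fc back=7 lead=7;
  id date interval=weekday accumulate=total;
  forecast close / model=damptrend;
run;

/* Keep only the held-back periods (assumed to start on 04NOV2016)  */
/* and compute the absolute forecast error for each of them         */
data holdout;
  set work.esm_fc;
  where date >= '04NOV2016'd;
  abs_err = abs(actual - predict);
run;

/* Mean and maximum absolute error over the holdout */
proc means data=holdout n mean max;
  var abs_err;
run;
```

In this setup the first part of the series plays the role of the training set, and the last 7 weekdays play the role of the test set.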


Replies
DarthPathos
Lapis Lazuli | Level 10

Hi,

 

As I'm also learning forecasting, I have learnt that there isn't a simple "follow these steps" type method to doing these analyses.  It takes a lot of understanding your data and a solid grasp on the different concepts and methods.  

 

Having said that, here are some papers and presentations that I have found helpful.

 

Time Series by Nate Derby

Introducing SAS Forecasting Server - may not be directly related but I found it helpful in understanding interpretation

Introduction to Forecasting Methods

 

I also recommend the following books, if you can find them at your school library:

An Introduction to Time Series Analysis and Forecasting

Practical Time Series Analysis using SAS

SAS for Forecasting Time Series

 

I apologise I can't be more help but this is something I'm still learning myself.  Good luck!

Chris

 

 


DarthPathos
Lapis Lazuli | Level 10

@udo_sas that was very helpful, and even though i wasn't the original poster, thanks 🙂

 

have a great one

Chris

 

Mnaes
Calcite | Level 5

Hi,

 

I don't know if you got my "quick reply".

Thank you, that helped a lot! Regarding your PROC ESM code (instead of the linear regression), if we use our imported Excel sheet instead of replicating the data:

1. What is the logic of the BACK option? How do you compare the forecasts to data that was not used for modeling? Is this how you use one "training set" and one "test set"?

2. If we have weekly dates and use the log price (return), differences and lagged values of these, what would your code

proc esm data=have plot=forecasts back=7 lead=7;
id date interval=weekday accumulate=total;
forecast close / model=damptrend;
run;

look like? I've tried changing data=have to our dataset and interval=weekday to week, but it is hard to understand, so I can't get it right.

Lastly, thank you again for your time. If you have any links, articles or sites to recommend for our assignment (I guess you have an idea of what we are trying to do), I would be extremely thankful if you shared them :).

We have already read a lot in the SAS book and the other resources provided by our professor.

All the best,
Mikkel

udo_sas
SAS Employee

Hello -

Please excuse the delay in responding.

1. What is the logic with BACK option?

 

Please check out: https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/out-of-sample-range-and-holdout-samp... which should give you some hints.

 

2. Currently ESM does not allow you to incorporate inputs like logprice. My suggestion would be to look at UCM instead. Check out: http://support.sas.com/documentation/cdl/en/etsug/68148/HTML/default/viewer.htm#etsug_ucm_examples04... to get started.

 

3. You will find some additional books here:

https://www.sas.com/store/books/products-solutions/sas-ets/cBooks-cbooks_productsandsolutions-cbooks...

I would also recommend checking out: https://www.otexts.org/fpp
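
As a rough illustration of point 2, a PROC UCM run with one input variable and a holdout might look like the sketch below. Everything about the data is an assumption: WEEKLY, ret and search_idx are simulated stand-ins for a weekly return series and a Google Trends index, and the irregular-plus-level specification is just a simple starting point, not a recommendation for the assignment.

```sas
/* Hypothetical weekly data: ret is driven by search_idx plus noise */
data weekly;
  call streaminit(2);
  format date date9.;
  do i = 0 to 103;
    date = intnx('week', '04JAN2015'd, i);  /* weekly dates (Sundays) */
    search_idx = rand('normal');
    ret = 0.5*search_idx + rand('normal');
    output;
  end;
  drop i;
run;

/* Regress ret on search_idx with a level and an irregular component. */
/* BACK=10 on ESTIMATE keeps the last 10 weeks out of the fit;        */
/* BACK=10 LEAD=10 on FORECAST forecasts exactly those 10 weeks.      */
proc ucm data=weekly;
  id date interval=week;
  model ret = search_idx;
  irregular;
  level;
  estimate back=10;
  forecast back=10 lead=10 outfor=work.ucm_fc;
run;
```

Because the input variable must be known over the forecast horizon, forecasting inside the holdout (where search_idx is observed) is the easy case; forecasting beyond the end of the data would also require future values of the input.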

 

Thanks,

Udo
