Hello,
My problem is to first find the top 5 facility countries that had the most sales, which I have done by using PROC FREQ.
which gave me the list of countries by number of sales shown below.
Now, I know the top 5 countries out of 30.
The next task is to predict future sales for 3 months for these (top 5) countries.
I have no idea how to do this... So far I have this code:
which gives me this plot:
I have no knowledge of how to use PROC FORECAST to project the next 3 months of sales...
Any help is appreciated!
Thank you
Hello,
I would certainly check if the de-duplication of dates makes sense ( the way you did it ).
@SASKiwi has proposed to use NODUPKEY option on PROC SORT statement to get rid of duplicate dates (within one and the same facility), but he also warned that this may not be the right way ! Maybe you have to sum the sales amounts on 2 or 3 same dates within one facility. The latter can be done with PROC TIMESERIES.
With regard to the forecasting amount 3 times being the same, that's because the default model is simple (single) exponential smoothing (MODEL=SIMPLE).
There are 8 possible ESM's, you could try them all and choose the best one based on some goodness-of-fit criterion you have learned in class (like MAPE = Mean absolute percentage error).
Here's what you can put on the FORECAST statement behind the forward slash ( / ) :
specifies the forecasting model to be used to forecast the time series. You can specify the following forecasting model-names:
produces no forecast, but the time series is appended with missing values in the OUT= data set. This option is useful when the results stored in the OUT= data set are used in a subsequent analysis where forecasts of the independent variables are needed to forecast the dependent variable.
performs simple (single) exponential smoothing.
performs double (Brown) exponential smoothing.
performs linear (Holt) exponential smoothing.
performs damped trend exponential smoothing.
performs additive seasonal exponential smoothing.
performs multiplicative seasonal exponential smoothing.
uses the Winters multiplicative method.
uses the Winters additive method.
By default, MODEL=SIMPLE.
Good luck,
Koen
Hello,
For an alternative to your PROC FREQ code, see here :
4 ways to find the k smallest and largest data values in SAS
By Rick Wicklin on The DO Loop January 26, 2022
https://blogs.sas.com/content/iml/2022/01/26/k-smallest-largest-data.html
To forecast 3 months ahead:
Do not use PROC FORECAST !! It's obsolete and discarded from the documentation.
If you want an easy method, use PROC ESM (exponential smoothing models with optimized smoothing weights).
Forecasting is a profession in itself. If you build up knowledge, you can use ARIMA models (without or with additional explanatory factors) or other methodologies. ESM is very basic!
Good luck,
Koen
Hi there,
Thanks for the response, I was starting to think no one was going to reply.
I have never used PROC ESM, and this is not discussed in my textbook...
How would I write the code for finding 3 months of sales data for each country?
This is my code:
And the errors I receive:
Thank you for any help
Hello,
What is the forecasting method that your text book discusses?
If you are OK with using Exponential Smoothing, you can try this code :
proc esm data=mis543.toys out=NextMonths_Toys LEAD=3 PLOT=FORECASTS;
by FacilityCountry;
id TransactionMonth interval=month;
forecast ProductPriceActual;
run;
Koen
Thanks for the help.
When I try the code you suggested, I get multiple errors..
I don't know if it changes the code required, but I only need to predict the sales for 5 country facilities including:
"United"
"Spain"
"Sweden"
"German"
"Austra"
(This is how they're spelled in the dataset, its not a typo)
Hello,
Your data is not sorted by country facility and (within the former) by timestamp.
On the ID statement you can add the NOTSORTED option.
But you need to sort at least on the variable that contains your by-groups.
Further more, you have duplicate dates (within one and the same by-group). Get rid of the duplicate dates. Duplicate dates are not allowed in time series analysis and forecasting. I would use PROC TIMESERIES to collapse (summarize) info from x times the same date in one record (observation).
Cheers,
Koen
I do not know how to sort data or change the dates to SAS format. Its odd that this assignment for my class is more about learning how to forecast sales and yet I was provided a dataset that needs to be changed so much. I feel that I should have been taught that first.
How do I sort the country variable? And how can I remove duplicate dates?
Thanks
You use PROC SORT to sort your data. Try this:
proc sort data=mis543.toys
out=toys;
by FacilityCountry TransactionMonth;
run;
proc esm data = toys.....
.......
After running this code:
I get 30 errors:
I feel like I am so close! How do I fix this?
Changing the PROC SORT to remove rows with duplicate keys may get rid of the errors:
proc sort data=mis543.toys
out=toys
nodupkey;
by FacilityCountry TransactionMonth;
run;
However this might not be the correct way to deal with the duplicate rows. You need to understand your input data better to decide on the right way.
You have been a lot of help. Thank you so much for the assistance. I am still not sure what is wrong with this dataset, but I no longer have any errors! As you can see, the predicted "productpriceactual" (sales) has been forecasted for Jan-March 2018 (SUCCESS!) however, the numbers are all the same? Seems very odd... They are like this for all of the FacilityCountries.
Hello,
I would certainly check if the de-duplication of dates makes sense ( the way you did it ).
@SASKiwi has proposed to use NODUPKEY option on PROC SORT statement to get rid of duplicate dates (within one and the same facility), but he also warned that this may not be the right way ! Maybe you have to sum the sales amounts on 2 or 3 same dates within one facility. The latter can be done with PROC TIMESERIES.
With regard to the forecasting amount 3 times being the same, that's because the default model is simple (single) exponential smoothing (MODEL=SIMPLE).
There are 8 possible ESM's, you could try them all and choose the best one based on some goodness-of-fit criterion you have learned in class (like MAPE = Mean absolute percentage error).
Here's what you can put on the FORECAST statement behind the forward slash ( / ) :
specifies the forecasting model to be used to forecast the time series. You can specify the following forecasting model-names:
produces no forecast, but the time series is appended with missing values in the OUT= data set. This option is useful when the results stored in the OUT= data set are used in a subsequent analysis where forecasts of the independent variables are needed to forecast the dependent variable.
performs simple (single) exponential smoothing.
performs double (Brown) exponential smoothing.
performs linear (Holt) exponential smoothing.
performs damped trend exponential smoothing.
performs additive seasonal exponential smoothing.
performs multiplicative seasonal exponential smoothing.
uses the Winters multiplicative method.
uses the Winters additive method.
By default, MODEL=SIMPLE.
Good luck,
Koen
You have all been a tremendous help with this assignment.
I really appreciate it!
proc sort data=mis543.toys
out=toys
nodupkey;
by FacilityCountry TransactionMonth;
run;proc sort data=work.toys
out=sorttoys;
by FacilityCountry TransactionMonth;
run;
proc esm data=work.sorttoys out=NextMonths_Toys LEAD=3 PLOT=FORECASTS;
by FacilityCountry;
id TransactionMonth interval=month;
forecast ProductPriceActual / model=addwinters;
run;
I think this is the right code! Definitely feeling less defeated.
Thanks Again.
Hello,
The 2nd PROC SORT will do nothing.
work.toys and work.sorttoys are entirely equal (including the sort order of the observations).
Here's alternative code (in case nodupkey is not right and you want to sum the amounts that have the same date (within a facility)).
proc sort data=mis543.toys
out=work.toys;
by FacilityCountry TransactionMonth;
run;
proc timeseries data=work.toys out=work.mtoys;
by FacilityCountry ;
id TransactionMonth interval=month
accumulate=TOTAL /*median*/
setmiss=0
/* start='01jan1998'd */
/* end ='31dec2000'd */ ;
var ProductPriceActual;
run;
proc esm data=work.mtoys out=NextMonths_Toys LEAD=3 PLOT=FORECASTS;
by FacilityCountry;
id TransactionMonth interval=month;
forecast ProductPriceActual / model=addwinters;
run;
Good luck,
Koen
I think I'm taking this same class that you were. I am in the same exact boat! Sorted to find top five countries, NOW WHAT!? Textbook is NO help. Modules hardly touch on this subject. (Also, my internet was out for the past two days, causing me to work on this one and two days before it's due, little time to ask prof for help!)
What does "sales" even mean? I interpreted it the same way you did - a frequency count of observations in each country because each row seems to be unique by product ordered. But does sales mean total dollars from "ProductPriceActual"? Do I need to add up all the actual price per country?
I feel like I know what I'd like to do with this dataset to get started, but it's so humongous for just a homework assignment, and it also feels disconnected from the course material. Also, all help I'm finding online is about "forecasting," but forecasting is never mentioned in this course.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.