
FREE! SAS Sample Data Sets for Forecasting

Started ‎04-18-2022 by
Modified ‎06-01-2022 by
Views 12,002

SAS provides tons of free data sets to use with our analytics products for demonstrating software capabilities, testing your custom programs and pipelines, and training. But how do you know which data sets are appropriate for forecasting? Where can you find these data sets? How do you make them ready for forecasting? This post will help you figure out which sample data sets can be used for forecasting.

 

Forecasting can be done in both SAS Visual Analytics (using the Forecasting object) and in SAS Visual Forecasting. Below I’ll show some of the readily available data sets that will work well for either or both of these products. Some will work straight out of the box. Others will require a bit of massaging.

I’ll discuss three categories of SAS Sample Data Sets:

 

  • Data sets that ship with the software and are directly accessible in the software
  • Data sets that you can download from the Internet
  • Data sets that you can create from SAS code; the code to create these data sets is also available for download

be_1_image001.png


 

The first thing you need for forecasting is a historical record over time with a date, time, or datetime variable. This variable must be in the proper format to use it in SAS, i.e., it must be a SAS date, SAS time, or SAS datetime. Also, in an ideal world you would have at least six full cycles of any significant cycle or season. So, for example, if there are annual cycles, and you want to forecast the next year, ideally you would have six years of data.

 

While there are a lot of time series data sets, some of them do not have independent variables. This means that you cannot demonstrate either scenario analysis or goal seeking. Others do not have categories and so do not allow you to demonstrate hierarchical forecasting. So to fully understand and illustrate the power of SAS forecasting tools (SAS Visual Analytics forecasting object and SAS Visual Forecasting), your data should include:

 

  • a date, time or datetime variable
  • a sufficiently long historical period of data that is, or can be rolled up into, regular time intervals (e.g., months, days, years, minutes, etc.); ideally this would include at least 6 full cycles of any important cycle or season
  • one or more dependent (target) variables
  • independent (input) variables
  • categories to use for BY GROUPS
  • it’s a plus to show additional features if you have:
    • attribute tables
    • segments

 

The PRICEDATA data set is an excellent choice.  The basic table is available in SASHELP. Attribute tables and a data set with segments are available at the following:

 

SAS Visual Forecasting-specific data sets available online https://github.com/vasepu/SAS-Visual-Forecasting---sample-data-sets

 

IMHO, the richest data sets to learn and/or showcase the features of Visual Forecasting are the sample data sets in the GitHub repository linked above.

 

be_2_image003.png

 

These data are based on sales, profit, etc. over time. They include:

 

  • DCSKINPRODUCT.sas7bdat
  • DCSKINPRODUCT_SEG.sas7bdat
  • eventrepository.sas7bdat
  • pricedata.sashdat
  • pricedata_attributes.sashdat
  • skinproduct.sashdat
  • skinproduct_attributes.sashdat
  • skinproduct_attributes_seg.sas7bdat
  • skinproduct_vfdemo.sas7bdat

 

They will download in a .zip file and you can then extract the individual data sets.

 

be_3_image004.png

 

SASHELP and SAMPSIO data sets

 

My second choice is the SASHELP and SAMPSIO data sets, because they ship with the software and are accessible directly from SAS software. They are commonly configured to appear in your libraries, for example in SAS Studio. To see how to access and load these data sets, see my YouTube video.

 

  • SASHELP
    • Over 300 datasets, of which some are appropriate for time series, including:
      • AIR
      • AIRLINE
      • CITIDAY, CITIWK, CITIMON, CITIQTR, CITIYR
      • ELECTRIC
      • ORSALES
      • PRICEDATA
      • RETAIL
    • The HOLIDAY data set can be used for events in Visual Forecasting
    • Information about 21 of the SASHELP data sets is at this link, but none of these 21 are appropriate for time series analyses
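
One way to hunt for time series candidates in SASHELP yourself is to query the SAS dictionary tables for numeric variables that carry a date-like format. The sketch below is not exhaustive: the list of format prefixes is my own assumption, and a table can have a date variable and still be too short or irregular for forecasting.

```sas
/* Find SASHELP variables whose format looks like a date or */
/* datetime format. The prefix list is illustrative only.   */
proc sql;
   select memname, name, format
      from dictionary.columns
      where libname = 'SASHELP'
        and memtype = 'DATA'
        and type = 'num'
        and (upcase(format) like 'DATE%'     /* DATE and DATETIME formats */
          or upcase(format) like 'MONYY%'
          or upcase(format) like 'YYMM%'
          or upcase(format) like 'YYQ%'
          or upcase(format) like 'MMDDYY%'
          or upcase(format) like 'DDMMYY%')
      order by memname, name;
quit;
```

Changing 'SASHELP' to 'SAMPSIO' should give the same kind of listing for the SAMPSIO library, once that library is assigned in your session.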

 

be_4_image005.png

 

be_5_image006.png

 

  • SAMPSIO
    • Over 250 data sets (both real and notional), some of which are appropriate for time series analysis, including:
      • COSMETIC, COSMETIC2
      • HIS_DATA
      • HIST
    • Some of the SAMPSIO datasets that do include date variables are not ideally suited for forecasting because of the short time frame of the data, such as GASPROD, GRAINPRD, OILPROD, RFDATA and STOCKS.
    • A data dictionary is available for 54 of these data sets
    • SAMPSIO data sets that start with DM are not suitable for forecasting. These are data mining data sets and most of them are suitable for SAS Enterprise Miner or SAS Visual Data Mining and Machine Learning as follows:
      • Use DMA[xxxx] data sets and a Data Partition node to create your training, validation, and test data.
      • Use the DML[xxxx] data sets for training; they contain input and target values
      • Use DMT[xxxx] data sets as test data for comparing models
      • Use DMV[xxxx] data sets for validation
      • Use DMS[xxxx] data sets for scoring
      • DMD[xxxx] data sets are NOT suitable for the Enterprise Miner graphical user interface environment

 

be_6_1_image007.png

 

be_7_1_image008.png

 

Depending on your environment and what has been done in it, you may not see the SAMPSIO library listed. By running a simple data step as shown below, you should then see SAMPSIO.

data one;
   set sampsio.hmeq;
run;

SAS Viya Example Data Sets (csv example data sets)

 

    • Over 35 data sets (both real and notional), some of which are appropriate for time series analysis, including:
      • air.csv
      • electric.csv
      • holiday.csv to use for events
      • orsales.csv
      • prodsale.csv, prdsal2.csv, prdsal3.csv
      • pricedata.csv
      • retailbuys.csv
      • us-stormdata-2014…2018.csv
      • waterflow.csv
    • Many of these will require data preparation steps to prepare them for forecasting
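
A common preparation step is rolling data up into a regular time interval. Because I can't vouch for the column names in each CSV file, the sketch below uses SASHELP.PRICEDATA as a stand-in and accumulates its sales to quarters. It assumes that SAS/ETS (for PROC TIMESERIES) is licensed in your environment; the output table names are hypothetical.

```sas
/* BY-group processing needs sorted input, so sort a copy first. */
proc sort data=sashelp.pricedata out=work.pricedata_sorted;
   by regionName productLine productName date;
run;

/* Accumulate the SALE series to quarterly totals per BY group. */
proc timeseries data=work.pricedata_sorted out=work.pricedata_qtr;
   by regionName productLine productName;
   id date interval=qtr accumulate=total;
   var sale;
run;
```

For an imported CSV table, the same pattern should apply once the file is loaded and its date column is a true SAS date: swap in your table name, BY variables, ID variable, and target variable.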

 

Use a SAS Program in SAS Studio to Create a Dataset

 

An old-school way to get a data set is to create it using SAS code, and you can download many SAS programs that create data. For example, there are a plethora of these programs available in the SAS 9 documentation. Just a few examples of these programs are listed below:

 

 

Real Data from Publicly Available Sources

 

In addition to the many choices of sample data sets that SAS provides, there are many excellent data sets available from the internet that will work with SAS forecasting with just a bit of data munging. One of my favorite sources is the electricity generation data available from the U.S. Energy Information Administration. Another great data set is on US COVID cases by state, brought to my attention by my colleague Stacey Wang.

 

Accessing, Loading, and Preparing Data for Forecasting

 

For an exhaustive (and exhausting) demonstration showing you how to find, import and/or load these data sets, see my video.

 

What if you have a date but it’s not a SAS date? There are a couple of ways to create a SAS date, depending on the format of your original date. For example, as shown in SAS Visual Analytics below, you may use DateFromMDY or TreatAs.

 

 

be_8_image009.png

 

 

be_9_image010.png

 

For details on creating SAS dates, see Teri Patsilaras’s post Build a date in SAS Visual Analytics Reports
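
If you prefer to fix the date in SAS code before loading the table, the DATA step offers similar tools: the MDY function builds a SAS date from numeric month, day, and year parts, and the INPUT function parses a character string with a date informat. The sketch below uses made-up values and hypothetical variable names purely for illustration.

```sas
data work.dates_fixed;
   /* Hypothetical raw inputs: numeric parts and a character date */
   month = 4;
   day   = 18;
   year  = 2022;
   date_text = '2022-04-18';

   /* Build a SAS date from its numeric parts */
   date_from_parts = mdy(month, day, year);

   /* Parse a character string using the YYMMDD10. informat */
   date_from_text = input(date_text, yymmdd10.);

   /* Apply a date format so the values display as dates */
   format date_from_parts date_from_text date9.;
run;

proc print data=work.dates_fixed;
run;
```

Both new variables hold the same SAS date value, and the DATE9. format displays them as 18APR2022. Choose the informat to match how your original text dates are written.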

 

SUMMARY: Beth’s Favorite SAS Data Sets for Forecasting

 

We all have our favorites. Our favorite beverage, our favorite sport, our favorite time of year, our favorite child… oops, maybe not that last one. Well, in any case, I have my favorite data sets that work well for forecasting. The table below shows my favorite categories in order.

 

be_10_image011.png

 

 

The pricedata and skin product data sets from this link https://github.com/vasepu/SAS-Visual-Forecasting---sample-data-sets are my favorites if I want to illustrate many features of SAS Visual Forecasting and of the Forecasting object in SAS Visual Analytics. One version of the pricedata data set is also available from SASHELP, and is very useful. As you see below, it includes:

 

  • date in a SAS date format
  • dependent variable (below, sale)
  • independent variables (below, cost and price)
  • by variables for hierarchical forecasting (below, RegionName, ProductLine, and ProductName)

 

be_11_image002.png

 

However, if my main focus is illustrating the concept of forecasting and the ability to capture seasonality, I prefer the airline data set, which is available in SASHELP.

 

be_12_image012.png
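
To see that seasonality captured in code rather than in the Forecasting object, here is a minimal sketch. It assumes that SAS/ETS (for PROC ESM) is licensed, and that the table is SASHELP.AIR with the variables DATE and AIR. The choice of a Winters model and a 12-month horizon is mine, for illustration only.

```sas
/* Fit a Winters seasonal exponential smoothing model to the */
/* monthly airline passenger series and forecast 12 months.  */
proc esm data=sashelp.air
         out=work.air_forecast
         lead=12
         plot=forecasts;
   id date interval=month;
   forecast air / model=winters;
run;
```

The forecast plot should show the repeating within-year pattern carried forward over the forecast horizon, which is what makes this series such a handy teaching example.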

 

FOR MORE INFORMATION:

 

Find more articles from SAS Global Enablement and Learning here.
