A critical assumption in all time series models is that the observations are sampled with the same frequency. Unfortunately, it is often the case that some values of the variable of interest are either missing or unavailable for certain dates in the sample period. There are several ways of dealing with the problem, including aggregation and interpolation, which are illustrated in the example "Transforming the Frequency of Time Series Data."
Often, however, you merely want to make note of the missing values and proceed with the analysis using only those observations for which you have data. For instance, suppose you are interested in analyzing a company's cash balances at the beginning of each month. A perusal of the records reveals that, before August 1996, entries in the books are sporadic. If the data set is large, it may not be obvious which months are missing.
In this example, the EXPAND procedure is used to insert observations with missing values for the omitted dates in an aperiodic data set.
Suppose that you want to analyze the amount of cash in a company's account at the beginning of each month over the past year and a half. Inspection of the books yields the data set shown in Figure 1.
data cash;
input date : monyy. balance @@;
label balance = "Cash Account Balance";
format date monyy.;
datalines;
aug95 84 sep95 52 oct95 8 dec95 98 jan96 61 feb96 24 may96 67 jun96 58
aug96 43 sep96 3 oct96 73 nov96 90 dec96 89 jan97 55 feb97 86 mar97 79
apr97 23
;
proc print data=cash;
title 'Cash Account Balances - Original Data';
run;
Figure 1: Cash Account Balances - Original Data
Notice that there are several months with no entry.
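One way to see where the gaps fall before expanding the data is to compare each date with the one before it. The following is a minimal sketch, assuming the CASH data set above is sorted by DATE. The data set name GAPS and the variables PREV and SKIPPED are illustrative names that are not part of the original example.

```sas
/* Sketch: flag gaps between consecutive records in CASH.           */
/* GAPS, PREV, and SKIPPED are hypothetical names for illustration. */
data gaps;
   set cash;
   prev = lag(date);                 /* date of the previous record */
   /* number of months strictly between the previous record and this one */
   if _n_ > 1 then skipped = intck('month', prev, date) - 1;
   if skipped > 0;                   /* keep only records that follow a gap */
   format prev monyy.;
run;

proc print data=gaps;
   var prev date skipped;
   title 'Gaps in the Cash Account Records';
run;
```

For the data in Figure 1, this should report three gaps: one month skipped between OCT95 and DEC95, two between FEB96 and MAY96, and one between JUN96 and AUG96.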
If you want to create a data set that includes the missing observations for the months with no entry and keep the cash value missing in these observations, you can use the EXPAND procedure with the following options:
proc expand data=cash out=cash2 to=month method=none;
id date;
run;
proc print data=cash2;
title 'Cash Account Balances - Expanded Data';
run;
The DATA= option specifies the input SAS data set as CASH and the OUT= option creates an output data set CASH2. The TO= option determines the frequency of the output data set; in this case, the observations are monthly. The METHOD=NONE option specifies that no interpolation be performed. The METHOD=NONE option cannot be used when frequency conversion is specified; however, in this case, you are interested only in including missing values for the omitted observations. The modified data set appears in Figure 2.
Figure 2: Cash Account Balances - Expanded Data
The following code creates the plot of the data in Figure 3 with the missing observations highlighted.
data graph;
set cash2;
if balance=. then unknown=98;
run;
proc gplot data=graph;
plot balance*date unknown*date / overlay vaxis=axis2
/* reference lines at the four months with no entry */
href='01nov95'd '01mar96'd '01apr96'd '01jul96'd
chref=red lhref=2
cframe=ligr;
title1 "Plot of Cash Balance at Beginning of Month" h=3;
axis2 label=(angle=90 'Cash Account Balance');
symbol1 c=blue interpol=join value=star;
symbol2 c=black interpol=none font=complex h=7 value="?";
run;
quit;
Figure 3: Beginning of Month Cash Balances with Missing Data
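The value 98 assigned to UNKNOWN matches the largest balance in the data, so the question marks are drawn at the top of the plotted range. If the data change, that constant has to change with them. The following sketch shows one way to derive the value instead, assuming PROC SQL is available. The macro variable name MAXBAL is hypothetical and not part of the original example.

```sas
/* Sketch: derive the marker height from the data instead of hard-coding 98. */
/* MAXBAL is a hypothetical macro variable name used for illustration.       */
proc sql noprint;
   select max(balance) into :maxbal
   from cash2;
quit;

data graph;
   set cash2;
   /* place the "?" marker at the largest observed balance */
   if missing(balance) then unknown = &maxbal;
run;
```

The resulting GRAPH data set can be passed to the same PROC GPLOT step shown above.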
You can now proceed with the analysis using a data set that contains an observation, numerical or missing, for every date in the sampling period.
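As a quick check before further analysis, you can count how many observations in the expanded data set carry a balance and how many are missing. This is a minimal sketch that uses the N and NMISS statistics of PROC MEANS on the CASH2 data set created above. The title text is illustrative.

```sas
/* Sketch: count recorded and missing balances in the expanded data set */
proc means data=cash2 n nmiss mean min max;
   var balance;
   title 'Cash Account Balances - Summary of Expanded Data';
run;
```

With the data in this example, N should be 17 and NMISS should be 4, covering the 21 months from August 1995 through April 1997. The mean, minimum, and maximum are computed from the nonmissing balances only.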