07-01-2013 10:15 PM
So I have a dataset imported. It has about twenty different time series with annual observations from 1950-2012. The time series start and end in different years for each variable. I want to write a macro to delete time series that have incomplete data in the middle. If the variable has missing observations from say 1950-1960 that's ok, as long as 1961-whenever is continuous (no holes). Basically I'm thinking for each variable I need to record when the time series starts, when it ends and check for missing values in between. I have no idea how to do this...
07-01-2013 11:13 PM
My sample data's at work and I can't access it for now. Each variable is a time series and I have a "date" variable which each observation is a year. Date is like
The other time series all start and end at different dates. I just need to get rid of the incomplete ones.
07-01-2013 11:50 PM
It is hard to give you some advice ,since you don't post the sample and the output you need.
Base on my understanding.
if dif(date) ne 1 then delete ;
07-03-2013 04:41 PM
As Ksharp mentioned without seeing the data it is tricky to give advice.
Assuming that you have access to SAS/ETS software - then PROC TIMESERIES might be beneficial.
Here is an example to illustrate what I have in mind.
*create some test series (6 in total - 3 problematic ones);
if date gt "01JAN59"d then delete;
if date lt "01JAN50"d then delete;
if date gt "01MAR53"d and date lt "01JUL53"d then delete;
if date gt "01AUG58"d and date lt "01DEC58"d then delete;
if date="01AUG53"d then delete;
*I take it that you are interested in flagging series 3,4 and 6 - since they have gaps. All other series are complete - they might be shorter than others, though;
proc timeseries data=have out=_null_ outsum=want(where=(nmiss ne 0) keep=series start end nobs n nmiss);
id date interval=month zeromiss=both;
Want data set:
Series START END NOBS N NMISS
3 JAN1949 DEC1960 144 141 3
4 JAN1949 DEC1960 144 141 3
6 JAN1950 DEC1960 132 131 1
07-03-2013 05:51 PM
I am picturing data with one row per year and 20 variables.
Year S1 S2 S3
1950 . 100 .
1951 . 102 203
1952 50 103 204
1953 51 . .
So for that data it looks like S1 is recorded from 1952 to 1953, S2 from 1950 to 1952 and S3 from 1951 to 1952.
Do you want to find those boundaries?
Are you worried about data with gaps? For example if S2 was missing in 1951 then you couldn't use 1950 to 1952 because of the gap?
Is this what you are talking about?