I have a daily time series covering one year and want to find the step changes that occurred over that period. A single-day outlier should not count: a step change is a shift that persists for at least a week.
My daily data looks like this:
Date response
1/1/2010 5
1/2/2010 5
1/3/2010 9
1/4/2010 5
1/5/2010 5
1/6/2010 9
1/7/2010 9
1/8/2010 9
1/9/2010 9.5
1/10/2010 9.5
1/11/2010 9
1/12/2010 9
1/13/2010 7
1/14/2010 6
1/15/2010 5
I want to flag the dates with positive or negative step changes in response (say, beyond the 90th percentile). In the 15 days of data above, I want to flag only the start date of the step change, 1/6/2010, because the shift in the response variable continued for a week. I do not want to flag 1/3/2010, because the shift occurred only on that date and did not continue to the next day. I tried PROC CAPABILITY to find the 95th and 5th percentiles, but it flagged every date from 1/6/2010 to 1/12/2010 and also 1/3/2010, so it did not give me the beginning date of the step change.
It would be great if I could pull the start date of the week in which the response crossed the 95th or 5th percentile.
Although I suspect this can be done in one step with a hash object, the easier way for me is to build an intermediate table that categorizes the data first, and then apply a data step DOW loop.
data have;
input date :mmddyy10. response ;
format date mmddyy10.;
cards;
1/1/2010 5
1/2/2010 5
1/3/2010 9
1/4/2010 5
1/5/2010 5
1/6/2010 9
1/7/2010 9
1/8/2010 9
1/9/2010 9.5
1/10/2010 9.5
1/11/2010 9
1/12/2010 9
1/13/2010 7
1/14/2010 6
1/15/2010 5
;;;;
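/* Step 1: bucket the response. The cutoff of 9 is hard-coded for this sample. */
proc format;
  value res
    low -< 9 = 'low'
    9 - high = 'high'
  ;
run;

data want1;
  set have;
  _cat = put(response, res4.);
run;

/* Step 2: double DOW loop over runs of consecutive rows with the same _cat. */
data want;
  /* First pass: count the rows in the current run; _n_ ends up as the run length. */
  do _n_ = 1 by 1 until (last._cat);
    set want1;
    by _cat notsorted;
  end;
  /* Second pass: re-read the same run and flag its first row
     when the run is 'high' and lasts at least 7 days. */
  do _i = 1 by 1 until (last._cat);
    set want1;
    by _cat notsorted;
    if _cat = 'high' and _n_ >= 7 and _i = 1 then flag = 1;
    else flag = .;
    output;
  end;
  drop _:;
run;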
Haikuo
Update: FWIW, here is a hash solution:
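data want_hash;
  if _n_ = 1 then do;
    /* Hash ordered by date buffers the current run of high values. */
    declare hash h(ordered:'y');
    h.definekey('date');
    h.definedata('date', 'response');
    h.definedone();
    declare hiter hi('h');
  end;
  set have;
  if response < 9 then do;
    /* A low value ends the run. Save the current row first, because
       the iterator overwrites date and response. */
    _d = date; _r = response;
    rc = hi.first();
    do _i = 1 by 1 while (rc = 0);
      /* Flag the first buffered row if the run lasted at least 7 days. */
      if h.num_items >= 7 and _i = 1 then flag = 1; else flag = .;
      output;
      rc = hi.next();
    end;
    h.clear();
    /* Restore and write the low row itself, unflagged. */
    date = _d; response = _r; output;
  end;
  else h.replace();   /* high value: add it to the buffer */
  /* Note: a run of high values at the very end of the data is never
     written, because only a low value triggers the flush above. */
  keep date response flag;
run;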
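To check which runs the flag logic is working with, each consecutive run in want1 can be summarized with its start date, end date, and length. This is only a minimal sketch: it assumes the want1 table from the first solution already exists, and the names runs, start_date, end_date, and run_len are introduced here for illustration.

```sas
/* Sketch: one output row per consecutive run of the same _cat in want1. */
/* Assumes want1 (with date and _cat) was created by the DOW solution.   */
data runs;
  set want1;
  by _cat notsorted;
  retain start_date;
  if first._cat then do;
    start_date = date;   /* first day of the run */
    run_len = 0;
  end;
  run_len + 1;           /* sum statement: run_len is retained across rows */
  if last._cat then do;
    end_date = date;     /* last day of the run */
    output;
  end;
  format start_date end_date mmddyy10.;
  keep _cat start_date end_date run_len;
run;

proc print data=runs noobs;
run;
```

On the 15-row sample this should list five runs, and the only 'high' run with run_len >= 7 starts on 01/06/2010, which matches the flagged row in want.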