Grouping Trends by Monotonicity

skipjimroo · Posted 02-14-2023 04:02 AM

Hi there,

I I have output from proc UCM with the additive outliers (spikes) and level shifts labelled. (I am running in9.4.) I want to:

Extend these labels to the start and end of the increase period. The start period is the month before the increase. The end period is the last month of the increase.
Then summarise the data so that only the start and end dates and values are shown.
I also want level shifts to take precedence, so that if a level shift and spike both occur during this increase period, then the increase is classed as a level shift.

The input data looks like this

data got;
    infile datalines dsd delimiter='|';
format ID $1. Date date9. value 8. valuediff 8. breaktype $12.;
informat date ddmmyy10.; 
   input ID Date Value ValueDiff Breaktype;
    datalines;
A|01/01/2022|1602040||NoBreak
A|01/02/2022|1997341|395301|NoBreak
A|01/03/2022|45544|-1951797|NoBreak
A|01/04/2022|138421|92877|LevelShift
A|01/05/2022|999286|860865|NoBreak
A|01/06/2022|1178399|179113|Spike
A|01/07/2022|881297|-297102|NoBreak
A|01/08/2022|949765|68468|LevelShift
A|01/09/2022|1308423|358658|NoBreak
A|01/10/2022|1004914|-303509|NoBreak
A|01/11/2022|1191019|186105|NoBreak
A|01/12/2022|803331|-387688|NoBreak
B|01/01/2022|921410||NoBreak
B|01/02/2022|1068787|147377|NoBreak
B|01/03/2022|297284|771503|NoBreak
B|01/04/2022|338337|41053|LevelShift
B|01/05/2022|487583|149246|NoBreak
B|01/06/2022|696031|208448|Spike
B|01/07/2022|689942|6089|NoBreak
B|01/08/2022|763417|73475|LevelShift
B|01/09/2022|983677|220260|NoBreak
B|01/10/2022|545737|437940|NoBreak
B|01/11/2022|740081|194344|NoBreak
B|01/12/2022|533809|206272|NoBreak
;
run;

My target data should look like this:

ID	break_startdate	break_enddate	breaktype	startbreak_value	endbreak_value
A	01/03/2022	01/06/2022	LevelShift	45544	1178399
A	01/07/2022	01/09/2022	LevelShift	881297	1308423
B	01/03/2022	01/06/2022	LevelShift	297284	696031
B	01/07/2022	01/09/2022	LevelShift	689942	983677

I have created some code that works in some circumstances, but not all.

It cant handle multiple breaks in the same id.
It doesnt let level shifts take precedence over spikes.
And it doesnt work if the break label isnt at the start of the increase. I have tried using lag functions, but the issue is that you dont know how many rows to lag by.


data need
(keep=id

startbreakvalue endbreakvalue newbreak
structbreak_startdate structbreak_enddate rename=(newbreak=breaktype));
set got;
by id date breaktype;
retain structbreak_startdate structbreak_enddate
startbreakvalue endbreakvalue newbreak;
format structbreak_startdate structbreak_enddate date9.;

if first.id =1   then do;
startbreakvalue = 99999;
endbreakvalue = 0;
structbreak_startdate = '31dec3500'd;
structbreak_enddate = '1jan1960'd;
newbreak =breaktype;
end;
else do;
if breaktype ne "NoBreak" then do;
newbreak =breaktype;
structbreak_enddate ='31dec3500'd;
structbreak_startdate =intnx('month',date,-1);
startbreakvalue = value-valuediff;
end;

if breaktype = "NoBreak" and valuediff<0 then do;
x=1;
structbreak_enddate =min(structbreak_enddate,intnx('month',date,-1));
endbreakvalue = max(endbreakvalue,value-valuediff);
end;
end;
if last.id = 1;

;
run;

Any help would be greatly appreciated!

Grouping Trends by Monotonicity

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!