Hi there, I I have output from proc UCM with the additive outliers (spikes) and level shifts labelled. (I am running in9.4.) I want to: Extend these labels to the start and end of the increase period. The start period is the month before the increase. The end period is the last month of the increase. Then summarise the data so that only the start and end dates and values are shown. I also want level shifts to take precedence, so that if a level shift and spike both occur during this increase period, then the increase is classed as a level shift. The input data looks like this data got;
infile datalines dsd delimiter='|';
format ID $1. Date date9. value 8. valuediff 8. breaktype $12.;
informat date ddmmyy10.;
input ID Date Value ValueDiff Breaktype;
datalines;
A|01/01/2022|1602040||NoBreak
A|01/02/2022|1997341|395301|NoBreak
A|01/03/2022|45544|-1951797|NoBreak
A|01/04/2022|138421|92877|LevelShift
A|01/05/2022|999286|860865|NoBreak
A|01/06/2022|1178399|179113|Spike
A|01/07/2022|881297|-297102|NoBreak
A|01/08/2022|949765|68468|LevelShift
A|01/09/2022|1308423|358658|NoBreak
A|01/10/2022|1004914|-303509|NoBreak
A|01/11/2022|1191019|186105|NoBreak
A|01/12/2022|803331|-387688|NoBreak
B|01/01/2022|921410||NoBreak
B|01/02/2022|1068787|147377|NoBreak
B|01/03/2022|297284|771503|NoBreak
B|01/04/2022|338337|41053|LevelShift
B|01/05/2022|487583|149246|NoBreak
B|01/06/2022|696031|208448|Spike
B|01/07/2022|689942|6089|NoBreak
B|01/08/2022|763417|73475|LevelShift
B|01/09/2022|983677|220260|NoBreak
B|01/10/2022|545737|437940|NoBreak
B|01/11/2022|740081|194344|NoBreak
B|01/12/2022|533809|206272|NoBreak
;
run;
My target data should look like this: ID break_startdate break_enddate breaktype startbreak_value endbreak_value A 01/03/2022 01/06/2022 LevelShift 45544 1178399 A 01/07/2022 01/09/2022 LevelShift 881297 1308423 B 01/03/2022 01/06/2022 LevelShift 297284 696031 B 01/07/2022 01/09/2022 LevelShift 689942 983677 I have created some code that works in some circumstances, but not all. It cant handle multiple breaks in the same id. It doesnt let level shifts take precedence over spikes. And it doesnt work if the break label isnt at the start of the increase. I have tried using lag functions, but the issue is that you dont know how many rows to lag by.
data need
(keep=id
startbreakvalue endbreakvalue newbreak
structbreak_startdate structbreak_enddate rename=(newbreak=breaktype));
set got;
by id date breaktype;
retain structbreak_startdate structbreak_enddate
startbreakvalue endbreakvalue newbreak;
format structbreak_startdate structbreak_enddate date9.;
if first.id =1 then do;
startbreakvalue = 99999;
endbreakvalue = 0;
structbreak_startdate = '31dec3500'd;
structbreak_enddate = '1jan1960'd;
newbreak =breaktype;
end;
else do;
if breaktype ne "NoBreak" then do;
newbreak =breaktype;
structbreak_enddate ='31dec3500'd;
structbreak_startdate =intnx('month',date,-1);
startbreakvalue = value-valuediff;
end;
if breaktype = "NoBreak" and valuediff<0 then do;
x=1;
structbreak_enddate =min(structbreak_enddate,intnx('month',date,-1));
endbreakvalue = max(endbreakvalue,value-valuediff);
end;
end;
if last.id = 1;
;
run; Any help would be greatly appreciated!
... View more