Heluuu,
I am trying to find a window of interest in my time data. For an item x I have an interval [a, b], and inside this interval at various points I have examinations e. Each examination comes with a date d.
My problem is that in some cases the interval [a, b] is large, and there is a significant gap between the last examination en and b.
I would like to scratch all examinations that occur prior to this gap.
I have the following relevant points in a table:
DATA dsHistory;
INPUT ItemsX ExamsE EndpointB;
DATALINES;
0001 02/03/95 04/08/13
0001 07/11/98 04/08/13
0001 09/09/02 04/08/13
0001 01/06/06 04/08/13
0205 01/05/87 02/11/10
0205 14/03/01 02/11/10
0205 22/02/04 02/11/10
0205 17/08/08 02/11/10
0205 11/08/09 02/11/10
;
RUN;
Using this example, I would like x0001 to disappear (as there is no ei close to b0001), and I would like the first observation of x0205 to disappear as well (there is a significant gap between the first observation and the others).
In short, my proposed algorithm looks like:
foreach unique subset ItemX x in X
t_in_sequence = b(x)
foreach ExamsE e in x
if t_in_sequence - e > 6 years
Remove observation corresponding to e(x) from set;
else
t_in_sequence = t_examination;
Now, my problem is that I don't know how to do this other than splitting the dataset into several thousand subsets, and reiterating through them with a new t_in_sequence every time. How can I go through each x inplace? How can I update t_in_sequence while going through x?
Does anyone have suggestions on how to do this?
Thank you for reading
Maybe clarify for me, why would 0205 row 1 be removed and not the rest, as they are all > 6 months from endpoint?
Currently I am at:
proc sql;
create table WANT as
select DS.*,
MDS.MIN_DATE as MIN_DATE format=date9.,
MDS.MAX_DATE as MAX_DATE format=date9.,
case when DS.EXAMSE between CALCULATED MIN_DATE and intnx("MONTHS",CALCULATED MIN_DATE,6) then "Y"
else "N" end as RESULT
from DSHISTORY DS
left join (select distinct ITEMSX,min(EXAMSE) as MIN_DATE,max(EXAMSE) as MAX_DATE from DSHISTORY group by ITEMSX) MDS
on DS.ITEMSX=MDS.ITEMSX;
quit;
Well, I don't have a working SAS at the moment, but what about something like:
proc sql;
create table WANT as
select *
from DSHISTORY DS
where DS.DSEXAMSE >= intck("MONTHS",(select min(THIS.DSEXAMSE) from DSHISTORY THIS where THIS.ITEMSX=DS.ITEMSX),6);
quit;
Maybe clarify for me, why would 0205 row 1 be removed and not the rest, as they are all > 6 months from endpoint?
Currently I am at:
proc sql;
create table WANT as
select DS.*,
MDS.MIN_DATE as MIN_DATE format=date9.,
MDS.MAX_DATE as MAX_DATE format=date9.,
case when DS.EXAMSE between CALCULATED MIN_DATE and intnx("MONTHS",CALCULATED MIN_DATE,6) then "Y"
else "N" end as RESULT
from DSHISTORY DS
left join (select distinct ITEMSX,min(EXAMSE) as MIN_DATE,max(EXAMSE) as MAX_DATE from DSHISTORY group by ITEMSX) MDS
on DS.ITEMSX=MDS.ITEMSX;
quit;
The algorithm should go by 6 years, not 6 months.
Hmm, it will take me some time to read up on and hopefully understand the functions you are using.
Thank you
Ah, maybe just change the line to 
                case  when DS.EXAMSE between CALCULATED MIN_DATE and intnx("YEAR",CALCULATED MIN_DATE,6) then "Y"
intnx is and interval function.
I'm afraid I do not understand where DS comes from. Is it a new set? is it a copy of dsHistory?
Thank you.
Its an alias.  When you specify a dataset on a from in SQL, you can put some text after it to use an alias in your code.
   from       DSHISTORY DS
So the alias DS refers to the dataset DSHISTORY.
Awesome, now it's almost working.
I got the RESULT column to keep track of everything no more than 6 years from b. Now I just need the timeframe to slide with each new entry (that is, if e is within range of b, make e + 6 years the new cutoff point).
proc sql;
create table WANT as
select *,
MDS.MIN_DATE as MIN_DATE format=DATETIME20.,
MDS.MAX_DATE as MAX_DATE format=DATETIME20.,
case when ExamsE between CALCULATED MIN_DATE and intnx("DTYEAR",CALCULATED MIN_DATE,6) then "Y"
else "N" end as RESULT
from dsHistory
left join (select distinct dsHistory.ItemsX,min(ExamsE) as MIN_DATE,max(ExamsE) as MAX_DATE from dsHistory group by dsHistory.ItemsX) MDS
on dsHistory.ItemsX=MDS.ItemsX;
quit;
Do you have suggestions on how to add in the moving timeframe?
Thank you in advance
Ah so your logic is more of:
row 1 - set roling_date=date
row 2 - if date within rolling_date and rolling_date+6 months then rolling_date=date
...
If so then perhaps you would be best off with a datastep and retain, e.g.;
proc sort data=have;
by ItemsX ExamsE ;
run;
data want;
set have;
by itemsx;
retain rolling_date;
if first.itemsx then do;
if examse <= intnx("DTYEAR",endpointb,6) then do;
rolling_date=examse;
result="Y";
else do;
rolling_date=endpointb;
result="N";
end;
end;
else do;
if examse <= intnx("DTYEAR",rolling_date,6) then do;
rolling_date=examse;
result="Y";
else do;
rolling_date=endpointb;
result="N";
end;
end;
run;
Now it's working flawlessly. Thank you so much for the assistance. Hopefully I have learnt something in the process as well 
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
