BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Dess
Calcite | Level 5

Heluuu,

I am trying to find a window of interest in my time data. For an item x I have an interval [a, b], and inside this interval at various points I have examinations e. Each examination comes with a date d.

My problem is that in some cases the interval [a, b] is large, and there is a significant gap between the last examination en and b.

I would like to scratch all examinations that occur prior to this gap.

I have the following relevant points in a table:

DATA dsHistory;

     INPUT ItemsX ExamsE EndpointB;

DATALINES;

0001 02/03/95 04/08/13

0001 07/11/98 04/08/13

0001 09/09/02 04/08/13

0001 01/06/06 04/08/13

0205 01/05/87 02/11/10

0205 14/03/01 02/11/10

0205 22/02/04 02/11/10

0205 17/08/08 02/11/10

0205 11/08/09 02/11/10

;

RUN;

Using this example, I would like x0001 to disappear (as there is no eclose to b0001), and I would like the first observation of x0205 to disappear as well (there is a significant gap between the first observation and the others).

In short, my proposed algorithm looks like:

foreach unique subset ItemX x in X

     t_in_sequence = b(x)

     foreach ExamsE e in x

          if t_in_sequence - e > 6 years

               Remove observation corresponding to e(x) from set;

          else

               t_in_sequence = t_examination;

       

Now, my problem is that I don't know how to do this other than splitting the dataset into several thousand subsets, and reiterating through them with a new t_in_sequence every time. How can I go through each x inplace? How can I update t_in_sequence while going through x?

Does anyone have suggestions on how to do this?

Thank you for reading

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Maybe clarify for me, why would 0205 row 1 be removed and not the rest, as they are all > 6 months from endpoint?

Currently I am at:

proc sql;

     create table WANT as

     select     DS.*,

                MDS.MIN_DATE as MIN_DATE format=date9.,

                MDS.MAX_DATE as MAX_DATE format=date9.,

                case  when DS.EXAMSE between CALCULATED MIN_DATE and intnx("MONTHS",CALCULATED MIN_DATE,6) then "Y"

                      else "N" end as RESULT

     from       DSHISTORY DS

     left join  (select distinct ITEMSX,min(EXAMSE) as MIN_DATE,max(EXAMSE) as MAX_DATE from DSHISTORY group by ITEMSX) MDS

     on         DS.ITEMSX=MDS.ITEMSX;

quit;

View solution in original post

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, I don't have a working SAS at the moment, but what about something like:

proc sql;

     create table WANT as

     select     *

     from       DSHISTORY DS

     where     DS.DSEXAMSE >= intck("MONTHS",(select min(THIS.DSEXAMSE) from DSHISTORY THIS where THIS.ITEMSX=DS.ITEMSX),6);

quit;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Maybe clarify for me, why would 0205 row 1 be removed and not the rest, as they are all > 6 months from endpoint?

Currently I am at:

proc sql;

     create table WANT as

     select     DS.*,

                MDS.MIN_DATE as MIN_DATE format=date9.,

                MDS.MAX_DATE as MAX_DATE format=date9.,

                case  when DS.EXAMSE between CALCULATED MIN_DATE and intnx("MONTHS",CALCULATED MIN_DATE,6) then "Y"

                      else "N" end as RESULT

     from       DSHISTORY DS

     left join  (select distinct ITEMSX,min(EXAMSE) as MIN_DATE,max(EXAMSE) as MAX_DATE from DSHISTORY group by ITEMSX) MDS

     on         DS.ITEMSX=MDS.ITEMSX;

quit;

Dess
Calcite | Level 5

The algorithm should go by 6 years, not 6 months.

Hmm, it will take me some time to read up on and hopefully understand the functions you are using.

Thank you

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, maybe just change the line to
                case  when DS.EXAMSE between CALCULATED MIN_DATE and intnx("YEAR",CALCULATED MIN_DATE,6) then "Y"

intnx is and interval function.

Dess
Calcite | Level 5

I'm afraid I do not understand where DS comes from. Is it a new set? is it a copy of dsHistory?

Thank you.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Its an alias.  When you specify a dataset on a from in SQL, you can put some text after it to use an alias in your code.
   from       DSHISTORY DS

So the alias DS refers to the dataset DSHISTORY.

Dess
Calcite | Level 5

Awesome, now it's almost working.

I got the RESULT column to keep track of everything no more than 6 years from b. Now I just need the timeframe to slide with each new entry (that is, if e is within range of b, make e + 6 years the new cutoff point).

proc sql;

     create table WANT as

     select     *,

                MDS.MIN_DATE as MIN_DATE format=DATETIME20.,

                MDS.MAX_DATE as MAX_DATE format=DATETIME20.,

                case  when ExamsE between CALCULATED MIN_DATE and intnx("DTYEAR",CALCULATED MIN_DATE,6) then "Y"

                      else "N" end as RESULT

     from       dsHistory

     left join  (select distinct dsHistory.ItemsX,min(ExamsE) as MIN_DATE,max(ExamsE) as MAX_DATE from dsHistory group by dsHistory.ItemsX) MDS

     on         dsHistory.ItemsX=MDS.ItemsX;

quit;

Do you have suggestions on how to add in the moving timeframe?

Thank you in advance

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah so your logic is more of:

row 1 - set roling_date=date

row 2 - if date within rolling_date and rolling_date+6 months then rolling_date=date

...

If so then perhaps you would be best off with a datastep and retain, e.g.;

proc sort data=have;

     by ItemsX ExamsE ;

run;

data want;

     set have;

     by itemsx;

     retain rolling_date;

     if first.itemsx then do;

          if examse <= intnx("DTYEAR",endpointb,6) then do;

               rolling_date=examse;

               result="Y";

          else do;

               rolling_date=endpointb;

               result="N";

          end;

     end;

     else do;

          if examse <= intnx("DTYEAR",rolling_date,6) then do;

               rolling_date=examse;

               result="Y";

          else do;

               rolling_date=endpointb;

               result="N";

          end;

     end;

run;

Dess
Calcite | Level 5

Now it's working flawlessly. Thank you so much for the assistance. Hopefully I have learnt something in the process as well Smiley Happy

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3731 views
  • 6 likes
  • 2 in conversation