Finding first, next, and last observation

Accepted Solution Solved
Reply
Contributor
Posts: 38
Accepted Solution

Finding first, next, and last observation

Hi,

I have a dataset in which Obs can become either "1" or "0". For every observation where Obs is "0", it needs to be determined the time when Obs started to be "0" (Time_first), the next time it becomes "1" (Time_last), and the time of the next observation (Time_next).

The best solution I found so far requires two sorting steps. Since it is a pretty large dataset, I would highly appreciate any help to improve the code.

Data Test;

    Format

        Time time8.

        Obs best2.

        Time_first time8.;

    Input

        Time :time8.

        Obs :best2.;

    Retain

        Time_first;

    * Determine Time_first ;

    If Obs=0 Then Do;

        If Time_first=. Then Time_first=Time;

    End;

    Else

        Time_first=.;

Datalines;

09:04:00 1

09:08:00 0

09:11:00 1

09:12:00 0

09:15:00 0

09:16:00 0

09:20:00 1

Run;

Proc Sort Data=Test;

    By descending Time;

Run;

Data Test;

    Set Test;

    FORMAT

        Time_last time8.

        Time_next time8.;

    RETAIN

        Time_last;

    * Determine Time_next (becomes zero later on if Obs=1) ;

    Time_next=lag(Time);

    * Determine Time_last ;

    If Obs=0 Then Do;

        If Time_last=. Then Time_last=Time_next;

    End;

    Else Do;

        Time_last=.;

        Time_next=.;

    End;

 

Run;

Proc Sort Data=Test Out=Test;

    By Time;

Run;


Accepted Solutions
Solution
‎07-30-2013 01:10 PM
Super User
Posts: 5,516

Re: Finding first, next, and last observation

Posted in reply to Georg_UPB

OK, here's one approach.  It assumes your data are already sorted by TIME.

data want;

set have end=done nobs=_nobs_;

if obs=0 then do;

   if done=0 then do pointer=_n_ + 1 to _nobs_ until (test_obs=1);

       set have point=pointer (keep=time obs rename=(obs=test_obs time=test_time));

       if pointer = _n_ + 1 then time_next = test_time;

       if test_obs=1 then time_last=test_time;

   end;

end;

drop test_obs test_time;

run;

No guarantees as to speed, except that it surely has to beat sorting the data.  Also, it's untested code so the possibility of tweaking it is always on the table.

Good luck.

View solution in original post


All Replies
Super User
Posts: 5,516

Re: Finding first, next, and last observation

Posted in reply to Georg_UPB

Just a couple of questions to get started ...

It looks like the final number of observations should not change ... you want all the original observations with some variables added.  (Please confirm.)

When OBS=0, what is the proper value for NEXT_TIME?  Is it the time on the very next observation, or is it the time following TIME_LAST?

There definitely will be methods that are faster than what you are using now, but the program may still take a while to run.

Contributor
Posts: 38

Re: Finding first, next, and last observation

Posted in reply to Astounding

@Astounding

You're right, the number of observations should not change and I just want to add information (in cases where variable OBS=0).

TIME_NEXT is always the time of the very next observation, whereas TIME_LAST is the time of the next observation where OBS=1.

Solution
‎07-30-2013 01:10 PM
Super User
Posts: 5,516

Re: Finding first, next, and last observation

Posted in reply to Georg_UPB

OK, here's one approach.  It assumes your data are already sorted by TIME.

data want;

set have end=done nobs=_nobs_;

if obs=0 then do;

   if done=0 then do pointer=_n_ + 1 to _nobs_ until (test_obs=1);

       set have point=pointer (keep=time obs rename=(obs=test_obs time=test_time));

       if pointer = _n_ + 1 then time_next = test_time;

       if test_obs=1 then time_last=test_time;

   end;

end;

drop test_obs test_time;

run;

No guarantees as to speed, except that it surely has to beat sorting the data.  Also, it's untested code so the possibility of tweaking it is always on the table.

Good luck.

Contributor
Posts: 38

Re: Finding first, next, and last observation

Posted in reply to Georg_UPB

Thank you very much for your help!

After implementing your suggestions the code runs 5 times as fast as before.

Data Have;

    Format

        Time time8.

        Obs best2.

        Time_first time8.;

    Input

        Time :time8.

        Obs :best2.;

    Retain

        Time_first;

    * Determine Time_first ;

    If Obs=0 Then Do;

        If Time_first=. Then Time_first=Time;

    End;

    Else

        Time_first=.;

Datalines;

09:04:00 1

09:08:00 0

09:11:00 1

09:12:00 0

09:15:00 0

09:16:00 0

09:20:00 1

Run;

Data Want(Drop=Test_Obs Test_Time);

    Set

        Have End=Done nobs=_nobs_;

    Format

        Time_last time8.

        Time_next time8.;

    If Obs=0 Then Do;

        If Done=0 Then Do Pointer=_n_ + 1 To _nobs_ Until (Test_Obs=1);

            Set Have (Keep=Time Obs Rename=(Time=Test_Time Obs=Test_Obs)) Point=Pointer;           

            If Pointer = _n_ + 1 Then Time_next=Test_Time;

            If Test_Obs=1 Then Time_last=Test_Time;

   

        End;

    End;

Run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 248 views
  • 0 likes
  • 2 in conversation