BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Georg_UPB
Fluorite | Level 6

Hi,

I have a dataset in which Obs can become either "1" or "0". For every observation where Obs is "0", it needs to be determined the time when Obs started to be "0" (Time_first), the next time it becomes "1" (Time_last), and the time of the next observation (Time_next).

The best solution I found so far requires two sorting steps. Since it is a pretty large dataset, I would highly appreciate any help to improve the code.

Data Test;

    Format

        Time time8.

        Obs best2.

        Time_first time8.;

    Input

        Time :time8.

        Obs :best2.;

    Retain

        Time_first;

    * Determine Time_first ;

    If Obs=0 Then Do;

        If Time_first=. Then Time_first=Time;

    End;

    Else

        Time_first=.;

Datalines;

09:04:00 1

09:08:00 0

09:11:00 1

09:12:00 0

09:15:00 0

09:16:00 0

09:20:00 1

Run;

Proc Sort Data=Test;

    By descending Time;

Run;

Data Test;

    Set Test;

    FORMAT

        Time_last time8.

        Time_next time8.;

    RETAIN

        Time_last;

    * Determine Time_next (becomes zero later on if Obs=1) ;

    Time_next=lag(Time);

    * Determine Time_last ;

    If Obs=0 Then Do;

        If Time_last=. Then Time_last=Time_next;

    End;

    Else Do;

        Time_last=.;

        Time_next=.;

    End;

 

Run;

Proc Sort Data=Test Out=Test;

    By Time;

Run;

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

OK, here's one approach.  It assumes your data are already sorted by TIME.

data want;

set have end=done nobs=_nobs_;

if obs=0 then do;

   if done=0 then do pointer=_n_ + 1 to _nobs_ until (test_obs=1);

       set have point=pointer (keep=time obs rename=(obs=test_obs time=test_time));

       if pointer = _n_ + 1 then time_next = test_time;

       if test_obs=1 then time_last=test_time;

   end;

end;

drop test_obs test_time;

run;

No guarantees as to speed, except that it surely has to beat sorting the data.  Also, it's untested code so the possibility of tweaking it is always on the table.

Good luck.

View solution in original post

4 REPLIES 4
Astounding
PROC Star

Just a couple of questions to get started ...

It looks like the final number of observations should not change ... you want all the original observations with some variables added.  (Please confirm.)

When OBS=0, what is the proper value for NEXT_TIME?  Is it the time on the very next observation, or is it the time following TIME_LAST?

There definitely will be methods that are faster than what you are using now, but the program may still take a while to run.

Georg_UPB
Fluorite | Level 6

@Astounding

You're right, the number of observations should not change and I just want to add information (in cases where variable OBS=0).

TIME_NEXT is always the time of the very next observation, whereas TIME_LAST is the time of the next observation where OBS=1.

Astounding
PROC Star

OK, here's one approach.  It assumes your data are already sorted by TIME.

data want;

set have end=done nobs=_nobs_;

if obs=0 then do;

   if done=0 then do pointer=_n_ + 1 to _nobs_ until (test_obs=1);

       set have point=pointer (keep=time obs rename=(obs=test_obs time=test_time));

       if pointer = _n_ + 1 then time_next = test_time;

       if test_obs=1 then time_last=test_time;

   end;

end;

drop test_obs test_time;

run;

No guarantees as to speed, except that it surely has to beat sorting the data.  Also, it's untested code so the possibility of tweaking it is always on the table.

Good luck.

Georg_UPB
Fluorite | Level 6

Thank you very much for your help!

After implementing your suggestions the code runs 5 times as fast as before.

Data Have;

    Format

        Time time8.

        Obs best2.

        Time_first time8.;

    Input

        Time :time8.

        Obs :best2.;

    Retain

        Time_first;

    * Determine Time_first ;

    If Obs=0 Then Do;

        If Time_first=. Then Time_first=Time;

    End;

    Else

        Time_first=.;

Datalines;

09:04:00 1

09:08:00 0

09:11:00 1

09:12:00 0

09:15:00 0

09:16:00 0

09:20:00 1

Run;

Data Want(Drop=Test_Obs Test_Time);

    Set

        Have End=Done nobs=_nobs_;

    Format

        Time_last time8.

        Time_next time8.;

    If Obs=0 Then Do;

        If Done=0 Then Do Pointer=_n_ + 1 To _nobs_ Until (Test_Obs=1);

            Set Have (Keep=Time Obs Rename=(Time=Test_Time Obs=Test_Obs)) Point=Pointer;           

            If Pointer = _n_ + 1 Then Time_next=Test_Time;

            If Test_Obs=1 Then Time_last=Test_Time;

   

        End;

    End;

Run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1544 views
  • 0 likes
  • 2 in conversation