🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fluorite | Level 6

## Finding first, next, and last observation

Hi,

I have a dataset in which Obs can become either "1" or "0". For every observation where Obs is "0", it needs to be determined the time when Obs started to be "0" (Time_first), the next time it becomes "1" (Time_last), and the time of the next observation (Time_next).

The best solution I found so far requires two sorting steps. Since it is a pretty large dataset, I would highly appreciate any help to improve the code.

Data Test;

Format

Time time8.

Obs best2.

Time_first time8.;

Input

Time :time8.

Obs :best2.;

Retain

Time_first;

* Determine Time_first ;

If Obs=0 Then Do;

If Time_first=. Then Time_first=Time;

End;

Else

Time_first=.;

Datalines;

09:04:00 1

09:08:00 0

09:11:00 1

09:12:00 0

09:15:00 0

09:16:00 0

09:20:00 1

Run;

Proc Sort Data=Test;

By descending Time;

Run;

Data Test;

Set Test;

FORMAT

Time_last time8.

Time_next time8.;

RETAIN

Time_last;

* Determine Time_next (becomes zero later on if Obs=1) ;

Time_next=lag(Time);

* Determine Time_last ;

If Obs=0 Then Do;

If Time_last=. Then Time_last=Time_next;

End;

Else Do;

Time_last=.;

Time_next=.;

End;

Run;

Proc Sort Data=Test Out=Test;

By Time;

Run;

1 ACCEPTED SOLUTION

Accepted Solutions
PROC Star

## Re: Finding first, next, and last observation

OK, here's one approach.  It assumes your data are already sorted by TIME.

data want;

set have end=done nobs=_nobs_;

if obs=0 then do;

if done=0 then do pointer=_n_ + 1 to _nobs_ until (test_obs=1);

set have point=pointer (keep=time obs rename=(obs=test_obs time=test_time));

if pointer = _n_ + 1 then time_next = test_time;

if test_obs=1 then time_last=test_time;

end;

end;

drop test_obs test_time;

run;

No guarantees as to speed, except that it surely has to beat sorting the data.  Also, it's untested code so the possibility of tweaking it is always on the table.

Good luck.

4 REPLIES 4
PROC Star

## Re: Finding first, next, and last observation

Just a couple of questions to get started ...

It looks like the final number of observations should not change ... you want all the original observations with some variables added.  (Please confirm.)

When OBS=0, what is the proper value for NEXT_TIME?  Is it the time on the very next observation, or is it the time following TIME_LAST?

There definitely will be methods that are faster than what you are using now, but the program may still take a while to run.

Fluorite | Level 6

## Re: Finding first, next, and last observation

@Astounding

You're right, the number of observations should not change and I just want to add information (in cases where variable OBS=0).

TIME_NEXT is always the time of the very next observation, whereas TIME_LAST is the time of the next observation where OBS=1.

PROC Star

## Re: Finding first, next, and last observation

OK, here's one approach.  It assumes your data are already sorted by TIME.

data want;

set have end=done nobs=_nobs_;

if obs=0 then do;

if done=0 then do pointer=_n_ + 1 to _nobs_ until (test_obs=1);

set have point=pointer (keep=time obs rename=(obs=test_obs time=test_time));

if pointer = _n_ + 1 then time_next = test_time;

if test_obs=1 then time_last=test_time;

end;

end;

drop test_obs test_time;

run;

No guarantees as to speed, except that it surely has to beat sorting the data.  Also, it's untested code so the possibility of tweaking it is always on the table.

Good luck.

Fluorite | Level 6

## Re: Finding first, next, and last observation

Thank you very much for your help!

After implementing your suggestions the code runs 5 times as fast as before.

Data Have;

Format

Time time8.

Obs best2.

Time_first time8.;

Input

Time :time8.

Obs :best2.;

Retain

Time_first;

* Determine Time_first ;

If Obs=0 Then Do;

If Time_first=. Then Time_first=Time;

End;

Else

Time_first=.;

Datalines;

09:04:00 1

09:08:00 0

09:11:00 1

09:12:00 0

09:15:00 0

09:16:00 0

09:20:00 1

Run;

Data Want(Drop=Test_Obs Test_Time);

Set

Have End=Done nobs=_nobs_;

Format

Time_last time8.

Time_next time8.;

If Obs=0 Then Do;

If Done=0 Then Do Pointer=_n_ + 1 To _nobs_ Until (Test_Obs=1);

Set Have (Keep=Time Obs Rename=(Time=Test_Time Obs=Test_Obs)) Point=Pointer;

If Pointer = _n_ + 1 Then Time_next=Test_Time;

If Test_Obs=1 Then Time_last=Test_Time;

End;

End;

Run;

Discussion stats
• 4 replies
• 1541 views
• 0 likes
• 2 in conversation