Hi,
I have a dataset in which Obs can become either "1" or "0". For every observation where Obs is "0", it needs to be determined the time when Obs started to be "0" (Time_first), the next time it becomes "1" (Time_last), and the time of the next observation (Time_next).
The best solution I found so far requires two sorting steps. Since it is a pretty large dataset, I would highly appreciate any help to improve the code.
Data Test;
Format
Time time8.
Obs best2.
Time_first time8.;
Input
Time :time8.
Obs :best2.;
Retain
Time_first;
* Determine Time_first ;
If Obs=0 Then Do;
If Time_first=. Then Time_first=Time;
End;
Else
Time_first=.;
Datalines;
09:04:00 1
09:08:00 0
09:11:00 1
09:12:00 0
09:15:00 0
09:16:00 0
09:20:00 1
Run;
Proc Sort Data=Test;
By descending Time;
Run;
Data Test;
Set Test;
FORMAT
Time_last time8.
Time_next time8.;
RETAIN
Time_last;
* Determine Time_next (becomes zero later on if Obs=1) ;
Time_next=lag(Time);
* Determine Time_last ;
If Obs=0 Then Do;
If Time_last=. Then Time_last=Time_next;
End;
Else Do;
Time_last=.;
Time_next=.;
End;
Run;
Proc Sort Data=Test Out=Test;
By Time;
Run;
OK, here's one approach. It assumes your data are already sorted by TIME.
data want;
set have end=done nobs=_nobs_;
if obs=0 then do;
if done=0 then do pointer=_n_ + 1 to _nobs_ until (test_obs=1);
set have point=pointer (keep=time obs rename=(obs=test_obs time=test_time));
if pointer = _n_ + 1 then time_next = test_time;
if test_obs=1 then time_last=test_time;
end;
end;
drop test_obs test_time;
run;
No guarantees as to speed, except that it surely has to beat sorting the data. Also, it's untested code so the possibility of tweaking it is always on the table.
Good luck.
Just a couple of questions to get started ...
It looks like the final number of observations should not change ... you want all the original observations with some variables added. (Please confirm.)
When OBS=0, what is the proper value for NEXT_TIME? Is it the time on the very next observation, or is it the time following TIME_LAST?
There definitely will be methods that are faster than what you are using now, but the program may still take a while to run.
@Astounding
You're right, the number of observations should not change and I just want to add information (in cases where variable OBS=0).
TIME_NEXT is always the time of the very next observation, whereas TIME_LAST is the time of the next observation where OBS=1.
OK, here's one approach. It assumes your data are already sorted by TIME.
data want;
set have end=done nobs=_nobs_;
if obs=0 then do;
if done=0 then do pointer=_n_ + 1 to _nobs_ until (test_obs=1);
set have point=pointer (keep=time obs rename=(obs=test_obs time=test_time));
if pointer = _n_ + 1 then time_next = test_time;
if test_obs=1 then time_last=test_time;
end;
end;
drop test_obs test_time;
run;
No guarantees as to speed, except that it surely has to beat sorting the data. Also, it's untested code so the possibility of tweaking it is always on the table.
Good luck.
Thank you very much for your help!
After implementing your suggestions the code runs 5 times as fast as before.
Data Have;
Format
Time time8.
Obs best2.
Time_first time8.;
Input
Time :time8.
Obs :best2.;
Retain
Time_first;
* Determine Time_first ;
If Obs=0 Then Do;
If Time_first=. Then Time_first=Time;
End;
Else
Time_first=.;
Datalines;
09:04:00 1
09:08:00 0
09:11:00 1
09:12:00 0
09:15:00 0
09:16:00 0
09:20:00 1
Run;
Data Want(Drop=Test_Obs Test_Time);
Set
Have End=Done nobs=_nobs_;
Format
Time_last time8.
Time_next time8.;
If Obs=0 Then Do;
If Done=0 Then Do Pointer=_n_ + 1 To _nobs_ Until (Test_Obs=1);
Set Have (Keep=Time Obs Rename=(Time=Test_Time Obs=Test_Obs)) Point=Pointer;
If Pointer = _n_ + 1 Then Time_next=Test_Time;
If Test_Obs=1 Then Time_last=Test_Time;
End;
End;
Run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.