I'd like to ask help in this, as I am new to SAS, but a PROC SQL approach is usable as well.
My dataset has IDs, a time variable, and a flag. After I sort by id and time, I need to find the first flagged observation of the last flagged group/streak. As in:
ID TIME FLAG 1 2 1 1 3 1 1 4 1 1 5 0 1 6 1 1 7 0 1 8 1 1 9 1 1 10 1 2 2 0 2 3 1 2 4 1 2 5 1 2 6 1 2 7 1
Here I want my script to return the row where time is 8 for ID 1, as it is the first observation from the last "streak", or flagged group. For ID 2 it should be where time is 3.
Desired output:
ID TIME FLAG
1 8 1
2 3 1
I'm trying to wrap my head around using first. and last. here, but I suppose the problem here is that I view temporally displaced flagged groups/streaks as different groups, while SAS looks at them as they are only separated by flag, so a simple "take first. from last." is not sufficient.
I was also thinking of collapsing the flags to a string and using a regex lookahead, but I couldn't come up with either the method or the pattern.
Something like below should work.
data have;
input ID TIME FLAG;
datalines;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;
data inter;
set have;
by id time;
if first.id then Cluster=0;
if lag(flag) ne flag then Cluster+1;
run;
proc sort data=inter;
by id descending cluster time;
run;
data want;
set inter;
by id;
if first.id then output;
run;
Something like below should work.
data have;
input ID TIME FLAG;
datalines;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;
data inter;
set have;
by id time;
if first.id then Cluster=0;
if lag(flag) ne flag then Cluster+1;
run;
proc sort data=inter;
by id descending cluster time;
run;
data want;
set inter;
by id;
if first.id then output;
run;
@kgym Yes, you've got it.
This sort of problem where the row sequence is of relevance is much easier to solve with a SAS data step than with SQL (because the SAS data step processes the data sequentially).
data have;
input ID TIME FLAG;
datalines;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;
data want;
do until(last.flag);
set have;
by id flag notsorted;
if first.flag then want_time=time;
end;
do until(last.flag);
set have;
by id flag notsorted;
if last.id then output;
end;
drop time ;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.