BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kgym
Calcite | Level 5

I'd like to ask help in this, as I am new to SAS, but a PROC SQL approach is usable as well.

My dataset has IDs, a time variable, and a flag. After I sort by id and time, I need to find the first flagged observation of the last flagged group/streak. As in:

ID TIME FLAG
1   2    1
1   3    1
1   4    1
1   5    0
1   6    1
1   7    0
1   8    1
1   9    1
1  10    1
2   2    0
2   3    1
2   4    1
2   5    1
2   6    1
2   7    1

Here I want my script to return the row where time is 8 for ID 1, as it is the first observation from the last "streak", or flagged group. For ID 2 it should be where time is 3.

Desired output:

ID TIME FLAG
1   8    1
2   3    1

I'm trying to wrap my head around using first. and last. here, but I suppose the problem here is that I view temporally displaced flagged groups/streaks as different groups, while SAS looks at them as they are only separated by flag, so a simple "take first. from last." is not sufficient.

I was also thinking of collapsing the flags to a string and using a regex lookahead, but I couldn't come up with either the method or the pattern.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Something like below should work.

data have;
  input ID TIME FLAG;
  datalines;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;

data inter;
  set have;
  by id time;
  if first.id then Cluster=0;
  if lag(flag) ne flag then Cluster+1;
run;

proc sort data=inter;
  by id descending cluster time;
run;

data want;
  set inter;
  by id;
  if first.id then output;
run;

 

Patrick_0-1626080014872.png

 

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

Something like below should work.

data have;
  input ID TIME FLAG;
  datalines;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;

data inter;
  set have;
  by id time;
  if first.id then Cluster=0;
  if lag(flag) ne flag then Cluster+1;
run;

proc sort data=inter;
  by id descending cluster time;
run;

data want;
  set inter;
  by id;
  if first.id then output;
run;

 

Patrick_0-1626080014872.png

 

kgym
Calcite | Level 5
Wow, thank you for the fast reply. So if I understand it correctly, the code finds clusters based on flag changes, sorts it descending based on cluster, and simply pulls the first observation. Splendid!
Patrick
Opal | Level 21

@kgym Yes, you've got it. 

This sort of problem where the row sequence is of relevance is much easier to solve with a SAS data step than with SQL (because the SAS data step processes the data sequentially).

Ksharp
Super User
data have;
  input ID TIME FLAG;
  datalines;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;
data want;
 do until(last.flag);
  set have;
  by id flag notsorted;
  if first.flag then want_time=time;
 end;
 
 do until(last.flag);
  set have;
  by id flag notsorted;
  if last.id then output;
 end;
 drop time ;
 run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2887 views
  • 0 likes
  • 3 in conversation