Solved: How do I find first row of last group in SAS, where ordering matters?

kgym · Posted 07-12-2021 04:32 AM

I'd like to ask help in this, as I am new to SAS, but a PROC SQL approach is usable as well.

My dataset has IDs, a time variable, and a flag. After I sort by id and time, I need to find the first flagged observation of the last flagged group/streak. As in:

ID TIME FLAG
1   2    1
1   3    1
1   4    1
1   5    0
1   6    1
1   7    0
1   8    1
1   9    1
1  10    1
2   2    0
2   3    1
2   4    1
2   5    1
2   6    1
2   7    1

Here I want my script to return the row where time is 8 for ID 1, as it is the first observation from the last "streak", or flagged group. For ID 2 it should be where time is 3.

Desired output:

ID TIME FLAG
1   8    1
2   3    1

I'm trying to wrap my head around using first. and last. here, but I suppose the problem here is that I view temporally displaced flagged groups/streaks as different groups, while SAS looks at them as they are only separated by flag, so a simple "take first. from last." is not sufficient.

I was also thinking of collapsing the flags to a string and using a regex lookahead, but I couldn't come up with either the method or the pattern.

Patrick · Posted 07-12-2021 04:52 AM

Something like below should work.

data have;
  input ID TIME FLAG;
  datalines;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;

data inter;
  set have;
  by id time;
  if first.id then Cluster=0;
  if lag(flag) ne flag then Cluster+1;
run;

proc sort data=inter;
  by id descending cluster time;
run;

data want;
  set inter;
  by id;
  if first.id then output;
run;

View solution in original post

Patrick · Posted 07-12-2021 04:52 AM

Something like below should work.

data have;
  input ID TIME FLAG;
  datalines;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;

data inter;
  set have;
  by id time;
  if first.id then Cluster=0;
  if lag(flag) ne flag then Cluster+1;
run;

proc sort data=inter;
  by id descending cluster time;
run;

data want;
  set inter;
  by id;
  if first.id then output;
run;

kgym · Posted 07-12-2021 05:02 AM

Wow, thank you for the fast reply. So if I understand it correctly, the code finds clusters based on flag changes, sorts it descending based on cluster, and simply pulls the first observation. Splendid!

Patrick · Posted 07-12-2021 05:11 AM

@kgym Yes, you've got it.

This sort of problem where the row sequence is of relevance is much easier to solve with a SAS data step than with SQL (because the SAS data step processes the data sequentially).

Ksharp · Posted 07-12-2021 08:13 AM

data have;
  input ID TIME FLAG;
  datalines;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;
data want;
 do until(last.flag);
  set have;
  by id flag notsorted;
  if first.flag then want_time=time;
 end;
 
 do until(last.flag);
  set have;
  by id flag notsorted;
  if last.id then output;
 end;
 drop time ;
 run;

How do I find first row of last group in SAS, where ordering matters?

Re: How do I find first row of last group in SAS, where ordering matters?

Re: How do I find first row of last group in SAS, where ordering matters?

Re: How do I find first row of last group in SAS, where ordering matters?

Re: How do I find first row of last group in SAS, where ordering matters?

Re: How do I find first row of last group in SAS, where ordering matters?

How do I find first row of last group in SAS, where ordering matters?

Re: How do I find first row of last group in SAS, where ordering matters?

Re: How do I find first row of last group in SAS, where ordering matters?

Re: How do I find first row of last group in SAS, where ordering matters?

Re: How do I find first row of last group in SAS, where ordering matters?

Re: How do I find first row of last group in SAS, where ordering matters?

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away