Re: Identifying consecutive values

ChiSAS25 · Posted 12-04-2025 04:43 AM

Can you help me program this algorithm that is giving me trouble?

I have a dataset (as the one I've attached) with patient ID and the difference in days between two dates, with a range of -21 to +7.

I need to identify patients who have two consecutive values of flag_KPI = 1 within the range [-21, 0).

If they have only one in the range [-21, 0), then I consider the case where they have two consecutive values of flag_KPI = 1, one in the window [-21, 0) and the other in the window [0, 7].

LinusH · Posted 12-04-2025 08:11 AM

Please attach data using a data step and datalines.
Also, try to visualize your requirement as an output data set.
What have yo tried so far?

Data never sleeps

ChiSAS25 · Posted 12-04-2025 08:35 AM

Thnks for your reply

Hereafter:

data kpi_data;

infile datalines dsd truncover;

input SubjectId :8.

differences_in_days :8.

flag_KPI :8.;

datalines;

1,-21,1

1,-14,1

1,-7,1

2,-16,1

2,0,1

2,6,1

3,-16,

3,-9,1

3,5,

4,1,

5,-21,1

5,0,1

6,0,

7,-11,

7,1,

7,6,

8,-21,1

8,-11,1

8,0,1

9,-14,

9,-9,

9,-6,

9,1,

9,2,

10,-19,

10,-14,

10,-8,1

10,-5,

10,0,

11,-10,

11,-3,1

11,0,1

11,7,

12,-21,1

12,-4,1

12,1,1

12,2,1

12,7,

13,-15,1

13,2,

14,-15,1

14,-8,

14,3,1

15,-8,

15,1,1

16,-18,1

16,-15,1

16,-11,

16,-8,1

16,-4,

16,-3,

16,-2,

16,-1,1

16,5,1

16,6,1

17,-20,1

17,-14,1

17,-7,1

17,0,1

17,7,1

18,-21,1

18,-20,1

18,-13,1

18,-4,1

18,0,1

18,4,1

18,7,1

13,-17,1

13,-1,1

14,-15,1

14,-8,1

14,0,1

14,7,1

15,-21,1

15,-13,1

15,0,1

16,-21,1

16,-14,1

16,-7,1

16,0,1

16,1,1

17,-16,

17,-9,

17,-2,

17,0,

17,3,

17,4,

18,-17,1

18,-5,1

18,-2,

18,2,

18,5,

19,-15,

19,-1,1

20,-6,

20,-2,

20,6,

21,-21,

21,-20,

21,-18,1

21,-7,1

22,-20,1

22,-15,1

22,-6,1

22,4,

23,-18,

23,-14,

23,-11,

23,1,1

24,-14,1

24,-10,1

24,5,1

25,-1,1

25,-18,

25,-3,

26,-15,

26,0,1

27,-3,1

27,-4,1

28,-14,1

28,3,1

29,-10,1

29,6,1

30,-14,1

30,-11,

30,-8,

30,-1,1

30,3,1

;

run;

I tried using retain by creating two counters (one for days < 0 and one for days ≥ 0), but I wasn’t successful.

Thanks for any help

LinusH · Posted 12-04-2025 11:00 AM

Ok, here's my shot at it:

data want;
	set work.kpi_data;
	retain prev_flag_kpi prev_diff;
	by SubjectId notsorted;
	if first.subjectId then do;
		prev_flag_kpi = .;
		prev_diff = .;
	end;
	else do;
		if prev_flag_kpi = 1 and flag_KPI then do;
			if  (-21 <= prev_diff <= 0 and 0 <= differences_in_days <=7) or
				(-21 <= differences_in_days <= 0 and 0 <= prev_diff <=7)
				then Flag_Consecutive = 1;
		end;
	end;
	if not last.SubjectId then do;
		prev_flag_kpi = flag_KPI;
		prev_diff = differences_in_days;
	end; 
run;

I'm noting the data is not sorted on SubjectId. As a data person I'm missing a clear unique key for each row, like a sequence no, or date.

Data never sleeps

ballardw · Posted 12-04-2025 12:47 PM

@ChiSAS25 wrote:

Can you help me program this algorithm that is giving me trouble?

I have a dataset (as the one I've attached) with patient ID and the difference in days between two dates, with a range of -21 to +7.

I need to identify patients who have two consecutive values of flag_KPI = 1 within the range [-21, 0).

If they have only one in the range [-21, 0), then I consider the case where they have two consecutive values of flag_KPI = 1, one in the window [-21, 0) and the other in the window [0, 7].

Somethings that will help get a good result:

What is the output data set supposed to look like after this process?

What is the result if a patient only has one observation in the data?

What is the result if no consecutive results meet either criteria?

How does the result indicate which criteria was met?

Do any of the patients involved have more than one sequence of values? If so, how might different starts and intervals modify the result?

A general note about about any data set involving date or time intervals. It is usually a very good idea to include variables that contain actual date, time or datetime values with the "base" value included. There are procedures that are intended to work with such for some of these interval types of questions.

A data set as shown without actual dates might get sorted and if you have the same patient with 2 or more sequences then it might be quite awkward getting observations back into the correct order.

ChiSAS25 · Posted 12-05-2025 03:11 AM

If it helps, yesterday I managed to implement the algorithm.

proc sort data=input_ds out=input_ds_sort;
by SubjectId diff_days;
run;

data input_ds_pairs_sel;
set input_ds_sort;
by SubjectId;

retain prev_day
prev_flag1
prev_is_prior
best_dt1 best_dt2 best_gap
pa_dt1 pa_dt2;

if first.SubjectId then do;
prev_day = .;
prev_flag1 = 0;
prev_is_prior = .;
best_dt1 = .; best_dt2 = .; best_gap = .;
pa_dt1 = .; pa_dt2 = .;
end;

is_prior = (-21 <= diff_days < 0);
is_after = (0 <= diff_days <= 7);

curr_flag1 = (flag_KPI = 1);

if not first.SubjectId then do;
if prev_flag1 = 1 and curr_flag1 = 1 then do;

/* PRIOR–PRIOR */
if prev_is_prior = 1 and is_prior = 1 then do;
gap = diff_days - prev_day;

if missing(best_gap)
or (gap < best_gap)
or (gap = best_gap and diff_days > best_dt2) then do;
best_dt1 = prev_day;
best_dt2 = diff_days;
best_gap = gap;
end;
end;

/* PRIOR→AFTER */
else if prev_is_prior = 1 and is_after = 1 then do;
if missing(pa_dt1) then do;
pa_dt1 = prev_day;
pa_dt2 = diff_days;
end;
end;

end;
end;

prev_day = diff_days;
prev_flag1 = curr_flag1;
prev_is_prior = (is_prior = 1);

if last.SubjectId then do;
length source $12;
if not missing(best_dt1) then do;
source = 'prior-prior';
dt1 = best_dt1; dt2 = best_dt2; gap = best_gap;
output;
end;
else if not missing(pa_dt1) then do;
source = 'prior-after';
dt1 = pa_dt1; dt2 = pa_dt2; gap = dt2 - dt1;
output;
end;
end;

keep SubjectId dt1 dt2 gap source ;
run;

Have a nice time

Regards

ballardw · Posted 12-05-2025 12:32 PM

@ChiSAS25 wrote:

If it helps, yesterday I managed to implement the algorithm.

proc sort data=input_ds out=input_ds_sort;
by SubjectId diff_days;
run;

Did you verify before this sort that the SubjectId did not involve more than one sequence of values?

Note that if your data starts in appropriate grouped /ordered values within SubjectId then the Proc Sort is not needed. You should be able to use:

data input_ds_pairs_sel;
set input;
by SubjectId  NOTSORTED;


retain prev_day 
prev_flag1 
prev_is_prior
best_dt1 best_dt2 best_gap 
pa_dt1 pa_dt2;

The NOTSORTED option on the BY statement allows you to process values grouped together as a by variable but without sorting as the BY statement expects and may through an error when not sorted. This is a very useful option.

Identifying consecutive values