BookmarkSubscribeRSS Feed
ChiSAS25
Fluorite | Level 6

Can you help me program this algorithm that is giving me trouble?

I have a dataset (as the one I've attached) with patient ID and the difference in days between two dates, with a range of -21 to +7.

I need to identify patients who have two consecutive values of flag_KPI = 1 within the range [-21, 0).

If they have only one in the range [-21, 0), then I consider the case where they have two consecutive values of flag_KPI = 1, one in the window [-21, 0) and the other in the window [0, 7].

6 REPLIES 6
LinusH
Tourmaline | Level 20
Please attach data using a data step and datalines.
Also, try to visualize your requirement as an output data set.
What have yo tried so far?
Data never sleeps
ChiSAS25
Fluorite | Level 6

Thnks for your reply
 
Hereafter:
data kpi_data;
    infile datalines dsd truncover;
    input SubjectId :8.
          differences_in_days :8.
          flag_KPI :8.;
datalines;
1,-21,1
1,-14,1
1,-7,1
2,-16,1
2,0,1
2,6,1
3,-16,
3,-9,1
3,5,
4,1,
5,-21,1
5,0,1
6,0,
7,-11,
7,1,
7,6,
8,-21,1
8,-11,1
8,0,1
9,-14,
9,-9,
9,-6,
9,1,
9,2,
10,-19,
10,-14,
10,-8,1
10,-5,
10,0,
11,-10,
11,-3,1
11,0,1
11,7,
12,-21,1
12,-4,1
12,1,1
12,2,1
12,2,1
12,7,
13,-15,1
13,2,
14,-15,1
14,-8,
14,3,1
15,-8,
15,1,1
16,-18,1
16,-15,1
16,-11,
16,-8,1
16,-4,
16,-3,
16,-2,
16,-1,1
16,5,1
16,6,1
17,-20,1
17,-14,1
17,-7,1
17,0,1
17,7,1
18,-21,1
18,-20,1
18,-13,1
18,-4,1
18,0,1
18,4,1
18,7,1
13,-17,1
13,-1,1
14,-15,1
14,-8,1
14,0,1
14,7,1
15,-21,1
15,-13,1
15,0,1
16,-21,1
16,-14,1
16,-7,1
16,0,1
16,1,1
17,-16,
17,-9,
17,-2,
17,0,
17,3,
17,4,
18,-17,1
18,-5,1
18,-2,
18,2,
18,5,
19,-15,
19,-1,1
20,-6,
20,-2,
20,6,
21,-21,
21,-20,
21,-18,1
21,-7,1
22,-20,1
22,-15,1
22,-6,1
22,4,
23,-18,
23,-14,
23,-11,
23,1,1
24,-14,1
24,-10,1
24,5,1
25,-1,1
25,-18,
25,-3,
26,-15,
26,0,1
27,-3,1
27,-4,1
28,-14,1
28,3,1
29,-10,1
29,6,1
30,-14,1
30,-11,
30,-8,
30,-1,1
30,3,1
;
run;
 
I tried using retain by creating two counters (one for days < 0 and one for days ≥ 0), but I wasn’t successful.
 
Thanks for any help
 

 

 

LinusH
Tourmaline | Level 20

Ok, here's my shot at it:

data want;
	set work.kpi_data;
	retain prev_flag_kpi prev_diff;
	by SubjectId notsorted;
	if first.subjectId then do;
		prev_flag_kpi = .;
		prev_diff = .;
	end;
	else do;
		if prev_flag_kpi = 1 and flag_KPI then do;
			if  (-21 <= prev_diff <= 0 and 0 <= differences_in_days <=7) or
				(-21 <= differences_in_days <= 0 and 0 <= prev_diff <=7)
				then Flag_Consecutive = 1;
		end;
	end;
	if not last.SubjectId then do;
		prev_flag_kpi = flag_KPI;
		prev_diff = differences_in_days;
	end; 
run;

I'm noting the data is not sorted on SubjectId. As a data person I'm missing a clear unique key for each row, like a sequence no, or date.

 

Data never sleeps
ballardw
Super User

@ChiSAS25 wrote:

Can you help me program this algorithm that is giving me trouble?

I have a dataset (as the one I've attached) with patient ID and the difference in days between two dates, with a range of -21 to +7.

I need to identify patients who have two consecutive values of flag_KPI = 1 within the range [-21, 0).

If they have only one in the range [-21, 0), then I consider the case where they have two consecutive values of flag_KPI = 1, one in the window [-21, 0) and the other in the window [0, 7].


Somethings that will help get a good result:

What is the output data set supposed to look like after this process?

What is the result if a patient only has one observation in the data?

What is the result if no consecutive results meet either criteria?

How does the result indicate which criteria was met?

Do any of the patients involved have more than one sequence of values? If so, how might different starts and intervals modify the result?

 

A general note about about any data set involving date or time intervals. It is usually a very good idea to include variables that contain actual date, time or datetime values with the "base" value included. There are procedures that are intended to work with such for some of these interval types of questions.

A data set as shown without actual dates might get sorted and if you have the same patient with 2 or more sequences then it might be quite awkward getting observations back into the correct order.

ChiSAS25
Fluorite | Level 6
If it helps, yesterday I managed to implement the algorithm.


proc sort data=input_ds out=input_ds_sort;
by SubjectId diff_days;
run;

data input_ds_pairs_sel;
set input_ds_sort;
by SubjectId;


retain prev_day 
prev_flag1 
prev_is_prior
best_dt1 best_dt2 best_gap 
pa_dt1 pa_dt2; 


if first.SubjectId then do;
prev_day = .;
prev_flag1 = 0;
prev_is_prior = .;
best_dt1 = .; best_dt2 = .; best_gap = .;
pa_dt1 = .; pa_dt2 = .;
end;


is_prior = (-21 <= diff_days < 0);
is_after = (0 <= diff_days <= 7);


curr_flag1 = (flag_KPI = 1);


if not first.SubjectId then do;
if prev_flag1 = 1 and curr_flag1 = 1 then do;

/* PRIOR–PRIOR */
if prev_is_prior = 1 and is_prior = 1 then do;
gap = diff_days - prev_day;

if missing(best_gap)
or (gap < best_gap)
or (gap = best_gap and diff_days > best_dt2) then do;
best_dt1 = prev_day;
best_dt2 = diff_days;
best_gap = gap;
end;
end;

/* PRIOR→AFTER */
else if prev_is_prior = 1 and is_after = 1 then do;
if missing(pa_dt1) then do;
pa_dt1 = prev_day;
pa_dt2 = diff_days;
end;
end;

end;
end;


prev_day = diff_days;
prev_flag1 = curr_flag1;
prev_is_prior = (is_prior = 1);


if last.SubjectId then do;
length source $12;
if not missing(best_dt1) then do;
source = 'prior-prior';
dt1 = best_dt1; dt2 = best_dt2; gap = best_gap;
output;
end;
else if not missing(pa_dt1) then do; 
source = 'prior-after';
dt1 = pa_dt1; dt2 = pa_dt2; gap = dt2 - dt1;
output;
end;
end;

keep SubjectId dt1 dt2 gap source ;
run;

 

Have a nice time

Regards

ballardw
Super User

@ChiSAS25 wrote:
If it helps, yesterday I managed to implement the algorithm.


proc sort data=input_ds out=input_ds_sort;
by SubjectId diff_days;
run;

Did you verify before this sort that the SubjectId did not involve more than one sequence of values?

 

Note that if your data starts in appropriate grouped /ordered values within SubjectId then the Proc Sort is not needed. You should be able to use:

 

data input_ds_pairs_sel;
set input;
by SubjectId  NOTSORTED;


retain prev_day 
prev_flag1 
prev_is_prior
best_dt1 best_dt2 best_gap 
pa_dt1 pa_dt2; 

The NOTSORTED option on the BY statement allows you to process values grouped together as a by variable but without sorting as the BY statement expects and may through an error when not sorted. This is a very useful option. 

 

 

sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 615 views
  • 1 like
  • 3 in conversation