Hi!
I am using a long dataset that is set up as shown below. I am trying to find the FIRST instance where the "value" variable is >=16 at least 4 times within a given 730-day period. These 4 occurences of value>=16 do not necessarily need to be consecutive, just occuring within 730 days of each other. I would also like to generate another variable that is the date of the 4th occurence of value>=16. The visit_date variable is in the form of days after a certain date that is 0, so they are formatted as continuous numbers.
In the example below I would like it to identify for id=1 that visit_date=1700 is the 4th visit within a 730-day period where value>=16. And I'd like the visit_date of 1700 for this person to be output. For id=2 it would NOT flag anything since there are 4 instances of value>=16, but these do not occur within a 730 day window (1000-124>730).
ID visit_date value
1 100 5
1 220 16
1 330 22
1 700 12
1 915 5
1 1120 2
1 1200 17
1 1300 27
1 1500 30
1 1700 35
1 1922 5
2 124 17
2 450 18
2 555 2
2 740 29
2 1000 23
I have tried using an approach of first.id but haven't been able to get anything close to what I want. I'm not sure if an array of some form may be better for this! I'm working in SAS 9.4.
Thanks for any help!
Here is one way. Note that it will output all observations where there are four or more large values within 730 days.
It is interesting because it actually calls the LAG3() function conditionally so that it only lags the visit dates for the days with high values.
data want;
set have ;
by id visit_date;
row+1;
if first.id then n=0;
if value >= 16 then do;
n+1;
lag_date = lag3(visit_date);
if n=4 then do;
if lag_date + 730 >= visit_date then output;
else n=3;
end;
end;
run;
Can you please post what you've tried so far?
@bgosiker wrote:
Hi!
I am using a long dataset that is set up as shown below. I am trying to find the FIRST instance where the "value" variable is >=16 at least 4 times within a given 730-day period. These 4 occurences of value>=16 do not necessarily need to be consecutive, just occuring within 730 days of each other. I would also like to generate another variable that is the date of the 4th occurence of value>=16. The visit_date variable is in the form of days after a certain date that is 0, so they are formatted as continuous numbers.
In the example below I would like it to identify for id=1 that visit_date=1700 is the 4th visit within a 730-day period where value>=16. And I'd like the visit_date of 1700 for this person to be output. For id=2 it would NOT flag anything since there are 4 instances of value>=16, but these do not occur within a 730 day window (1000-124>730).
ID visit_date value
1 100 5
1 220 16
1 330 22
1 700 12
1 915 5
1 1120 2
1 1200 17
1 1300 27
1 1500 30
1 1700 35
1 1922 5
2 124 17
2 450 18
2 555 2
2 740 29
2 1000 23
I have tried using an approach of first.id but haven't been able to get anything close to what I want. I'm not sure if an array of some form may be better for this! I'm working in SAS 9.4.
Thanks for any help!
I've tried first creating an indicator for when value>=16 and then trying to do a count of that variable but haven't been able to finish the data step to identify the 4 consecutive value>=16 within the period I want.
data visit1; set visit;
if value>=16 then value_gt=1;
if value<16 then value_gt=0;
run;
proc sort data=work.visit1;
by id visit_date value;
run;
data visit1; set visit1;
by id visit_date value;
if first.value then value_count=1
value_count + value_gt;
Keeping track of the cumulative count of records with value>=16 for each ID is the straightforward part. But every time you encounter such a record, you must compare the current visit_date to the 3rd prior such visit_date, as well as seeing if N16>=4. You can do this with the lag3(visit_date) function, as long as you only execute that function when you have a qualifying record in hand:
data have;
input ID visit_date value ;
datalines;
1 100 5
1 220 16
1 330 22
1 700 12
1 915 5
1 1120 2
1 1200 17
1 1300 27
1 1500 30
1 1700 35
1 1922 5
2 124 17
2 450 18
2 555 2
2 740 29
2 1000 23
run;
data want (drop=n16);
set have;
by id;
if first.id then n16=0;
if value>=16 then do;
n16+1;
if n16>=4 and lag3(visit_date)>=visit_date-730 then flag=1;
if flag=1 then n16=-1000000;
end;
run;
Here is one way. Note that it will output all observations where there are four or more large values within 730 days.
It is interesting because it actually calls the LAG3() function conditionally so that it only lags the visit dates for the days with high values.
data want;
set have ;
by id visit_date;
row+1;
if first.id then n=0;
if value >= 16 then do;
n+1;
lag_date = lag3(visit_date);
if n=4 then do;
if lag_date + 730 >= visit_date then output;
else n=3;
end;
end;
run;
Another possibility is to use a DoW loop (with a counter), a WHERE clause, and the DIF function instead of the LAG function:
data want;
do _N_=1 by 1 until(last.id);
set have;
where value>=16;
by id;
if dif3(visit_date)<=730 and _N_>=4 then
output;
end;
run;
But this logic generates ALL qualifying events. I believe the OP only wanted the first such event.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.