Hello,
I am trying to figure out how to select participants who have a BMI>=30 for at least 1 year from clinic visit data. I've created a count variable to indicate streaks of consecutive visits with BMI>=30. I want to write code that will create start and end dates for each streak, some participants have more than one unique streak if their BMI hovers around 30.
In my example data below, IDs 2 and 3 would meet the criteria and ID 1 would be excluded for not having at least 1 year. IDs 1 and 2 are straight forward- just pull the first and last dates and check if these are >1 year apart. However, I can't figure out how to handle ID 3. They have two unique streaks, but only the second streak meets the requirement of >=1 year. How do I indicate that I want each streak within each ID to have unique start and stop dates?
ID | BMI_date | BMI | BMI_Count |
1 | 1/2/2020 | 36 | 1 |
1 | 4/3/2020 | 38 | 2 |
1 | 5/6/2020 | 36 | 3 |
1 | 11/18/2020 | 36 | 4 |
2 | 6/5/2020 | 30 | 1 |
2 | 12/17/2020 | 32 | 2 |
2 | 9/15/2021 | 33 | 3 |
3 | 7/19/2020 | 35 | 1 |
3 | 9/1/2020 | 34 | 2 |
3 | 10/4/2020 | 32 | 3 |
3 | 12/20/2020 | 29 | 0 |
3 | 6/4/2021 | 30 | 1 |
3 | 1/4/2022 | 30 | 2 |
3 | 9/14/2022 | 31 | 3 |
Thank you for reading! I'm using SAS 9.4. This is my first post on here, please let me know if I need to include more information.
Rather than counting the test dates within a group it is better to generate a group number.
So if you have this input data:
data have ;
input ID BMI_date :yymmdd. BMI ;
format bmi_date yymmdd10.;
cards;
1 2020-01-02 36
1 2020-04-03 38
1 2020-05-06 36
1 2020-11-18 36
2 2020-06-05 30
2 2020-12-17 32
2 2021-09-15 33
3 2020-07-19 35
3 2020-09-01 34
3 2020-10-04 32
3 2020-12-20 29
3 2021-06-04 30
3 2022-01-04 30
3 2022-09-14 31
;
You can calculate an OVER flag and a GROUP number using a step like this.
data groups;
set have;
by id bmi_date;
over=(bmi >= 30);
group+(over ne lag(over));
if first.id then group=1;
run;
Now you have all the information you need to calculate the statistics you need.
proc summary data=groups nway;
by id group over;
output out=want(drop=_type_) min(bmi_date)=start max(bmi_date)=end ;
run;
Sounds like you want to get this list:
proc print;
where over and intck('year',start,end,'c');
run;
Result
Obs ID group over _FREQ_ start end 2 2 1 1 3 2020-06-05 2021-09-15 5 3 3 1 3 2021-06-04 2022-09-14
Rather than counting the test dates within a group it is better to generate a group number.
So if you have this input data:
data have ;
input ID BMI_date :yymmdd. BMI ;
format bmi_date yymmdd10.;
cards;
1 2020-01-02 36
1 2020-04-03 38
1 2020-05-06 36
1 2020-11-18 36
2 2020-06-05 30
2 2020-12-17 32
2 2021-09-15 33
3 2020-07-19 35
3 2020-09-01 34
3 2020-10-04 32
3 2020-12-20 29
3 2021-06-04 30
3 2022-01-04 30
3 2022-09-14 31
;
You can calculate an OVER flag and a GROUP number using a step like this.
data groups;
set have;
by id bmi_date;
over=(bmi >= 30);
group+(over ne lag(over));
if first.id then group=1;
run;
Now you have all the information you need to calculate the statistics you need.
proc summary data=groups nway;
by id group over;
output out=want(drop=_type_) min(bmi_date)=start max(bmi_date)=end ;
run;
Sounds like you want to get this list:
proc print;
where over and intck('year',start,end,'c');
run;
Result
Obs ID group over _FREQ_ start end 2 2 1 1 3 2020-06-05 2021-09-15 5 3 3 1 3 2021-06-04 2022-09-14
Yes, this is exactly what I've been trying to figure out. Thank you!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.