I have a longitudinal dataset and I need a rate of tests performed per year per individual:
ID test_date test_value
1 01/01/2014 3.5
1 03/24/2014 12.4
1 02/15/2015 9.3
1 03/01/2015 0.04
1 09/24/2015 4
1 02/03/2016 3
2 03/06/2016 11
2 02/01/2017 2.5
2 10/12/2017 3
I created a extracted the year from the date into a new year variable. I figured that if I can count the number of tests per year per ID, I can then average the count per ID to get the rate per ID.
This is the code I am using but the output gives me a total count per ID instead of per year per ID:
data want;
retain new_var;
set have;
by ID year;
if first.ID then new_var= 1;
else new_var= new_var+1;
run;
I also tried adding "and first.year" in the if statement but I still get the same output.
Is there another way to get the rate with this dataset?
You did get most of the way there. As @Kurt_Bremser pointed out, you are using first.ID when you should be using first.year instead ... except that:
You could do all this using your original data, without having to create a new variable to hold the year. Consider:
proc freq data=have;
tables id * test_date / noprint out=n_tests;
format test_date year4.;
run;
This gives you a data set N_TESTS with a COUNT for each patient, for each year. Note that the TEST_DATE variable still refers to a particular day, the earliest day that was found for that patient, during that year. But there is still just one COUNT for each patient/year. So finish with:
proc means data=n_tests;
var count;
by id;
run;
You did get most of the way there. As @Kurt_Bremser pointed out, you are using first.ID when you should be using first.year instead ... except that:
You could do all this using your original data, without having to create a new variable to hold the year. Consider:
proc freq data=have;
tables id * test_date / noprint out=n_tests;
format test_date year4.;
run;
This gives you a data set N_TESTS with a COUNT for each patient, for each year. Note that the TEST_DATE variable still refers to a particular day, the earliest day that was found for that patient, during that year. But there is still just one COUNT for each patient/year. So finish with:
proc means data=n_tests;
var count;
by id;
run;
Hi,
How about using sql?
proc sql;
select id,year(test_date) as year, avg(test_value) as avg_test_value format=8.2
from have
group by id,calculated year;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.