I have a longitudinal dataset and I need a rate of tests performed per year per individual:
ID test_date test_value
1 01/01/2014 3.5
1 03/24/2014 12.4
1 02/15/2015 9.3
1 03/01/2015 0.04
1 09/24/2015 4
1 02/03/2016 3
2 03/06/2016 11
2 02/01/2017 2.5
2 10/12/2017 3
I created a extracted the year from the date into a new year variable. I figured that if I can count the number of tests per year per ID, I can then average the count per ID to get the rate per ID.
This is the code I am using but the output gives me a total count per ID instead of per year per ID:
data want;
retain new_var;
set have;
by ID year;
if first.ID then new_var= 1;
else new_var= new_var+1;
run;
I also tried adding "and first.year" in the if statement but I still get the same output.
Is there another way to get the rate with this dataset?
You did get most of the way there. As @Kurt_Bremser pointed out, you are using first.ID when you should be using first.year instead ... except that:
You could do all this using your original data, without having to create a new variable to hold the year. Consider:
proc freq data=have;
tables id * test_date / noprint out=n_tests;
format test_date year4.;
run;
This gives you a data set N_TESTS with a COUNT for each patient, for each year. Note that the TEST_DATE variable still refers to a particular day, the earliest day that was found for that patient, during that year. But there is still just one COUNT for each patient/year. So finish with:
proc means data=n_tests;
var count;
by id;
run;
You did get most of the way there. As @Kurt_Bremser pointed out, you are using first.ID when you should be using first.year instead ... except that:
You could do all this using your original data, without having to create a new variable to hold the year. Consider:
proc freq data=have;
tables id * test_date / noprint out=n_tests;
format test_date year4.;
run;
This gives you a data set N_TESTS with a COUNT for each patient, for each year. Note that the TEST_DATE variable still refers to a particular day, the earliest day that was found for that patient, during that year. But there is still just one COUNT for each patient/year. So finish with:
proc means data=n_tests;
var count;
by id;
run;
Hi,
How about using sql?
proc sql;
select id,year(test_date) as year, avg(test_value) as avg_test_value format=8.2
from have
group by id,calculated year;
quit;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.