Solved: Averaging count by two variables

dikshar · Posted 02-24-2019 05:09 PM

I have a longitudinal dataset and I need a rate of tests performed per year per individual:

ID test_date test_value

1 01/01/2014 3.5

1 03/24/2014 12.4

1 02/15/2015 9.3

1 03/01/2015 0.04

1 09/24/2015 4

1 02/03/2016 3

2 03/06/2016 11

2 02/01/2017 2.5

2 10/12/2017 3

I created a extracted the year from the date into a new year variable. I figured that if I can count the number of tests per year per ID, I can then average the count per ID to get the rate per ID.

This is the code I am using but the output gives me a total count per ID instead of per year per ID:

data want;
retain new_var;
set have;
by ID year;
if first.ID then new_var= 1;
else new_var= new_var+1;
run;

I also tried adding "and first.year" in the if statement but I still get the same output.

Is there another way to get the rate with this dataset?

Astounding · Posted 02-24-2019 05:41 PM

You did get most of the way there. As @Kurt_Bremser pointed out, you are using first.ID when you should be using first.year instead ... except that:

You could do all this using your original data, without having to create a new variable to hold the year. Consider:

proc freq data=have;

tables id * test_date / noprint out=n_tests;

format test_date year4.;

run;

This gives you a data set N_TESTS with a COUNT for each patient, for each year. Note that the TEST_DATE variable still refers to a particular day, the earliest day that was found for that patient, during that year. But there is still just one COUNT for each patient/year. So finish with:

proc means data=n_tests;

var count;

by id;

run;

View solution in original post

Kurt_Bremser · Posted 02-24-2019 05:11 PM

You need to test for first.year only.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Astounding · Posted 02-24-2019 05:41 PM

You did get most of the way there. As @Kurt_Bremser pointed out, you are using first.ID when you should be using first.year instead ... except that:

You could do all this using your original data, without having to create a new variable to hold the year. Consider:

proc freq data=have;

tables id * test_date / noprint out=n_tests;

format test_date year4.;

run;

This gives you a data set N_TESTS with a COUNT for each patient, for each year. Note that the TEST_DATE variable still refers to a particular day, the earliest day that was found for that patient, during that year. But there is still just one COUNT for each patient/year. So finish with:

proc means data=n_tests;

var count;

by id;

run;

stat_sas · Posted 02-24-2019 05:50 PM

Hi,

How about using sql?

proc sql;
select id,year(test_date) as year, avg(test_value) as avg_test_value format=8.2
from have
group by id,calculated year;
quit;

Averaging count by two variables

Re: Averaging count by two variables

Re: Averaging count by two variables

Re: Averaging count by two variables

Re: Averaging count by two variables

Averaging count by two variables

Re: Averaging count by two variables

Re: Averaging count by two variables

Re: Averaging count by two variables

Re: Averaging count by two variables

Click image to register for webinar

Classroom Training Available!