BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dikshar
Calcite | Level 5

I have a longitudinal dataset and I need a rate of tests performed per year per individual:

 

ID     test_date     test_value

1       01/01/2014       3.5

1       03/24/2014       12.4

1       02/15/2015       9.3

1       03/01/2015       0.04

1       09/24/2015       4

1       02/03/2016       3

2       03/06/2016       11

2       02/01/2017       2.5

2       10/12/2017       3

 

I created a extracted the year from the date into a new year variable. I figured that if I can count the number of tests per year per ID, I can then average the count per ID to get the rate per ID.

 

This is the code I am using but the output gives me a total count per ID instead of per year per ID:

 

data want;
  retain new_var;
     set have;
      by ID year;
      if first.ID then new_var= 1;
           else new_var= new_var+1;
run;

 

I also tried adding "and first.year" in the if statement but I still get the same output. 

 

Is there another way to get the rate with this dataset?

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

You did get most of the way there.  As @Kurt_Bremser pointed out, you are using first.ID when you should be using first.year instead ... except that:

 

You could do all this using your original data, without having to create a new variable to hold the year.  Consider:

 

proc freq data=have;

tables id * test_date / noprint out=n_tests;

format test_date year4.;

run;

 

This gives you a data set N_TESTS with a COUNT for each patient, for each year.  Note that the TEST_DATE variable still refers to a particular day, the earliest day that was found for that patient, during that year.  But there is still just one COUNT for each patient/year.  So finish with:

 

proc means data=n_tests;

var count;

by id;

run;

View solution in original post

3 REPLIES 3
Astounding
PROC Star

You did get most of the way there.  As @Kurt_Bremser pointed out, you are using first.ID when you should be using first.year instead ... except that:

 

You could do all this using your original data, without having to create a new variable to hold the year.  Consider:

 

proc freq data=have;

tables id * test_date / noprint out=n_tests;

format test_date year4.;

run;

 

This gives you a data set N_TESTS with a COUNT for each patient, for each year.  Note that the TEST_DATE variable still refers to a particular day, the earliest day that was found for that patient, during that year.  But there is still just one COUNT for each patient/year.  So finish with:

 

proc means data=n_tests;

var count;

by id;

run;

stat_sas
Ammonite | Level 13

Hi,

 

How about using sql?

 

proc sql;
select id,year(test_date) as year, avg(test_value) as avg_test_value format=8.2
from have
group by id,calculated year;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 863 views
  • 1 like
  • 4 in conversation