BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dikshar
Calcite | Level 5

I have a longitudinal dataset and I need a rate of tests performed per year per individual:

 

ID     test_date     test_value

1       01/01/2014       3.5

1       03/24/2014       12.4

1       02/15/2015       9.3

1       03/01/2015       0.04

1       09/24/2015       4

1       02/03/2016       3

2       03/06/2016       11

2       02/01/2017       2.5

2       10/12/2017       3

 

I created a extracted the year from the date into a new year variable. I figured that if I can count the number of tests per year per ID, I can then average the count per ID to get the rate per ID.

 

This is the code I am using but the output gives me a total count per ID instead of per year per ID:

 

data want;
  retain new_var;
     set have;
      by ID year;
      if first.ID then new_var= 1;
           else new_var= new_var+1;
run;

 

I also tried adding "and first.year" in the if statement but I still get the same output. 

 

Is there another way to get the rate with this dataset?

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

You did get most of the way there.  As @Kurt_Bremser pointed out, you are using first.ID when you should be using first.year instead ... except that:

 

You could do all this using your original data, without having to create a new variable to hold the year.  Consider:

 

proc freq data=have;

tables id * test_date / noprint out=n_tests;

format test_date year4.;

run;

 

This gives you a data set N_TESTS with a COUNT for each patient, for each year.  Note that the TEST_DATE variable still refers to a particular day, the earliest day that was found for that patient, during that year.  But there is still just one COUNT for each patient/year.  So finish with:

 

proc means data=n_tests;

var count;

by id;

run;

View solution in original post

3 REPLIES 3
Astounding
PROC Star

You did get most of the way there.  As @Kurt_Bremser pointed out, you are using first.ID when you should be using first.year instead ... except that:

 

You could do all this using your original data, without having to create a new variable to hold the year.  Consider:

 

proc freq data=have;

tables id * test_date / noprint out=n_tests;

format test_date year4.;

run;

 

This gives you a data set N_TESTS with a COUNT for each patient, for each year.  Note that the TEST_DATE variable still refers to a particular day, the earliest day that was found for that patient, during that year.  But there is still just one COUNT for each patient/year.  So finish with:

 

proc means data=n_tests;

var count;

by id;

run;

stat_sas
Ammonite | Level 13

Hi,

 

How about using sql?

 

proc sql;
select id,year(test_date) as year, avg(test_value) as avg_test_value format=8.2
from have
group by id,calculated year;
quit;

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1347 views
  • 1 like
  • 4 in conversation