Hello all,
I need to calculate the annual and monthly average of values for each study subject. However, the twist is that they all have different start dates that start the window for the year or month of measurements. For example in the code below, I have 3 subjects with different earliest study start dates, so the lookout period for the annual measurement differs. I'm not sure how to create this annual arrays. Hoping to then modify the annual code to also calculate monthly averages.
data have;
input id datetime hr resp;
format datetime datetime22.3.;
cards;
1 19MAR2009:13:15:00.000 90 16
1 19MAR2009:13:17:00.000 70 18
1 19MAR2009:13:18:00.000 80 17
1 19MAR2009:13.22:00.000 100 22
1 23MAR2009:10:36:00.000 83 21
2 29OCT2009:10:36:00.000 121 13
2 29OCT2009:10:38:00.000 83 14
2 08JUL2011:13:04:00.000 131 18
2 07SEP2016:14:26:59.000 50 12
3 06APR2017:13:39:00.000 76 18
3 08JUL2017:13:04:00.000 98 17
3 08JUL2018:13:08:00.000 96 18
3 24OCT2019:15:11:00.000 80 19
;
I'm envisioning the resulting dataset to look like this. Note that the "start_date" variable represents the very first encounter for each id, and then calculates exactly one year from that point to be the next start_date. It does not represent the actual dates of the measurements (except for the very first one)- simply when the clock "starts" for annual averages. And then HR_year and Resp_year represents the average of values for the year. Note that id=2 has missing values because there were no measurements taken in some of the annual time windows.
ID | Start_date | HR_year | Resp_year |
1 | 19MAR2009:13:15:00.000 | 84.6 | 18.8 |
2 | 29OCT2009:10:36:00.000 | 102 | 13.5 |
2 | 29OCT2010:10:36:00.000 | 131 | 18 |
2 | 29OCT2011:10:36:00.000 | . | . |
2 | 29OCT2012:10:36:00.000 | . | . |
2 | 29OCT2013:10:36:00.000 | . | . |
2 | 29OCT2014:10:36:00.000 | . | . |
2 | 29OCT2015:10:36:00.000 | 50 | 12 |
3 | 06APR2017:13:39:00.000 | 90 | 17.667 |
3 | 06APR2018:13:39:00.000 | 96 | 18 |
3 | 06APR2019:13:39:00.000 | 80 | 19 |
Thank you so much for your consideration and help!
First a reminder to make sure that your data step runs and creates what is expected. Yours doesn't provide an informat for datetime and one of your values has a decimal in the wrong place.
This sort of question is one place where the power of the SAS Format comes to the fore. Groups created with a Format can be used in analysis, reporting and most formats will work with graphs. The following shows creating mean, median and standard deviation. Basic statistics are easy. If you want a data set then use the version with output
data have; input id datetime :datetime25. hr resp; format datetime datetime22.3; cards; 1 19MAR2009:13:15:00.000 90 16 1 19MAR2009:13:17:00.000 70 18 1 19MAR2009:13:18:00.000 80 17 1 19MAR2009:13:22:00.000 100 22 1 23MAR2009:10:36:00.000 83 21 2 29OCT2009:10:36:00.000 121 13 2 29OCT2009:10:38:00.000 83 14 2 08JUL2011:13:04:00.000 131 18 2 07SEP2016:14:26:59.000 50 12 3 06APR2017:13:39:00.000 76 18 3 08JUL2017:13:04:00.000 98 17 3 08JUL2018:13:08:00.000 96 18 3 24OCT2019:15:11:00.000 80 19 ; proc means data=have mean median std; class id datetime; format datetime dtmonyy.; var hr resp; run; proc means data=have mean median std; class id datetime; format datetime dtyear.; var hr resp; run;
/* creating an output data set*/ proc means data=have noprint nway; class id datetime; format datetime dtyear.; var hr resp; output out=want mean= median= std= /autoname; run;
Autoname places the statistic as a suffix to the variable name. If you only have one statistic and use median = ; then your result statistic has the same name as the variable. Or you can check syntax on creating your own names.
First a reminder to make sure that your data step runs and creates what is expected. Yours doesn't provide an informat for datetime and one of your values has a decimal in the wrong place.
This sort of question is one place where the power of the SAS Format comes to the fore. Groups created with a Format can be used in analysis, reporting and most formats will work with graphs. The following shows creating mean, median and standard deviation. Basic statistics are easy. If you want a data set then use the version with output
data have; input id datetime :datetime25. hr resp; format datetime datetime22.3; cards; 1 19MAR2009:13:15:00.000 90 16 1 19MAR2009:13:17:00.000 70 18 1 19MAR2009:13:18:00.000 80 17 1 19MAR2009:13:22:00.000 100 22 1 23MAR2009:10:36:00.000 83 21 2 29OCT2009:10:36:00.000 121 13 2 29OCT2009:10:38:00.000 83 14 2 08JUL2011:13:04:00.000 131 18 2 07SEP2016:14:26:59.000 50 12 3 06APR2017:13:39:00.000 76 18 3 08JUL2017:13:04:00.000 98 17 3 08JUL2018:13:08:00.000 96 18 3 24OCT2019:15:11:00.000 80 19 ; proc means data=have mean median std; class id datetime; format datetime dtmonyy.; var hr resp; run; proc means data=have mean median std; class id datetime; format datetime dtyear.; var hr resp; run;
/* creating an output data set*/ proc means data=have noprint nway; class id datetime; format datetime dtyear.; var hr resp; output out=want mean= median= std= /autoname; run;
Autoname places the statistic as a suffix to the variable name. If you only have one statistic and use median = ; then your result statistic has the same name as the variable. Or you can check syntax on creating your own names.
Thanks @ballardw , and sorry about the data step not running correctly. It looks like this isn't creating annual means in the example output because it's giving a monthly average, based on months that are available. Instead, I would like an annual average that starts from each individual's unique first datetime encounter. If an individual does not have any qualifying measurements for the year, then it will be set to missing (which is what happens with id=2).
Thank you again for your thoughts!
i'm so sorry. The code does work once I print the report. Is there a way to get a row for years that have missing data though? For example, in the example, 2011, 2012, 2013, 2014 would need a row with missing values for id=2.
Apologies again and thank you!
You are going to have provide your definition of "month" perhaps for use as "anniversary" dates. Will you accept the same day of the month in the following month? Where does that leave things that want Feb29 as an anniversary (since that only appears once every 4 years generally)?
See if this indicates the number of 'years' or 'months' correctly from your anniversary rules.
Everything in the first "year" will have 0 (same year) for yearint and then increase when more than a year. Similar with monthint
data need; set have; by id; /*<= implies data is sorted by ID*/ retain basedatetime; if first.id then basedatetime=datetime; yearint = intck('dtyear',basedatetime,datetime,'C'); monthint = intck('dtmon',basedatetime,datetime,'C'); run;
If the boundaries look right then you can play with something to make more meaningful year/month indicator than the yearint/monthint. Or perhaps a single variable using intck('dtday' ) could be used.
I noticed that your OUTPUT included a great many values that were not in the example input data. So I had no way to check if the results created from the example data matched your actual want.
Thank you @ballardw ! The first method ended up fitting my needs! Appreciate all of your help.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.