Hi there! I have repeated measures data in long format. Several subjects have multiple observations within the same binned time period. I would like to average the values for that subject within that time period (rather than selecting just one).
For example (the data are pasted below), subject 002E5 has three observations at time point -0.5 (variable timecat) and three at time point 0.5; subject 003E5 has two observations at time point -3.0, and so on. I am looking for a more efficient way to get the average value at time point x for subject y, with all of the time points in one data set, rather than splitting the data by time category, computing the subject-specific averages for each category, and then merging everything back together.
I know how to get the average by unique ID, but can't figure out how to add the additional condition of timecat.
I've attached a snippet of the data in sas7bdat format.
Thanks!
POST | logvl | timecat | ID |
0 | 5.2 | -0.5 | 002E5 |
0 | 3.9 | -0.5 | 002E5 |
0 | 3.6 | -0.5 | 002E5 |
0 | 2.7 | 0.0 | 002E5 |
0 | 5.5 | 0.5 | 002E5 |
0 | 3.4 | 0.5 | 002E5 |
0 | 2.7 | 0.5 | 002E5 |
0 | 0.0 | -3.0 | 003E5 |
0 | 3.5 | -3.0 | 003E5 |
0 | 3.0 | -2.5 | 003E5 |
0 | 4.9 | -1.0 | 003E5 |
0 | 5.1 | -0.5 | 003E5 |
0 | 4.8 | -0.5 | 003E5 |
0 | 1.7 | 0.5 | 003E5 |
0 | 0.0 | 0.5 | 003E5 |
0 | 3.5 | 1.0 | 003E5 |
0 | 2.1 | 1.0 | 003E5 |
0 | 0.0 | 1.0 | 003E5 |
0 | 0.0 | 1.0 | 003E5 |
0 | 0.0 | 1.5 | 003E5 |
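To reproduce the replies without the sas7bdat attachment, the rows pasted above can be read with a DATALINES step. This is a minimal sketch: the data set name HAVE matches the placeholder used in the replies, and ID is read as character ($) so that a value like 002E5 is not interpreted as the number 2E5 in scientific notation.

```sas
/* Sketch: read the pasted sample rows into a data set named HAVE.    */
/* ID is character ($); read as numeric, 002E5 would become 200000.   */
data have;
    input post logvl timecat id $;
    datalines;
0 5.2 -0.5 002E5
0 3.9 -0.5 002E5
0 3.6 -0.5 002E5
0 2.7 0.0 002E5
0 5.5 0.5 002E5
0 3.4 0.5 002E5
0 2.7 0.5 002E5
0 0.0 -3.0 003E5
0 3.5 -3.0 003E5
0 3.0 -2.5 003E5
0 4.9 -1.0 003E5
0 5.1 -0.5 003E5
0 4.8 -0.5 003E5
0 1.7 0.5 003E5
0 0.0 0.5 003E5
0 3.5 1.0 003E5
0 2.1 1.0 003E5
0 0.0 1.0 003E5
0 0.0 1.0 003E5
0 0.0 1.5 003E5
;
run;
```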
@kastafford wrote:
I know how to get the average by unique ID, but can't figure out how to add the additional condition of timecat.
This is exactly what PROC SUMMARY was designed to do.
UNTESTED CODE
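proc summary data=have nway;   /* NWAY keeps only the full ID*timecat combinations */
class id timecat;              /* one output row per subject and time bin */
var logvl;                     /* the measurement to average (POST is 0 in every row) */
output out=want mean=;         /* MEAN= with no names keeps the name logvl */
run;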
@kastafford wrote:
I know how to get the average by unique ID, but can't figure out how to add the additional condition of timecat.
How are you currently doing it? I would expect adding TIMECAT to your BY or CLASS statement (or to the GROUP BY clause in PROC SQL) to work.
This is a standard PROC MEANS/SUMMARY question: put ALL of your grouping variables in the BY or CLASS statement.
https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic.sas
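A sketch of the BY-statement version, assuming a data set HAVE with a character ID and numeric timecat and logvl (a few of the posted rows are included so the code runs on its own). Unlike CLASS, BY requires the data to be sorted by every BY variable first. The names WANT_BY, MEAN_LOGVL, and N_OBS are illustrative.

```sas
/* A few of the posted rows, so the example is self-contained */
data have;
    input id $ timecat logvl;
    datalines;
002E5 -0.5 5.2
002E5 -0.5 3.9
002E5 -0.5 3.6
002E5  0.0 2.7
003E5 -3.0 0.0
003E5 -3.0 3.5
;
run;

/* BY-group processing needs the data sorted by all BY variables */
proc sort data=have;
    by id timecat;
run;

/* One output row per ID*timecat group; NOPRINT suppresses the printed table */
proc means data=have noprint;
    by id timecat;
    var logvl;
    output out=want_by mean=mean_logvl n=n_obs;
run;
```

For subject 002E5 at timecat -0.5 this gives (5.2 + 3.9 + 3.6)/3 ≈ 4.23 with N_OBS = 3. The output data set also carries the automatic _TYPE_ and _FREQ_ variables.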
Hi @Reeza,
Thanks for your quick reply. As a procedure, this works to get me the values, but I'm trying to do it in a DATA step to create a new variable:
proc means data=cancer.sample;
class ID timecat;
var logvl;
run;
After the BY ID; statement and IF FIRST.ID, I'm not sure how to add the timecat condition. I'm thinking either a DO loop or an array, but I don't construct them well.
Thanks again!
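For the DATA-step route, no DO loop or array is needed: sort and use BY ID TIMECAT, and FIRST.TIMECAT and LAST.TIMECAT then mark the start and end of each subject-by-time-bin group. A minimal sketch, assuming the same variables as the posted data; SUM_LOGVL, N_OBS, MEAN_LOGVL, and WANT_DS are illustrative names.

```sas
/* A few of the posted rows, so the example is self-contained */
data have;
    input id $ timecat logvl;
    datalines;
002E5 -0.5 5.2
002E5 -0.5 3.9
002E5 -0.5 3.6
002E5  0.0 2.7
003E5 -3.0 0.0
003E5 -3.0 3.5
;
run;

proc sort data=have;
    by id timecat;
run;

data want_ds;
    set have;
    by id timecat;
    /* FIRST.TIMECAT is also 1 whenever a new ID starts */
    if first.timecat then do;
        sum_logvl = 0;
        n_obs = 0;
    end;
    /* sum statements retain their values; skip missing like PROC MEANS */
    if not missing(logvl) then do;
        sum_logvl + logvl;
        n_obs + 1;
    end;
    /* write one row per ID*timecat group */
    if last.timecat then do;
        if n_obs > 0 then mean_logvl = sum_logvl / n_obs;
        output;
    end;
    keep id timecat mean_logvl n_obs;
run;
```

This gives the same means as the PROC MEANS/SUMMARY code, but the PROC version is shorter and harder to get wrong.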
@kastafford wrote:
After the BY ID; statement and IF FIRST.ID, I'm not sure how to add the timecat condition. I'm thinking either a DO loop or an array, but I don't construct them well.
Hint for future SAS usage: don't try to write your own code for simple statistics such as means, standard deviations, minimums, and maximums. SAS has already done this for you, with built-in error checking and verified results. In fact, even for more complicated analyses, if a SAS PROC exists that does what you want, don't write your own code. Spend some time learning which SAS PROCs are available and what they do.
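Since the original goal was a new variable rather than a separate summary table, one PROC-based way to keep every original row and attach the subject-by-timecat mean is PROC SQL: when non-grouped columns are selected alongside a summary function, SAS remerges the statistic onto the detail rows and writes a NOTE about remerging to the log. A sketch under the same variable-name assumptions; WANT_SQL and MEAN_LOGVL are illustrative names.

```sas
/* A few of the posted rows, so the example is self-contained */
data have;
    input id $ timecat logvl;
    datalines;
002E5 -0.5 5.2
002E5 -0.5 3.9
002E5 -0.5 3.6
002E5  0.0 2.7
003E5 -3.0 0.0
003E5 -3.0 3.5
;
run;

/* Every detail row is kept; MEAN_LOGVL is the mean of its ID*timecat group */
proc sql;
    create table want_sql as
    select *, mean(logvl) as mean_logvl
    from have
    group by id, timecat
    order by id, timecat;
quit;
```

Each of the three 002E5 rows at timecat -0.5 gets MEAN_LOGVL ≈ 4.23. Another option is to merge the PROC SUMMARY output back onto HAVE by ID and timecat.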
Hi @PaigeMiller! Thank you! That was what I was looking for.