I am trying to run a piecewise linear regression on a longitudinal dataset (because growth curve modeling is giving results that are not clinically plausible at certain time points). The sample dataset is attached in Excel format and has the following columns/variables:
1. Subject ID
2. Clinically planned event name (7 possible time points per subject: 1 month, 6 months, 1 year, 2 years, 3 years, 4 years, 5 years)
3. Time (time in years)
4. Summary Score (the dependent variable in the model)
5. m6 (dummy variable for the first 6 months)
6. post6m (dummy variable for the period after 6 months)
7. Group (treatment group)
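Because m6 and post6m carry the slopes of the two linear pieces, they are usually coded as continuous time-segment variables rather than 0/1 dummies. The sketch below shows one common coding with a single knot at 6 months (0.5 years). It is an assumption about what these variables are meant to represent; the data set name demo_pw and the 1-month value 0.0833 (1/12 year) are hypothetical.

```sas
/* Hypothetical sketch: two linear pieces joined at 0.5 years (6 months). */
/* m6     = years elapsed within the first 6 months (capped at 0.5)       */
/* post6m = years elapsed after 6 months (0 up to the breakpoint)         */
data demo_pw;
   input time @@;                 /* read several TIME values per line */
   m6     = min(time, 0.5);
   post6m = max(time - 0.5, 0);
   datalines;
0.0833 0.5 1 2 3 4 5
;
run;

proc print data=demo_pw noobs;
run;
```

With this coding m6 + post6m = time, so the coefficient on m6 is the rate of change during the first 6 months and the coefficient on post6m is the rate afterwards. Together with an intercept, these two columns span the same model as time and max(time, 0.5), so either parameterization gives the same fitted mean curve.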
The model I currently have is as follows:
proc glimmix data=PLR;
class subjectid;
model summaryscore = m6 post6m/solution;
random intercept m6 post6m/ subject=subjectid type=chol;
run;
Here is the question I have:
How do I get the mean summary score by treatment group, and the difference between treatment groups in mean summary score (along with 95% CIs and p-values), at the 7 different time points in the study? In other words, can someone help me with the syntax to add treatment group, time, and the treatment-by-time interaction as covariates in the model, to get the mean and mean difference in summary scores between treatment groups at various time points?
Here is what I need:

                Predicted Mean Values (95% CI)            Predicted Mean
                TRT1                 TRT2                 Difference (95% CI)    P-value
Summary Score
  1 Month       xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)      0.xxxx
  6 Months      xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)      0.xxxx
  1 Year        xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)      0.xxxx
  2 Years       xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)      0.xxxx
  3 Years       xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)      0.xxxx
  4 Years       xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)      0.xxxx
  5 Years       xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)    xx.x (xx.x, xx.x)      0.xxxx
Any suggestions are greatly appreciated!
Your desired comparisons could be obtained by using time as a categorical factor in a mixed model ANOVA, rather than by using time as a continuous factor in a random coefficients regression.
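A minimal sketch of that approach follows. It assumes the data set plr and the variable names (subject_id, group, time, summary_score) used in the code later in this thread, and that time takes only the 7 planned values; if it does not, the event-name variable could serve as the CLASS variable instead. The output data set names lsm_cat and sd_cat are hypothetical.

```sas
/* Sketch: mixed-model ANOVA with TIME as a categorical (CLASS) factor. */
ods output LSMeans=lsm_cat SliceDiffs=sd_cat;
proc glimmix data=plr;
   class subject_id group time;
   model summary_score = group time group*time / ddfm=kr;
   /* R-side (residual) unstructured covariance among a subject's visits */
   random time / subject=subject_id type=un residual;
   /* LS-means for each group at each visit, plus the between-group     */
   /* difference within each visit; CL adds 95% limits to both          */
   lsmeans group*time / slicediff=time cl;
run;
```

SLICEDIFF=time compares the groups within each visit, which matches the layout of the requested table. With many missing visits, TYPE=UN (28 covariance parameters for 7 times) may not converge; a simpler structure such as TYPE=CS is one possible fallback.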
When I plot your data using the following code,
proc sgpanel data=plr noautolegend;
panelby group;
series x=time y=summary_score / group=subject_id markers lineattrs=(pattern=1);
run;
I wonder: why do you want a piecewise linear regression? How many breakpoints do you need? Do you know where the breakpoint(s) between the pieces is (are) (for example, at 6 months), or do you need to estimate the location(s) of the breakpoint(s)?
Is your actual dataset larger than the one you posted? (In the future, please post as a CSV file, rather than Excel.) Few subjects in your posted dataset have values for all 7 times (5 out of 27 subjects), and many have only 1 value (8 out of 27). Only 14 subjects have data at both 1 month and 6 months. Can you sensibly fit a model (with or without random slopes) for multiple linear pieces using a data set that is so incomplete? I'd say, probably not. I also would be concerned about potential bias in either ANOVA or regression models due to missing data and why data are missing.
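Counts like these can be reproduced on the full data set with a quick tabulation of observed scores per subject. A sketch, assuming the plr data set and the variable names used elsewhere in this thread; visit_counts is a hypothetical name, and the 1-month check assumes time is stored as 1/12 year (0.083 after rounding).

```sas
/* Number of non-missing scores per subject (COUNT skips missing values) */
proc sql;
   create table visit_counts as
   select subject_id,
          count(summary_score) as n_obs
   from plr
   group by subject_id;
quit;

/* Distribution of the number of observed visits per subject */
proc freq data=visit_counts;
   tables n_obs;
run;

/* Subjects with observed scores at both 1 month and 6 months */
proc sql;
   select count(*) as n_with_1m_and_6m
   from (select subject_id
         from plr
         where summary_score is not missing
           and round(time, 0.001) in (0.083, 0.5)
         group by subject_id
         having count(distinct round(time, 0.001)) = 2);
quit;
```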
I hope this helps.
I wonder: why do you want a piecewise linear regression? How many breakpoints do you need? Do you know where the breakpoint(s) between the pieces is (are) (for example, at 6 months), or do you need to estimate the location(s) of the breakpoint(s)?
Well, from prior clinical knowledge, there is no significant effect of treatment on summary score beyond the 6-month time point. Therefore, only 2 breakpoints are needed, one at 1 month and one at 6 months. I included only one breakpoint, at 6 months, in my code as an example.
Is your actual dataset larger than the one you posted? (In the future, please post as a CSV file, rather than Excel.)
Yes, much larger.
Few subjects in your posted dataset have values for all 7 times (5 out of 27 subjects), and many have only 1 value (8 out of 27). Only 14 subjects have data at both 1 month and 6 months.
Correct; some subjects have summary score data at all time points, and some don't. The data, although dummy, mimic the data from an actual trial, so missing data are quite common.
Can you sensibly fit a model (with or without random slopes) for multiple linear pieces using a data set that is so incomplete? I'd say, probably not. I also would be concerned about potential bias in either ANOVA or regression models due to missing data and why data are missing.
Multiple methods (paired t-tests, ANCOVA, mixed effects models) are employed in analyzing these data. Also, an underlying assumption of longitudinal growth curve models is that the missing data are missing at random.
So, any help with the syntax for running a piecewise regression to fill in the table in my original post would be great. Let me know if more data (in CSV format this time) would be helpful in carrying out this task!
@kc wrote:
I wonder: why do you want a piecewise linear regression? How many breakpoints do you need? Do you know where the breakpoint(s) between the pieces is (are) (for example, at 6 months), or do you need to estimate the location(s) of the breakpoint(s)?
Well, from prior clinical knowledge, there is no significant effect of treatment on summary score beyond the 6-month time point. Therefore, only 2 breakpoints are needed, one at 1 month and one at 6 months. I included only one breakpoint, at 6 months, in my code as an example.
I'm thinking of a breakpoint as a value at which the slope changes, i.e., the boundary between the segments. You don't have any data prior to 1 month, so you can't have a breakpoint there.
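To see why, it may help to write out the mean curve implied by the time_6 = max(time, 0.5) coding used in the code below (one group only, random effects omitted; the symbols mu and beta are notation introduced here for illustration):

```latex
\mu(t) = \beta_0 + \beta_1\, t + \beta_2 \max(t,\ 0.5)
\qquad\Longrightarrow\qquad
\frac{d\mu}{dt} =
\begin{cases}
  \beta_1,           & t < 0.5,\\
  \beta_1 + \beta_2, & t > 0.5.
\end{cases}

\max\!\left(t,\ \tfrac{1}{12}\right) = t
\quad \text{for every observed } t \ge \tfrac{1}{12}.
```

So beta_2 changes the slope only after 6 months. A term for a breakpoint at 1 month, max(t, 1/12), would equal t at every observed time, making it perfectly collinear with t; with no observations before 1 month, there is no earlier segment whose slope could be estimated.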
Multiple methods (paired t-tests, ANCOVA, mixed effects models) are employed in analyzing these data. Also, an underlying assumption of longitudinal growth curve models is that the missing data are missing at random.
It is certainly convenient to assume that data are missing completely at random. But convenience does not necessarily make it true. If data are not MCAR, then any statistical method will be subject to bias.
So, any help with the syntax for running a piecewise regression to fill in the table in my original post would be great. Let me know if more data (in CSV format this time) would be helpful in carrying out this task!
Here is some code to consider; I provide no guarantees, so you'll want to understand it thoroughly. It includes some graphics that might help you understand what the model is doing and confirm visually that it does what you want.
/* Create variable for breakpoint */
data plr;
set plr;
time_6 = max(time, 0.5); /* Breakpoint at 0.5 */
run;
proc tabulate data=plr;
class time time_6;
table time, time_6;
run;
/* Fit random coefficients model */
proc glimmix data=plr;
class subject_id group;
model summary_score = group time time_6 group*time group*time_6 / solution; /* group-specific intercepts and slopes */
random intercept time time_6 / subject=subject_id type=un g gcorr; /* random intercepts, random slopes */
output out=plr_out2 pred(noblup)=predpa pred=pred;
run;
proc sort data=plr_out2 out=plr_out2_srt;
by time;
run;
/* Plot fitted regression for each subject and population-averaged regression */
proc sgpanel data=plr_out2_srt;
panelby group;
series x=time y=pred / group=subject_id markers;
series x=time y=predpa / lineattrs=(thickness=2 color=black);
run;
/* Plot population-averaged regression by group in one figure for comparison */
proc sgplot data=plr_out2_srt;
series x=time y=predpa / group=group;
run;
/* Plot observed data for each subject with its regression */
proc sgpanel data=plr_out2_srt;
panelby subject_id / columns=3;
series x=time y=pred / markers lineattrs=(thickness=2 color=black);
series x=time y=summary_score / markers;
run;
I'll give you a hint about acquiring the mean comparisons that you want: use the LSMEANS statement with the AT option.
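Following that hint, one way to fill the table is a separate LSMEANS statement for each planned visit, each with an AT option that fixes both time and the matching time_6 = max(time, 0.5). This is a sketch only: it assumes the plr data set with time_6 created as above, that group has the two levels TRT1 and TRT2, and that 1 month is coded as 0.0833 years.

```sas
/* Sketch: group means and group differences at each planned visit  */
/* from the piecewise random coefficients model.                    */
proc glimmix data=plr;
   class subject_id group;
   model summary_score = group time time_6 group*time group*time_6 / solution;
   random intercept time time_6 / subject=subject_id type=un;
   /* Each AT fixes TIME and the matching TIME_6 = MAX(TIME, 0.5).  */
   /* DIFF gives the between-group difference and its p-value;      */
   /* CL adds 95% confidence limits for the means and difference.   */
   lsmeans group / at (time time_6)=(0.0833 0.5) diff cl;  /* 1 month  */
   lsmeans group / at (time time_6)=(0.5 0.5)    diff cl;  /* 6 months */
   lsmeans group / at (time time_6)=(1 1)        diff cl;  /* 1 year   */
   lsmeans group / at (time time_6)=(2 2)        diff cl;  /* 2 years  */
   lsmeans group / at (time time_6)=(3 3)        diff cl;  /* 3 years  */
   lsmeans group / at (time time_6)=(4 4)        diff cl;  /* 4 years  */
   lsmeans group / at (time time_6)=(5 5)        diff cl;  /* 5 years  */
run;
```

Setting time_6 explicitly matters: a continuous covariate left out of AT is held at its overall mean, not at max(time, 0.5), which would give means that do not correspond to any point on the fitted piecewise curve.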
I hope this helps.
Thank you! Will update the post after working through the code.