kc
Quartz | Level 8

I am trying to fit a piecewise linear regression to a longitudinal dataset, because growth curve modeling gives results that are not clinically plausible at certain time points. The sample dataset is attached in Excel format and has the following columns/variables:

1. Subject ID

2. Clinically planned event name (up to 7 time points per subject: 1 month, 6 months, 1 year, 2 years, 3 years, 4 years, 5 years)

3. Time (time in years)

4. Summary Score (the dependent variable in the model)

5. m6 (dummy variable for the first 6 months)

6. post6m (dummy variable for the period after 6 months)

7. Group (treatment group)

 

The model I currently have is as follows:

 

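proc glimmix data=PLR;
    class subjectid;
    /* Fixed effects for the two time pieces (m6, post6m) */
    model summaryscore = m6 post6m / solution;
    /* Random intercept plus subject-specific coefficients for m6 and post6m;
       TYPE=CHOL is an unstructured covariance parameterized by its Cholesky root */
    random intercept m6 post6m / subject=subjectid type=chol;
    run;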

 

Here is the question I have:

How do I get the mean summary score for each treatment group, and the difference in mean summary score between treatment groups (with 95% CIs and p-values), at each of the 7 time points in the study? In other words, can someone help me with the syntax to add treatment group, time, and the treatment-by-time interaction to the model, so that I can estimate the group means and the between-group mean differences at each time point?

 

Here is what I need:

 

| Summary Score | TRT1 predicted mean (95% CI) | TRT2 predicted mean (95% CI) | Predicted mean difference, TRT1 - TRT2 (95% CI) | P-value |
|---|---|---|---|---|
| 1 Month | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | 0.xxxx |
| 6 Months | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | 0.xxxx |
| 1 Year | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | 0.xxxx |
| 2 Years | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | 0.xxxx |
| 3 Years | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | 0.xxxx |
| 4 Years | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | 0.xxxx |
| 5 Years | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | xx.x (xx.x, xx.x) | 0.xxxx |

 

Any suggestions are greatly appreciated!

sld
Rhodochrosite | Level 12

Your desired comparisons could be obtained by using time as a categorical factor in a mixed model ANOVA, rather than by using time as a continuous factor in a random coefficients regression.
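For comparison, a minimal sketch of that mixed model ANOVA might look like the following. It assumes a hypothetical class variable named event that holds the clinically planned event name (one level per time point), alongside the subject_id, group and summary_score variables used elsewhere in this thread. An unstructured covariance for 7 time points has 28 parameters and may not converge with sparse data, in which case a simpler structure would be needed.

```sas
/* Sketch: time treated as a categorical factor (mixed model ANOVA).
   EVENT is a hypothetical variable holding the planned event name. */
proc glimmix data=plr;
    class subject_id group event;
    model summary_score = group event group*event / solution ddfm=kr;
    /* R-side (residual) covariance among a subject's repeated measures;
       naming EVENT keeps each observation in its correct position
       when some time points are missing */
    random event / subject=subject_id type=un residual;
    /* Group means at each event, plus the group difference within
       each event, with 95% confidence limits and p-values */
    lsmeans group*event / slicediff=event cl;
    run;
```

If event is character, its levels sort alphabetically in the output; a numeric visit code would keep the tables in chronological order without changing the fit.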

 

When I plot your data, using

 

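/* SERIES joins points in data order, so sort by subject and time first */
proc sort data=plr out=plr_srt;
    by subject_id time;
    run;
proc sgpanel data=plr_srt noautolegend;
    panelby group;
    series x=time y=summary_score / group=subject_id markers lineattrs=(pattern=1); /* one line per subject */
    run;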

I wonder: why do you want a piecewise linear regression? How many breakpoints do you need? Do you know where the breakpoints between the pieces are (for example, at 6 months), or do you need to estimate their locations?

 

Is your actual dataset larger than the one you posted? (In the future, please post as a CSV file, rather than Excel.) Few subjects in your posted dataset have values for all 7 times (5 out of 27 subjects), and many have only 1 value (8 out of 27). Only 14 subjects have data at both 1 month and 6 months. Can you sensibly fit a model (with or without random slopes) for multiple linear pieces using a data set that is so incomplete? I'd say probably not. I would also be concerned about potential bias in either ANOVA or regression models due to missing data, and about why the data are missing.
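One way to get counts like these for the full dataset is to tabulate how many non-missing scores each subject contributes. This is a sketch that assumes the subject_id and summary_score variable names used in the code in this thread; n_per_subject and n_obs are names introduced here for illustration.

```sas
/* Count the non-missing summary scores contributed by each subject */
proc sql;
    create table n_per_subject as
    select subject_id, count(summary_score) as n_obs
    from plr
    group by subject_id;
quit;

/* How many subjects have 1, 2, ..., 7 observed time points */
proc freq data=n_per_subject;
    tables n_obs;
run;
```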

 

I hope this helps.

 

kc
Quartz | Level 8

I wonder: why do you want a piecewise linear regression? How many breakpoints do you need? Do you know where the breakpoints between the pieces are (for example, at 6 months), or do you need to estimate their locations?

Well, from prior clinical knowledge, there is no significant effect of treatment on summary score beyond the 6-month time point. Therefore, only 2 breakpoints are needed, one at 1 month and one at 6 months. I included only the 6-month breakpoint in my code as an example.

 

Is your actual dataset larger than the one you posted? (In the future, please post as a CSV file, rather than Excel.)

Yes, much larger.

Few subjects in your posted dataset have values for all 7 times (5 out of 27 subjects), and many have only 1 value (8 out of 27). Only 14 subjects have data at both 1 month and 6 months.

Correct: some subjects have summary score data at all time points, and some don't. The data, although dummy, mimic the data from an actual trial, in which missing data are quite common.

Can you sensibly fit a model (with or without random slopes) for multiple linear pieces using a data set that is so incomplete? I'd say probably not. I would also be concerned about potential bias in either ANOVA or regression models due to missing data, and about why the data are missing.

Multiple methods (paired t-tests, ANCOVA, mixed effects models) are used to analyze these data. Also, an underlying assumption of longitudinal growth curve models is that the missing data are missing at random.

 

So any help with the syntax for running a piecewise regression to fill in the table in my original post would be great. Let me know if more data (in CSV format this time) would help with this task!

 

sld
Rhodochrosite | Level 12

@kc wrote:

I wonder: why do you want a piecewise linear regression? How many breakpoints do you need? Do you know where the breakpoints between the pieces are (for example, at 6 months), or do you need to estimate their locations?

Well, from prior clinical knowledge, there is no significant effect of treatment on summary score beyond the 6-month time point. Therefore, only 2 breakpoints are needed, one at 1 month and one at 6 months. I included only the 6-month breakpoint in my code as an example.

 

 


I'm thinking of a breakpoint as a value at which the slope changes, i.e., the boundary between the segments. You don't have any data prior to 1 month, so you can't have a breakpoint there.
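To make that definition concrete, here is the fixed-effects part of a model with a single breakpoint at 6 months (t = 0.5 years), written with the same max(t, 0.5) term as the time_6 variable in the code later in this reply. It is a sketch that omits the group terms and random effects; t is time in years and the β coefficients are generic symbols introduced here for illustration.

```latex
\mathrm{E}[y] = \beta_0 + \beta_1 t + \beta_2 \max(t, 0.5)
  = \begin{cases}
      (\beta_0 + 0.5\,\beta_2) + \beta_1 t, & t \le 0.5,\\
      \beta_0 + (\beta_1 + \beta_2)\, t,     & t \ge 0.5.
    \end{cases}
```

The two pieces meet at t = 0.5, where both expressions equal β0 + 0.5β1 + 0.5β2, so β1 is the slope before 6 months and β2 is the change in slope at the breakpoint.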

 


Multiple methods (paired t-tests, ANCOVA, mixed effects models) are used to analyze these data. Also, an underlying assumption of longitudinal growth curve models is that the missing data are missing at random.

 

 

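It is certainly convenient to assume that data are missing at random (MAR). But convenience does not necessarily make it true. If data are not MAR, then any of these statistical methods may be subject to bias, and complete-case methods such as paired t-tests generally require the even stronger assumption that data are missing completely at random (MCAR).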

 

So any help with the syntax for running a piecewise regression to fill in the table in my original post would be great. Let me know if more data (in CSV format this time) would help with this task!

 


 

Here is some code to consider. I provide no guarantees, so you'll want to understand it thoroughly. It includes some graphics that might help you understand what the model is doing and visually confirm that it is doing what you want.

 

/*  Create variable for breakpoint: time_6 is 0.5 up to 6 months, then equals time */
data plr;
    set plr;
    time_6 = max(time, 0.5); /* Breakpoint at 0.5 years (6 months) */
    run;
/*  Check how time_6 lines up with time */
proc tabulate data=plr;
    class time time_6;
    table time, time_6;
    run;
/*  Fit random coefficients model (group-specific intercepts and slopes for each piece) */
proc glimmix data=plr;
    class subject_id group;
    model summary_score = group time time_6 group*time group*time_6 / solution;
    random intercept time time_6 / subject=subject_id type=un g gcorr; /* random intercepts, random slopes */
    /* predpa: population-averaged prediction (fixed effects only); pred: subject-specific prediction (includes BLUPs) */
    output out=plr_out2 pred(noblup)=predpa pred=pred;
    run;
proc sort data=plr_out2 out=plr_out2_srt;
    by time;
    run;
/*  Plot fitted regression for each subject and population-averaged regression */
proc sgpanel data=plr_out2_srt;
    panelby group;
    series x=time y=pred / group=subject_id markers;
    series x=time y=predpa / lineattrs=(thickness=2 color=black);
    run;
/*  Plot population-averaged regression by group in one figure for comparison */
proc sgplot data=plr_out2_srt;
    series x=time y=predpa / group=group;
    run;
/*  Plot observed data for each subject with its regression */
proc sgpanel data=plr_out2_srt;
    panelby subject_id / columns=3;
    series x=time y=pred / markers lineattrs=(thickness=2 color=black);
    series x=time y=summary_score / markers;
    run;

I'll give you a hint about acquiring the mean comparisons that you want: use the LSMEANS statement with the AT option. 
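Following that hint, a minimal sketch of the group means and TRT1 - TRT2 differences at the 7 time points might look like this. It refits the random coefficients model above with one LSMEANS statement per time point. Two assumptions to check: the 1-month value of time is taken here as 0.083333 years (1/12), which should be replaced by whatever value your TIME variable actually holds; and time_6 must be set to max(time, 0.5) at each point, because AT fixes each covariate separately rather than recomputing time_6 from time.

```sas
/* Sketch only: assumes PLR contains subject_id, group, time (years),
   summary_score, and time_6 = max(time, 0.5) from the data step above */
proc glimmix data=plr;
    class subject_id group;
    model summary_score = group time time_6 group*time group*time_6 / solution;
    random intercept time time_6 / subject=subject_id type=un;
    /* AT fixes BOTH covariates; time_6 equals max(time, 0.5) at each point */
    lsmeans group / at(time time_6)=(0.083333 0.5) diff cl; /* 1 month  */
    lsmeans group / at(time time_6)=(0.5 0.5)      diff cl; /* 6 months */
    lsmeans group / at(time time_6)=(1 1)          diff cl; /* 1 year   */
    lsmeans group / at(time time_6)=(2 2)          diff cl; /* 2 years  */
    lsmeans group / at(time time_6)=(3 3)          diff cl; /* 3 years  */
    lsmeans group / at(time time_6)=(4 4)          diff cl; /* 4 years  */
    lsmeans group / at(time time_6)=(5 5)          diff cl; /* 5 years  */
    run;
```

With DIFF and CL, each LSMEANS statement should print the two group means with 95% confidence limits and the between-group difference with its confidence limits and t-test p-value; the sign of the difference follows the sort order of the GROUP levels.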

 

I hope this helps.

 

kc
Quartz | Level 8

Thank you! Will update the post after working through the code.
