BarryDeCicco
Obsidian | Level 7

Hello,

 

I'm trying to fit a mixture model (or, with transformations, a piecewise regression) in which I assume there are from one to three groups, arranged sequentially along the independent variable. I would love to do this in PROC FMM, but when I tried, it seemed to assume that any point could be in any group (mixed groups). I would like sequential groups.

 

Has anybody succeeded in doing this?

 

Thanks,

 

Barry

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

To add to PGStats's comments, a spline effect enables you to define piecewise polynomials of any degree. If you put knots at the cutpoints, then you will get a continuous piecewise fit. You can also add additional knots at the cutpoints to achieve a discontinuous fit, as shown in Figure 117.24 (Discontinuous Spline Fit) in the documentation for the TRANSREG procedure.
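
As a minimal sketch of the continuous case, the following applies a degree-1 spline with two knots to the simulated data set a posted in this thread. The knot locations (1 and 2) are the cutpoints used to simulate that data, PROC GLMSELECT with SELECTION=NONE is an assumed choice for an ordinary least squares fit (PROC ROBUSTREG accepts the same EFFECT statement), and the names spl, SplinePred, and pred are made up for illustration.

```sas
/* Continuous piecewise linear fit: degree-1 spline, knots at x=1 and x=2 */
proc glmselect data=a;
   effect spl = spline(x / degree=1 knotmethod=list(1 2));
   model y = spl / selection=none;          /* no selection: plain OLS fit */
   output out=SplinePred predicted=pred;    /* hypothetical output names   */
run;

/* The SERIES statement connects points in data order, so sort by x first */
proc sort data=SplinePred; by x; run;

proc sgplot data=SplinePred;
   scatter x=x y=y / group=c;
   series  x=x y=pred;
run;
```

Because the simulated data jump at x=1 and x=2, this continuous fit can only approximate them. It is a reasonable choice when the segments are expected to join at the cutpoints.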

 

 

With regard to the FMM code you posted, I don't think your code makes sense. Your code tells SAS to "fit a linear model, then try to find as many as three normal components such that the distribution of the residuals is best described as being a mixture of these component distributions." But that's not the problem you are trying to solve! Instead, you want to fit three separate regression models, each with a different slope and intercept. That requires a class variable that indicates the domain of each piecewise linear (PL) model.
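
For data that do not already carry such a class variable, one possible way to build it is a DATA step that bins the independent variable at known cutpoints. This sketch reuses the data set and variable names from the posted FMM code (a.transreg, Log_KM, Log_Log). The cutpoint values, the macro variable names cut1 and cut2, and the output data set work.segments are placeholders, not values taken from the thread.

```sas
/* Hypothetical cutpoints: replace with values appropriate to the data */
%let cut1 = 1;
%let cut2 = 2;

data work.segments;
   set a.transreg;                       /* assumes libref a is assigned */
   if missing(Log_KM) then c = .;        /* keep missing values missing  */
   else if Log_KM < &cut1 then c = 1;
   else if Log_KM < &cut2 then c = 2;
   else c = 3;
run;

/* Separate intercept and slope within each domain */
proc glm data=work.segments;
   class c;
   model Log_Log = c | Log_KM / solution;
run;
```

This only applies when the cutpoints are known in advance. If they have to be estimated from the data, a fixed class variable is not enough.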

 

If you use the sample data that I provided earlier, this syntax works in FMM:

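/* One-component FMM: separate intercept and slope for each level of c */
proc fmm data=a;
   class c;
   model y = x | c;
   output out=FMM_model pred=mixpred class=mixcomp;
run;

/* Overlay the fitted piecewise lines on the data */
proc sgplot data=FMM_model;
   scatter x=x y=y / group=c;
   series  x=x y=mixpred;
run;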

Notice, however, that this really isn't a finite mixture model. There is only one component used to model the residual distribution. Therefore you can use GLM or ROBUSTREG or TRANSREG to get the same model.

 

 


8 REPLIES
Rick_SAS
SAS Super FREQ

Please post the code that you are using.

 

An important consideration in piecewise linear (PL) models is whether you assume that the cutpoints are known or whether they need to be estimated. Also important is whether the regression model is continuous or discontinuous.

 

For known cutpoints and a discontinuous model, you can simply discretize the X axis and include a categorical variable that specifies the domain for the model. For example, the following data specifies three domains, [0,1], [1,2], and [2,3], with a different model defined on each interval. You can use PROC GLM to solve for the parameter estimates on each interval. You can even get a crude fit plot, although of course GLM doesn't know that the prediction lines should be restricted to the intervals:

 

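data a;
call streaminit(1);            /* fix the seed so the simulation is reproducible */
/* domain 1, [0,1]: intercept 2, slope -3 */
do x = 0 to 1 by 0.1;
   c = 1;
   y = 2 - 3*x + rand("normal");
   output;
end;
/* domain 2, [1,2]: intercept -2, slope 3 */
do x = 1 to 2 by 0.1;
   c = 2;
   y = -2 + 3*x + rand("normal");
   output;
end;
/* domain 3, [2,3]: intercept 0, slope 1 */
do x = 2 to 3 by 0.1;
   c = 3;
   y = 0 + x + rand("normal");
   output;
end;
run;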

proc glm data=A plots=fitplot;
class c;
model y = c | x / solution;
run;
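
For the other case mentioned above, where the cutpoints have to be estimated, one commonly used option is a nonlinear least squares fit in which the breakpoints are parameters. The following is only a sketch on the simulated data set a, assuming a continuous model with two breakpoints. The parameter names (b0 to b3, k1, k2), the starting values, the bounds, and the output names NlinPred and pred are illustrative guesses. Convergence is not guaranteed with data this noisy, and the result can depend on the starting values.

```sas
/* Sketch: continuous piecewise linear model with two estimated breakpoints.
   (x > k)*(x - k) is 0 to the left of k and x - k to the right of it.   */
proc nlin data=a;
   parms b0=2 b1=-3 b2=6 b3=-2 k1=1 k2=2;    /* illustrative starting values */
   bounds 0 < k1 < 1.5, 1.5 < k2 < 3;        /* keep the breakpoints ordered */
   model y = b0 + b1*x
               + b2*(x > k1)*(x - k1)        /* slope change at k1 */
               + b3*(x > k2)*(x - k2);       /* slope change at k2 */
   output out=NlinPred predicted=pred;
run;
```

The slopes of the three segments are b1, b1+b2, and b1+b2+b3. Unlike the GLM fit with a class variable, this model forces the segments to join at the breakpoints.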
BarryDeCicco
Obsidian | Level 7

 

Rick, thanks for a quick response!

Here's the code:

proc fmm data=a.transreg gconv=0 plots=all ;
model Log_Log = Log_KM / kmax=3 krestart;
output out=FMM_model pred=mixpred resid=mixresid class=mixcomp;
ods output ParameterEstimates=FMM_parms ;
ods output FitStatistics=FMM_FitStatistics;
run;


When I plot a scatterplot of the actual vs. predicted values, with the class variable 'mixcomp' as a group, it's clear that the two groups are intermingled.

 

BarryDeCicco
Obsidian | Level 7

This system stripped all line breaks!

Is there any way to fix that?
ballardw
Super User

@BarryDeCicco wrote:

This system stripped all line breaks!

Is there any way to fix that?

Did you post using the "run" icon box or the {i} box?

Code from the SAS editor works best if posted using the Run icon box.

BarryDeCicco
Obsidian | Level 7
I did that, and it was all so pretty...... 🙂
PGStats
Opal | Level 21

I don't know if this is pertinent to your problem. An easy way to fit a piecewise linear model (with fixed breaks) is with proc ROBUSTREG and a spline of degree 1:

 

proc robustreg data=sashelp.fish;
effect spl = spline(length1 / degree=1 knotmethod=list(20));
model weight = spl;
output out=fishPred p=predweight;
run;

proc sort data=fishPred; by length1; run;

proc sgplot data=fishPred;
scatter x=length1 y=weight;
series x=length1 y=predweight;
run;

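[Image: SGPlot5.png, the PROC SGPLOT output from the code above: scatter of weight against length1 with the fitted piecewise linear prediction overlaid]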

PG
BarryDeCicco
Obsidian | Level 7

Thank you very much, Rick!
