- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In short, @StatsMan's suggestion is equivalent to creating a sliced fit plot. Many SAS regression procedures support the STORE statement to store the model in a SAS item store. You can then use PROC PLM to create the plot, as shown in the linked article.
The article also shows how to create a sliced fit plot when you want to include the observations or when you are using a procedure that does not support the STORE statement. What you do is to create a data set that contains the "slice" you want to score on. As StatsMan says, you use missing values for the response. Then you score the data and create the plot of the predicted values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Post your data?
I think there are some issue with what you describe. You say you have repeated measurement (time), but it is not clear how you want to handle that. Do you want to plot at a particular time such as t=0?
Here are some general suggestions:
1. Consider whether you should use the OUTPM= option to output the mean value, not the individual predictions.
2. Use the BAND statement to plot the lower and upper limits (possibly use GROUP=SEX)
3. Use the SERIES statement to draw the predicted curves (possibly use GROUP=SEX)
4. Use the SCATTER statement to plot the raw data (possibly use GROUP=SEX)
Here are some simulated data that are modeled by IGNORING (averaging over) the time variable. Maybe you or someone else can modify this approach when you tell us how to handle time.
DATA MITO;
call streaminit(1);
input mrn age sex $;
do time = 0 to 5;
respiration = 50 + 2*(sex='F') + 0.05*age + 0.1*time + rand("normal");
output;
end;
datalines;
1 18 F
2 26 F
3 19 M
4 25 M
5 22 F
6 30 M
;
run;
proc mixed data=mito method=ml;
class mrn sex;
model respiration=age sex /*time*/ / s outpm=pred_resp;
random intercept / subject=mrn type=VC;
run;
proc sort data=pred_resp; by age; run;
proc sgplot data=pred_resp;
band x=age lower=Lower upper=Upper / group=sex;
series x=age y=pred / group=sex;
scatter x=age y=respiration / group=sex;
YAXIS GRID;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Rick,
Thank you so much for the fast response and for helping me out!
My apologies for not posting the data but it has not been published yet. However, I can publish some data that is not for publishing to use as an example if it helps.
Time in my model is a continuous variable so there are no fixed timepoints. This is clinical data from hospitalized patients and measurements differ by time collected so we need to adjust for time of measurement and hence, include time in the models. It is basically the time from hospital admission to actual measurement.
I ran your script and when group=sex is in, it looks good, but when I remove it from the sgplot then I get a curved line rather than a straight line.
The plot I am trying to re-create looks like this (below), where values are the observed ones and line of best fit is the one from the prediction equation obtained form the mixed regression model:
Also, can you please let me know what is the difference between OUTP and OUTPM? I used both and the pred column in the output dataset is per individual in both cases so cannot see where they differ. Also, I ran the script you sent with the pred from OUTP and OUTPM and is the same. I am really new to SAS so still trying to figure basic stuff out!
Thanks a ton!
Vic
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The graph you show is typical for a fixed-effect model of response vs time. In your model, you have additional covariates, so you need to specify how to handle them in a 2D plot. Technically, you are fitting many curves, one for each subject, so it is not clear how you want to condense that information into one line.
For example, do you want to show the model for the mean value of the age? The median? Do you want two curves for each value of the sex variable or do you want separate graphs for male/female? These are all questions to consider in a model that has three covariates.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I see, thank you Rick!
Yes, I meant to plot it by using the mean value for all the covariates not represented in the graph. I might need to add the group function too. In my current model I actually have more predictors as last night I added one more. But I guess since I just take the means of the factors not represented in y x axes it does not matter.
Thank you 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry for the number of messages - so is there a way to achieve this type of graph?
Thanks!
Vic
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to do the same thing. I have a generalized linear mixed model with several covariates that I have modeled in glimmix:
proc glimmix;
proc glimmix;
class horse trainer;
model freq_PMN/TNCC= trainer resp age inhale ups glucan resp*lps ;
random int/ subject=horse type=ar(1);
random _residual_;
run;
I am interested in plotting the predicted and 95% CI for freq_PMN/TNCC versus resp at the mean levels of the other covariates and overlaying the observed values. Is this possible?
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hey Kivester,
Unfortunately, I never found out how to do this in SAS...so a colleague just did it for me in R. I bet there is a way to do this in SAS too.
I even asked a statistician at my institution that uses SAS a lot and works in epidemiology, but told me she has never done this before because they usually just present the beta coefficients and they do not need to present a plot of this. However, I was submitting a paper in a clinical journal and clinicians appear to like plots so had to find a way to do this.
Sorry!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you post some sample data (it can be made up), then we could make suggestions. I proposed some sample data and said to the OP, "Maybe you ...can modify this approach [and] tell us how to handle time." The OP never addressed that question.
@papagalini Since you claim that you were able to produce the graph you want in other software, please post the graph so that we can understand what you are looking for.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Rick,
Here is some made up data. Thanks for agreeing to help!
Data MadeUP;
Input horse Freq_PMN TNCC trainer resp age inhal lps glucan;
datalines;
1 6 300 8 0.35 4.2 0.34 1.4 53.1
2 4 112 8 0.064 4.416 1.492 9.143 256.411
3 20 189 4 0.191 0.137 3.105 5.427 262.197
4 17 291 4 0.124 4.188 3.126 4.923 208.315
5 18 385 1 0.058 2.956 3.289 0.945 254.197
5 0 123 1 0.025 0.464 1.784 7.311 283.147
7 7 226 2 0.175 3.965 0.668 6.183 186.320
8 17 15 4 0.171 1.774 3.091 4.858 121.950
8 17 320 4 0.148 1.699 1.088 9.965 257.282
8 2 1 4 0.129 2.507 1.012 7.521 322.331
8 17 56 4 0.179 2.538 1.924 0.651 110.123
9 15 113 3 0.110 4.037 2.661 0.585 333.831
10 3 44 1 0.182 4.877 0.077 5.208 292.303
14 20 299 7 0.177 2.238 2.490 7.207 327.078
15 17 246 1 0.043 0.581 0.930 6.753 64.075
15 7 78 1 0.013 3.826 0.622 5.364 311.640
15 9 285 1 0.032 2.302 3.636 5.400 107.441
18 8 152 5 0.016 2.722 1.801 7.350 341.055
19 14 392 5 0.160 0.737 0.999 5.862 314.837
20 10 398 7 0.112 2.973 3.691 7.995 322.038
20 20 328 7 0.039 2.221 3.991 9.583 163.122
21 13 206 4 0.148 1.245 3.986 0.648 322.854
22 17 187 2 0.197 0.926 1.096 3.555 130.459
23 20 253 1 0.171 2.663 3.850 8.469 114.644
24 11 97 1 0.086 2.594 1.739 0.959 310.298
25 16 83 4 0.072 3.385 3.934 6.602 198.959
26 14 248 8 0.002 0.546 1.020 8.069 66.514
26 14 364 8 0.027 3.378 1.559 9.159 292.377
26 14 26 8 0.107 1.262 0.825 6.589 135.586
29 3 119 1 0.169 0.517 3.756 2.659 303.409
30 9 74 1 0.059 4.357 1.232 5.881 83.229
31 6 280 1 0.035 3.508 2.747 9.789 10.971
33 6 251 1 0.118 2.325 3.156 3.601 306.553
;
proc glimmix;
class horse trainer;
model freq_PMN/TNCC= trainer resp age inhal lps glucan resp*lps;
random int/ subject=horse type=ar(1);
random _residual_;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Before diving into graphics for predicted values, you'll want to identify a correct model that works for your data. I have some concerns about both data and model.
Does your actual dataset have more than 24 horses and 33 observations? Your model is trying to estimate 13 fixed effects parameters, and you would need a lot more data for the model to do that well.
In the MadeUp dataset, there are two observations with Freq_PMN > TNCC. These observations are rejected by the binomial distribution because Freq_PMN must be <= TNCC. Is this a problem that appears in your actual dataset?
The model is regressing Freq_PMN/TNCC on resp (plus the other continuous predictors). Are you content to fit a random intercepts model and consequently to assume that the slopes of the regressions do not vary by horse (i.e., to not incorporate random slopes)? In the MadeUp dataset, only 5 of the 24 horses have multiple measures, ranging from n= 2 to 4, which is not enough information to fit random slopes. If the actual dataset has little by way of repeated measures on horses, then random slopes is a moot point.
I doubt that your RANDOM statements are actually doing what you intended them to do. The best resource for this topic is Walt Stroup's text
One way to acquire predicted values for graphing is to score a new data set using the PLM procedure. See
https://support.sas.com/resources/papers/proceedings10/258-2010.pdf
and
http://support.sas.com/kb/33/307.html
and
https://support.sas.com/resources/papers/proceedings15/SAS1919-2015.pdf
Pay attention to the information about scoring a mixed model in this last link.
Hope this helps some.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sld,
Thanks for your help. I haven't read all of the links that you have provided yet, but to answer some of your questions, my actual dataset has 67 observations. Yes, I have some subjects that I have only one observation for, and others that I have 2 or 3 observations.
The instances of PMN>TNCC is an accident of the random data I used. This does not occur in my data set.
Yes, I am happy to fit a random intercept model. I have included the random _residual_ statement due to apparent over dispersion.
Thanks!
Katy
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Katy,
Thank you for the update.
There are rules of thumb for the ratio of number of observations to number of estimated parameters that range from 10:1 to 30:1, or more or less. (It's a "rule of thumb", so that's what we can expect.) Regardless, with only 67 observations spread over an unknown but smaller number of horses, your current model is attempting too much (i.e., it is overfitted). I suggest that you try to find a more parsimonious model with fewer predictor variables.
You do want to assess whether data may be overdispersed with respect to the binomial distribution. But first you should consider your other RANDOM statement:
random int/ subject=horse type=ar(1);
This statement specifies that horses are not independent--it specifies that horses are correlated according to an AR(1) structure, which makes no sense to me. What are you attempting to do with this RANDOM statement?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Rick,
Thank you for your response. I have attached the graph that my colleague produced in other software.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@kivester There is no "mean level" for the Trainer variable. For categorical variables, you can specify a reference value on the CLASS statement and then use that reference level for the summary plot. For example, you can say
CLASS horse trainer(ref='1');
to use Trainer=1 as the reference level.