Statistical Procedures

papagalini · Posted 09-06-2017 11:20 PM

Hello SAS Community!

So I am new to SAS, using SAS University edition (and not the most statistically savvy person). I have ran done mixed regression models using the Proc Mixed command while blocking on participants to control for repeated observations. Just an example of a code I used for one model:

proc mixed data=mito method=ml;

class mrn sex ;

model respiration=age sex time / s outp=pred_resp;

random intercept / subject=mrn type=VC;

run;

So basically then I plot the fitted line from the mixed model in the Observed values but the only thing I managed to do yet is get a graph of predicted and absolute values (in one or separate graphs) but the line is not really the line from the equation I get from model applied above:

proc sgplot data=pred_resp;

reg x=age y=respiration / clm;

reg x=age y=pred / clm;

YAXIS GRID VALUES = (0 TO 100 BY 20);

run;

I just wanted to plot the fitted line in the observed values (original data) and also add the confidence limits and prediction intervals. Have lost days looking for this!

Thank you all!

Vic

Rick_SAS · Posted 01-03-2018 03:47 PM

In short, @StatsMan's suggestion is equivalent to creating a sliced fit plot. Many SAS regression procedures support the STORE statement to store the model in a SAS item store. You can then use PROC PLM to create the plot, as shown in the linked article.

The article also shows how to create a sliced fit plot when you want to include the observations or when you are using a procedure that does not support the STORE statement. What you do is to create a data set that contains the "slice" you want to score on. As StatsMan says, you use missing values for the response. Then you score the data and create the plot of the predicted values.

How to create a sliced fit plot in SAS

View solution in original post

Rick_SAS · Posted 09-07-2017 01:46 PM

Post your data?

I think there are some issue with what you describe. You say you have repeated measurement (time), but it is not clear how you want to handle that. Do you want to plot at a particular time such as t=0?

Here are some general suggestions:

1. Consider whether you should use the OUTPM= option to output the mean value, not the individual predictions.

2. Use the BAND statement to plot the lower and upper limits (possibly use GROUP=SEX)

3. Use the SERIES statement to draw the predicted curves (possibly use GROUP=SEX)

4. Use the SCATTER statement to plot the raw data (possibly use GROUP=SEX)

Here are some simulated data that are modeled by IGNORING (averaging over) the time variable. Maybe you or someone else can modify this approach when you tell us how to handle time.

DATA MITO;
call streaminit(1);
input mrn age sex $;
do time = 0 to 5;
   respiration = 50 + 2*(sex='F') + 0.05*age + 0.1*time + rand("normal");
   output;
end;
datalines;
1 18 F
2 26 F
3 19 M
4 25 M
5 22 F
6 30 M
;
run;

proc mixed data=mito method=ml;
class mrn sex;
model respiration=age sex /*time*/ / s outpm=pred_resp;
random intercept / subject=mrn type=VC;
run;

proc sort data=pred_resp; by age; run;

proc sgplot data=pred_resp; 
band x=age lower=Lower upper=Upper / group=sex;
series x=age y=pred / group=sex;
scatter x=age  y=respiration / group=sex;
YAXIS GRID;
run;

papagalini · Posted 09-07-2017 06:48 PM

Hello Rick,

Thank you so much for the fast response and for helping me out!

My apologies for not posting the data but it has not been published yet. However, I can publish some data that is not for publishing to use as an example if it helps.

Time in my model is a continuous variable so there are no fixed timepoints. This is clinical data from hospitalized patients and measurements differ by time collected so we need to adjust for time of measurement and hence, include time in the models. It is basically the time from hospital admission to actual measurement.

I ran your script and when group=sex is in, it looks good, but when I remove it from the sgplot then I get a curved line rather than a straight line.

The plot I am trying to re-create looks like this (below), where values are the observed ones and line of best fit is the one from the prediction equation obtained form the mixed regression model:

Layout 1.jpg

Also, can you please let me know what is the difference between OUTP and OUTPM? I used both and the pred column in the output dataset is per individual in both cases so cannot see where they differ. Also, I ran the script you sent with the pred from OUTP and OUTPM and is the same. I am really new to SAS so still trying to figure basic stuff out!

Thanks a ton!

Vic

Rick_SAS · Posted 09-08-2017 05:56 AM

The graph you show is typical for a fixed-effect model of response vs time. In your model, you have additional covariates, so you need to specify how to handle them in a 2D plot. Technically, you are fitting many curves, one for each subject, so it is not clear how you want to condense that information into one line.

For example, do you want to show the model for the mean value of the age? The median? Do you want two curves for each value of the sex variable or do you want separate graphs for male/female? These are all questions to consider in a model that has three covariates.

papagalini · Posted 09-08-2017 12:36 PM

I see, thank you Rick!

Yes, I meant to plot it by using the mean value for all the covariates not represented in the graph. I might need to add the group function too. In my current model I actually have more predictors as last night I added one more. But I guess since I just take the means of the factors not represented in y x axes it does not matter.

Thank you 🙂

papagalini · Posted 09-11-2017 12:36 PM

Sorry for the number of messages - so is there a way to achieve this type of graph?

Thanks!

Vic

kivester · Posted 12-06-2017 03:59 PM

I am trying to do the same thing. I have a generalized linear mixed model with several covariates that I have modeled in glimmix:

proc glimmix;

proc glimmix;
 class horse trainer;
 model freq_PMN/TNCC= trainer resp age inhale ups glucan resp*lps ;
 random int/ subject=horse type=ar(1);
 random _residual_;
run;

I am interested in plotting the predicted and 95% CI for freq_PMN/TNCC versus resp at the mean levels of the other covariates and overlaying the observed values. Is this possible?

Thanks!

papagalini · Posted 12-06-2017 04:09 PM

Hey Kivester,

Unfortunately, I never found out how to do this in SAS...so a colleague just did it for me in R. I bet there is a way to do this in SAS too.

I even asked a statistician at my institution that uses SAS a lot and works in epidemiology, but told me she has never done this before because they usually just present the beta coefficients and they do not need to present a plot of this. However, I was submitting a paper in a clinical journal and clinicians appear to like plots so had to find a way to do this.

Sorry!

Rick_SAS · Posted 12-06-2017 07:48 PM

If you post some sample data (it can be made up), then we could make suggestions. I proposed some sample data and said to the OP, "Maybe you ...can modify this approach [and] tell us how to handle time." The OP never addressed that question.

@papagalini Since you claim that you were able to produce the graph you want in other software, please post the graph so that we can understand what you are looking for.

kivester · Posted 12-06-2017 08:16 PM

Rick,

Here is some made up data. Thanks for agreeing to help!

Data MadeUP;
Input  horse Freq_PMN TNCC trainer resp age inhal lps glucan;
datalines;
1	6	300	8	0.35	4.2	0.34	1.4	53.1
2	4	112	8	0.064	4.416	1.492	9.143	256.411
3	20	189	4	0.191	0.137	3.105	5.427	262.197
4	17	291	4	0.124	4.188	3.126	4.923	208.315
5	18	385	1	0.058	2.956	3.289	0.945	254.197
5	0	123	1	0.025	0.464	1.784	7.311	283.147
7	7	226	2	0.175	3.965	0.668	6.183	186.320
8	17	15	4	0.171	1.774	3.091	4.858	121.950
8	17	320	4	0.148	1.699	1.088	9.965	257.282
8	2	1	4	0.129	2.507	1.012	7.521	322.331
8	17	56	4	0.179	2.538	1.924	0.651	110.123
9	15	113	3	0.110	4.037	2.661	0.585	333.831
10	3	44	1	0.182	4.877	0.077	5.208	292.303
14	20	299	7	0.177	2.238	2.490	7.207	327.078
15	17	246	1	0.043	0.581	0.930	6.753	64.075
15	7	78	1	0.013	3.826	0.622	5.364	311.640
15	9	285	1	0.032	2.302	3.636	5.400	107.441
18	8	152	5	0.016	2.722	1.801	7.350	341.055
19	14	392	5	0.160	0.737	0.999	5.862	314.837
20	10	398	7	0.112	2.973	3.691	7.995	322.038
20	20	328	7	0.039	2.221	3.991	9.583	163.122
21	13	206	4	0.148	1.245	3.986	0.648	322.854
22	17	187	2	0.197	0.926	1.096	3.555	130.459
23	20	253	1	0.171	2.663	3.850	8.469	114.644
24	11	97	1	0.086	2.594	1.739	0.959	310.298
25	16	83	4	0.072	3.385	3.934	6.602	198.959
26	14	248	8	0.002	0.546	1.020	8.069	66.514
26	14	364	8	0.027	3.378	1.559	9.159	292.377
26	14	26	8	0.107	1.262	0.825	6.589	135.586
29	3	119	1	0.169	0.517	3.756	2.659	303.409
30	9	74	1	0.059	4.357	1.232	5.881	83.229
31	6	280	1	0.035	3.508	2.747	9.789	10.971
33	6	251	1	0.118	2.325	3.156	3.601	306.553
;
proc glimmix;
 class horse trainer;
 model freq_PMN/TNCC= trainer resp age inhal lps glucan resp*lps;
random int/ subject=horse type=ar(1);
random _residual_;
run;

sld · Posted 12-07-2017 12:37 PM

Before diving into graphics for predicted values, you'll want to identify a correct model that works for your data. I have some concerns about both data and model.

Does your actual dataset have more than 24 horses and 33 observations? Your model is trying to estimate 13 fixed effects parameters, and you would need a lot more data for the model to do that well.

In the MadeUp dataset, there are two observations with Freq_PMN > TNCC. These observations are rejected by the binomial distribution because Freq_PMN must be <= TNCC. Is this a problem that appears in your actual dataset?

The model is regressing Freq_PMN/TNCC on resp (plus the other continuous predictors). Are you content to fit a random intercepts model and consequently to assume that the slopes of the regressions do not vary by horse (i.e., to not incorporate random slopes)? In the MadeUp dataset, only 5 of the 24 horses have multiple measures, ranging from n= 2 to 4, which is not enough information to fit random slopes. If the actual dataset has little by way of repeated measures on horses, then random slopes is a moot point.

I doubt that your RANDOM statements are actually doing what you intended them to do. The best resource for this topic is Walt Stroup's text

https://www.crcpress.com/Generalized-Linear-Mixed-Models-Modern-Concepts-Methods-and-Applications/St...

One way to acquire predicted values for graphing is to score a new data set using the PLM procedure. See

https://support.sas.com/resources/papers/proceedings10/258-2010.pdf

and

http://support.sas.com/kb/33/307.html

and

https://support.sas.com/resources/papers/proceedings15/SAS1919-2015.pdf

Pay attention to the information about scoring a mixed model in this last link.

Hope this helps some.

kivester · Posted 12-07-2017 09:24 PM

@sld,

Thanks for your help. I haven't read all of the links that you have provided yet, but to answer some of your questions, my actual dataset has 67 observations. Yes, I have some subjects that I have only one observation for, and others that I have 2 or 3 observations.

The instances of PMN>TNCC is an accident of the random data I used. This does not occur in my data set.

Yes, I am happy to fit a random intercept model. I have included the random _residual_ statement due to apparent over dispersion.

Thanks!

Katy

sld · Posted 12-07-2017 09:55 PM

Hi Katy,

Thank you for the update.

There are rules of thumb for the ratio of number of observations to number of estimated parameters that range from 10:1 to 30:1, or more or less. (It's a "rule of thumb", so that's what we can expect.) Regardless, with only 67 observations spread over an unknown but smaller number of horses, your current model is attempting too much (i.e., it is overfitted). I suggest that you try to find a more parsimonious model with fewer predictor variables.

You do want to assess whether data may be overdispersed with respect to the binomial distribution. But first you should consider your other RANDOM statement:

random int/ subject=horse type=ar(1);

This statement specifies that horses are not independent--it specifies that horses are correlated according to an AR(1) structure, which makes no sense to me. What are you attempting to do with this RANDOM statement?

papagalini · Posted 12-07-2017 01:07 AM

Hello Rick,

Thank you for your response. I have attached the graph that my colleague produced in other software.

Rick_SAS · Posted 12-07-2017 06:10 AM

@kivester There is no "mean level" for the Trainer variable. For categorical variables, you can specify a reference value on the CLASS statement and then use that reference level for the summary plot. For example, you can say

CLASS horse trainer(ref='1');

to use Trainer=1 as the reference level.

Statistical Procedures

How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Re: How to plot regression line from mixed models in my observed values?

Follow Us

What is...

Statistical Procedures

Our biggest data and AI event of the year.

Follow Us

What is...