BookmarkSubscribeRSS Feed
mgrasmussen
Quartz | Level 8

Dear SAS experts

 

I am a non-statistician who is seeking some help with getting basic syntax running.

 

I am trying to run a model using proc mixed, but I appear to be having an issue when using lsmeans.

 

First, I have created restricted cubic basis functions (4 new variables; spl1-spl4) based on the variable days. Days is a continuous variable that includes days since 1st of January 2020 to 31st of December 2021.

 

For each value of days, there is a measure of Hba1c. Hba1c is a clinical measure of blood sugar levels during the last three months. It varies in values from 7-173.

 

To begin with (just to get the basic syntax working), I want to model the splines of days and the variable sex and estimate the general between-sex difference in mean hba1c. I am using a mixed-effect model to take into account that there are multiple measurements of hba1c for each individual (the amount varies between subjects).

 

I have written the following syntax in proc mixed:

 

proc mixed data=dataset_name;

class pseudo_id sex;

model hba1c=spl: sex sex*spl1 sex*spl2 sex*spl3 sex*spl4 / solution;           

random intercept days / subject = pseudo_id type=un;

lsmeans sex / pdiff;

run;

 

The log output is:

1          ;*';*";*/;quit;run;

2          OPTIONS PAGENO=MIN;

3          %LET _CLIENTTASKLABEL='7. Hba1c_analysis';

4          %LET _CLIENTPROCESSFLOWNAME='Process Flow';

5          %LET _CLIENTPROJECTPATH='';

6          %LET _CLIENTPROJECTPATHHOST='';

7          %LET _CLIENTPROJECTNAME='';

8          %LET _SASPROGRAMFILE='T:\ouh\afd\covid_forskningsartikel\Programfiler og output\7. Hba1c_analysis.sas';

9          %LET _SASPROGRAMFILEHOST='L111353';

10        

11         ODS _ALL_ CLOSE;

12         OPTIONS DEV=PNG;

13         GOPTIONS XPIXELS=0 YPIXELS=0;

14         FILENAME EGSR TEMP;

15         ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR

16             STYLE=HtmlBlue

17             STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")

18             NOGTITLE

19             NOGFOOTNOTE

20             GPATH=&sasworklocation

SYMBOLGEN:  Macro variable SASWORKLOCATION resolves to "E:\Work\_TD2300848_SRVESBAPPSAS34V_\Prc2/"

21             ENCODING=UTF8

22             options(rolap="on")

23         ;

NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR

24        

25         GOPTIONS ACCESSIBLE;

26         proc mixed data=hba1c_sdco6;

27         class pseudo_id køn_20;

28         model visres_numerisk=spl: køn_20 køn_20*spl1 køn_20*spl2 køn_20*spl3 køn_20*spl4 / solution outp=pred6r outpm = pred6f;        

29         random intercept prvtdato_num / subject = pseudo_id type=un;

30         lsmeans køn_20 / pdiff at means;

31         run;

 

NOTE: 203 observations are not included because of missing values.

NOTE: Convergence criteria met.

MVA_DSIO.OPEN_CLOSE| _DISARM|         STOP| _DISARM| 2023-03-03T11:49:59,790+01:00| _DISARM| WorkspaceServer| _DISARM| | _DISARM| |

_DISARM| | _DISARM| 28151808| _DISARM| 11| _DISARM| 18| _DISARM| 194607184| _DISARM| 3544470331| _DISARM| | _DISARM| | _DISARM| |

_DISARM| | _DISARM| | _DISARM| | _ENDDISARM

NOTE: The data set WORK.PRED6R has 22721 observations and 26 variables.

NOTE: Compressing data set WORK.PRED6R decreased size by 27.40 percent.

      Compressed is 53 pages; un-compressed would require 73 pages.

MVA_DSIO.OPEN_CLOSE| _DISARM|         STOP| _DISARM| 2023-03-03T11:49:59,790+01:00| _DISARM| WorkspaceServer| _DISARM| | _DISARM| |

_DISARM| | _DISARM| 28151808| _DISARM| 11| _DISARM| 18| _DISARM| 194936189| _DISARM| 3544799946| _DISARM| | _DISARM| | _DISARM| |

_DISARM| | _DISARM| | _DISARM| | _ENDDISARM

NOTE: The data set WORK.PRED6F has 22721 observations and 26 variables.

NOTE: Compressing data set WORK.PRED6F decreased size by 27.40 percent.

      Compressed is 53 pages; un-compressed would require 73 pages.

MVA_DSIO.OPEN_CLOSE| _DISARM|         STOP| _DISARM| 2023-03-03T11:49:59,790+01:00| _DISARM| WorkspaceServer| _DISARM| | _DISARM| |

_DISARM| | _DISARM| 28151808| _DISARM| 11| _DISARM| 18| _DISARM| 195132996| _DISARM| 3544997465| _DISARM| | _DISARM| | _DISARM| |

_DISARM| | _DISARM| | _DISARM| | _ENDDISARM

PROCEDURE| _DISARM|         STOP| _DISARM| 2023-03-03T11:49:59,790+01:00| _DISARM| WorkspaceServer| _DISARM| | _DISARM| | _DISARM|

| _DISARM| 28151808| _DISARM| 11| _DISARM| 18| _DISARM| 195398608| _DISARM| 3544999001| _DISARM| | _DISARM| | _DISARM| | _DISARM| |

_DISARM| | _DISARM| | _ENDDISARM

NOTE: PROCEDURE MIXED used (Total process time):

      real time           55.48 seconds

      cpu time            55.43 seconds

     

2                                                          The SAS System                                10:41 Friday, March 3, 2023

 

 

32        

33         GOPTIONS NOACCESSIBLE;

34         %LET _CLIENTTASKLABEL=;

35         %LET _CLIENTPROCESSFLOWNAME=;

36         %LET _CLIENTPROJECTPATH=;

37         %LET _CLIENTPROJECTPATHHOST=;

38         %LET _CLIENTPROJECTNAME=;

39         %LET _SASPROGRAMFILE=;

40         %LET _SASPROGRAMFILEHOST=;

41        

42         ;*';*";*/;quit;run;

43         ODS _ALL_ CLOSE;

44        

45        

46         QUIT; RUN;

47        

 

 

The results are:

Dimensions

Covariance Parameters

4

Columns in X

15

Columns in Z per Subject

2

Subjects

3434

Max Obs per Subject

27

 

Number of Observations

Number of Observations Read

22721

Number of Observations Used

22518

Number of Observations Not Used

203

 

Iteration History

Iteration

Evaluations

-2 Res Log Like

Criterion

0

1

178669.59663948

 

1

3

161030.12170074

0.00065048

2

1

160985.21896392

0.00005265

3

1

160981.89111021

0.00000044

4

1

160981.86465805

0.00000000

 

Convergence criteria met.

 

Covariance Parameter Estimates

Cov Parm

Subject

Estimate

UN(1,1)

pseudo_id

137.38

UN(2,1)

pseudo_id

-0.07511

UN(2,2)

pseudo_id

0.000190

Residual

 

42.5146

 

Fit Statistics

-2 Res Log Likelihood

160981.9

AIC (Smaller is Better)

160989.9

AICC (Smaller is Better)

160989.9

BIC (Smaller is Better)

161014.4

 

Null Model Likelihood Ratio Test

DF

Chi-Square

Pr > ChiSq

3

17687.73

<.0001

 

Solution for Fixed Effects

Effect

køn_20

Estimate

Standard
Error

DF

t Value

Pr > |t|

Intercept

 

61.0501

0.3376

3432

180.82

<.0001

spl1

 

-0.01390

0.001344

16E3

-10.34

<.0001

spl2

 

0.000203

0.000021

16E3

9.66

<.0001

spl3

 

-0.00050

0.000054

16E3

-9.33

<.0001

spl4

 

0.000442

0.000050

16E3

8.78

<.0001

Sex

Female

0.2017

0.5318

16E3

0.38

0.7045

Sex

Male

0

.

.

.

.

spl1*Sex

Female

0.01093

0.002127

16E3

5.14

<.0001

spl1*Sex

Male

0

.

.

.

.

spl2*Sex

Female

-0.00019

0.000033

16E3

-5.56

<.0001

spl2*Sex

Male

0

.

.

.

.

spl3*Sex

Female

0.000450

0.000085

16E3

5.26

<.0001

spl3*Sex

Male

0

.

.

.

.

spl4*Sex

Female

-0.00039

0.000080

16E3

-4.87

<.0001

spl4*Sex

Male

0

.

.

.

.

 

Type 3 Tests of Fixed Effects

Effect

Num DF

Den DF

F Value

Pr > F

spl1

1

16E3

62.99

<.0001

spl2

1

16E3

93.40

<.0001

spl3

1

16E3

87.02

<.0001

spl4

1

16E3

77.01

<.0001

Sex

1

16E3

0.14

0.7045

spl1*Sex

1

16E3

26.40

<.0001

spl2*Sex

1

16E3

30.88

<.0001

spl3*Sex

1

16E3

27.70

<.0001

spl4*Sex

1

16E3

23.68

<.0001

 

Least Squares Means

Effect

Sex

spl1

spl2

spl3

spl4

prvtdato_num

Estimate

Standard
Error

DF

t Value

Pr > |t|

Sex

Female

407.07

105217

52728

19728

407.07

60.1981

0.2895

16E3

207.93

<.0001

Sex

Male

407.07

105217

52728

19728

407.07

Non-est

.

.

.

.

 

Differences of Least Squares Means

Effect

Sex

_Sex

spl1

spl2

spl3

spl4

prvtdato_num

Estimate

Standard
Error

DF

t Value

Pr > |t|

Sex

Female

Male

407.07

105217

52728

19728

407.07

Non-est

.

.

.

.

 

As you can see there is no estimate for males, and thus the differences between the sexes cannot be estimated.

Lsmeans produces an output for males when I only include the interaction: Sex*spl1 (but not for the other interactions).

 

Can anyone help me solve this issue?

 

Thank you

10 REPLIES 10
StatsMan
SAS Super FREQ

Finding the cause of the non-estimable lsmeans will be difficult without the data. You can add the /E option to the LSMEANS statement to see the linear combination used to estimate the problematic lsmean.

You can let GLIMMIX do the dirty work of forming the spline basis using the EFFECT statement as in this example

Fitting random coefficients on top of the spline fit might be more than you can do with this data. If you do want those random coefficients for each subject, do you want those on the spline effect rather than a linear effect of days? That might be more than the data can handle, too. You could try fitting just the random intercept and see if that does better for you. 

mgrasmussen
Quartz | Level 8

Dear Statsman 

 

I much appreciate your reply.


1. "You can let GLIMMIX do the dirty work of forming the spline basis using the EFFECT statement as in this example"

 

I am not sure exactly what you mean by this. The EFFECT statement is not supported by Proc Mixed. Moreover, would it make a difference if it was? On another note I am actually using the EFFECT statement in proc glmselect to create the spline basis:

 

proc glmselect data=dataset_withoutsplines outdesign(addinputvars fullmodel)=dataset_withsplines; 
effect spl = spline(hba1c / naturalcubic basis=tpf(noint) knotmethod=percentiles(5)); model hba1c = spl / selection=none; 
quit;

 

2. a) "Fitting random coefficients on top of the spline fit might be more than you can do with this data." and

b) "You could try fitting just the random intercept and see if that does better for you."

Right, I understand that the model might be too complex given the data. Regarding 2b), even when I completely remove the random statement I get the same outcome of "Non-est". This suggest to be that there is an issue with the interactions of sex and spl2-spl4. As I mentioned in my original post I don't get "Non-est" when I only include the interaction spl1*sex. However, I do not know if this is an appropriate model.

 

3. "If you do want those random coefficients for each subject, do you want those on the spline effect rather than a linear effect of days?"

Are you suggesting that I could include all the spline variables here?

random intercept SPLINE VARIABLES / subject = pseudo_id type=un;

 

I have not considered this. Could one argue for and against doing such?

 

Best regards

PaigeMiller
Diamond | Level 26

Often, a Least Squares Mean is non-estimable because you have cells in your experiment where there is no data. For example, if it happens that the interaction spl2*sex has no data in one of the cells, then you would get the Non-Est. And as said by @StatsMan , the model you have chosen to fit "might be more than you can do with this data."

--
Paige Miller
mgrasmussen
Quartz | Level 8

Dear PaigeMiller

 

Thank you very much for your reply.

 

"Often, a Least Squares Mean is non-estimable because you have cells in your experiment where there is no data."

Missing data in general is not an issue in this dataset, but their might be a specific issue relating to the interaction terms. Is there some simple way to check for this? As far as I see it, there are four spline variables and the first spline variable corresponds to the variable Days - from which the spline variables are derived - in a 1:1 fashion. As mentioned, if I create a model that includes only the interaction spl1*sex (and thus I do not include spl2*sex, spl3*sex and spl4*sex) I don't get "Non-est". But only if I remove all three. Removing just one of them does not change the outcome.

 

"And as said by @StatsMan , the model you have chosen to fit "might be more than you can do with this data."

I realize that there might be an issue here. When I completely remove the random statement I still get the "Non-est" outcome, which suggest to me that there is an issue with the abovementioned interaction terms.

 

Thank you

 

Best regards

PaigeMiller
Diamond | Level 26

@mgrasmussen wrote:

Dear PaigeMiller

 

Thank you very much for your reply.

 

"Often, a Least Squares Mean is non-estimable because you have cells in your experiment where there is no data."

Missing data in general is not an issue in this dataset, but their might be a specific issue relating to the interaction terms. Is there some simple way to check for this? As far as I see it, there are four spline variables and the first spline variable corresponds to the variable Days - from which the spline variables are derived - in a 1:1 fashion. As mentioned, if I create a model that includes only the interaction spl1*sex (and thus I do not include spl2*sex, spl3*sex and spl4*sex) I don't get "Non-est". But only if I remove all three. Removing just one of them does not change the outcome.


Sure sounds like an empty cell in your data — which is not the same as "missing" data (although it can be caused by missing data as well). You can check this via running PROC FREQ to investigate all possible interactions.

 

proc freq data=whatever;
     tables sex*(spl1 spl2 spl3 spl4);
run;

 

Even this may not find the missing cell, it may be that you have to look at higher order interactions by using PROC FREQ to find the cause.

--
Paige Miller
mgrasmussen
Quartz | Level 8

Right, it is an issue about whether or not both males and females are represented at each level of spl1-spl4?

 

Thanks for the suggested syntax. I ran it and there are many places with empty cells/cell with 0 observations and this is true for all four spline variables. This is also the case for spl1 and Sex. As mentioned when I include the interaction between the two in the model I don't get the "Non-est". This particular model appears to work.

jiltao
SAS Super FREQ

As pointed out by @PaigeMiller , missing data cell is the most common cause of a non-estimable lsmeans. You either need to take out the interaction term(s) with missing data cell, or maybe combine your data categories to get rid of missing data cells.

Thanks,

Jill

mgrasmussen
Quartz | Level 8

Dear Jill

 

Thank you for the feedback.

 

I will bring all of the feedback I have gotten, including yours most recently, back to my group and we will discuss the analyses.

 

Best regards

 

Martin

SteveDenham
Jade | Level 19

In order to understand what you are attempting, I need to know why you are fitting five different spline terms to the same data and then placing all of them in the same analysis (spl spl1 spl2 spl3 spl4). Do the different splines reflect different hypotheses about the time by hba1c relationship? Is there any reason to believe that time as interpreted by the spline function has a fixed effect on the outcome, and you want to see if it differs by sex? As an example for that, look at this in the PROC GLIMMIX documentation: https://documentation.sas.com/doc/en/pgmsascdc/v_036/statug/statug_glimmix_examples09.htm

Or is it possible that there is a single factor response in time, with the variance being what needs a spline fit.  As an example look at this in the PROC GLIMMIX documentation:   https://documentation.sas.com/doc/en/pgmsascdc/v_036/statug/statug_glimmix_examples20.htm 

 

SteveDenham

mgrasmussen
Quartz | Level 8

Dear Steve

 

Thank you very much for your reply. 

 

To be frank I do not have a straight answer to your question. I am a non-statistician who tried to build a model (only with brief statistical consult), but I realize now that I might not fully understand the working of the splines.

 

In simple terms what I want to do is to visually illustrate and statistically test for the differences in slopes in hba1c for men and women. More specifically, I want to investigate whether there is a difference in mean hba1c over time (time on the x-axis) between men and women. However, I also want to be able to test means differences at specific time points.

 

My understanding was that the four spline variables all-in-all represented time and thus that I had to model an interaction term for each splines variable. I sort of "transferred" the logic I would apply to a normal OLS regression to this new model. In OLS - here assumping a linear relationship between days/time and hba1c - I would write the following to test for interaction of sex in the relationship between time and hba1c:

 

proc reg data=datasetname;

class sex;

model hba1c=sex days sex*days;

run;

 

For the time being disregard the fact that I am actually using a mixed effect model. The point here is merely an attempt to try to explain the background for how I am applying the splines in the analysis. However incorrectly. 

 

I realize that I may not be using the correct statistical jargon. 

 

I would be happy to receive any feedback and suggestion on moving forward.

 

I mentioned earlier, when I only include the first spline variable I don't experience an issue with computing LSmeans. But I am not sure about the significance of this.

 

Thank you 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 2167 views
  • 8 likes
  • 5 in conversation