Re: Potential issue with lsmeans in proc mixed (output: Non-est)

mgrasmussen · Posted 03-03-2023 07:07 AM

Dear SAS experts

I am a non-statistician who is seeking some help with getting basic syntax running.

I am trying to run a model using proc mixed, but I appear to be having an issue when using lsmeans.

First, I have created restricted cubic basis functions (4 new variables; spl1-spl4) based on the variable days. Days is a continuous variable that includes days since 1^st of January 2020 to 31^st of December 2021.

For each value of days, there is a measure of Hba1c. Hba1c is a clinical measure of blood sugar levels during the last three months. It varies in values from 7-173.

To begin with (just to get the basic syntax working), I want to model the splines of days and the variable sex and estimate the general between-sex difference in mean hba1c. I am using a mixed-effect model to take into account that there are multiple measurements of hba1c for each individual (the amount varies between subjects).

I have written the following syntax in proc mixed:

proc mixed data=dataset_name;

class pseudo_id sex;

model hba1c=spl: sex sex*spl1 sex*spl2 sex*spl3 sex*spl4 / solution;

random intercept days / subject = pseudo_id type=un;

lsmeans sex / pdiff;

run;

The log output is:

1 ;*';*";*/;quit;run;

2 OPTIONS PAGENO=MIN;

3 %LET _CLIENTTASKLABEL='7. Hba1c_analysis';

4 %LET _CLIENTPROCESSFLOWNAME='Process Flow';

5 %LET _CLIENTPROJECTPATH='';

6 %LET _CLIENTPROJECTPATHHOST='';

7 %LET _CLIENTPROJECTNAME='';

8 %LET _SASPROGRAMFILE='T:\ouh\afd\covid_forskningsartikel\Programfiler og output\7. Hba1c_analysis.sas';

9 %LET _SASPROGRAMFILEHOST='L111353';

10

11 ODS _ALL_ CLOSE;

12 OPTIONS DEV=PNG;

13 GOPTIONS XPIXELS=0 YPIXELS=0;

14 FILENAME EGSR TEMP;

15 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR

16 STYLE=HtmlBlue

17 STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")

18 NOGTITLE

19 NOGFOOTNOTE

20 GPATH=&sasworklocation

SYMBOLGEN: Macro variable SASWORKLOCATION resolves to "E:\Work\_TD2300848_SRVESBAPPSAS34V_\Prc2/"

21 ENCODING=UTF8

22 options(rolap="on")

23 ;

NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR

24

25 GOPTIONS ACCESSIBLE;

26 proc mixed data=hba1c_sdco6;

27 class pseudo_id køn_20;

28 model visres_numerisk=spl: køn_20 køn_20*spl1 køn_20*spl2 køn_20*spl3 køn_20*spl4 / solution outp=pred6r outpm = pred6f;

29 random intercept prvtdato_num / subject = pseudo_id type=un;

30 lsmeans køn_20 / pdiff at means;

31 run;

NOTE: 203 observations are not included because of missing values.

NOTE: Convergence criteria met.

NOTE: The data set WORK.PRED6R has 22721 observations and 26 variables.

NOTE: Compressing data set WORK.PRED6R decreased size by 27.40 percent.

Compressed is 53 pages; un-compressed would require 73 pages.

NOTE: The data set WORK.PRED6F has 22721 observations and 26 variables.

NOTE: Compressing data set WORK.PRED6F decreased size by 27.40 percent.

Compressed is 53 pages; un-compressed would require 73 pages.

_DISARM| | _DISARM| | _ENDDISARM

NOTE: PROCEDURE MIXED used (Total process time):

real time 55.48 seconds

cpu time 55.43 seconds

2 The SAS System 10:41 Friday, March 3, 2023

32

33 GOPTIONS NOACCESSIBLE;

34 %LET _CLIENTTASKLABEL=;

35 %LET _CLIENTPROCESSFLOWNAME=;

36 %LET _CLIENTPROJECTPATH=;

37 %LET _CLIENTPROJECTPATHHOST=;

38 %LET _CLIENTPROJECTNAME=;

39 %LET _SASPROGRAMFILE=;

40 %LET _SASPROGRAMFILEHOST=;

41

42 ;*';*";*/;quit;run;

43 ODS _ALL_ CLOSE;

44

45

46 QUIT; RUN;

47

The results are:

Dimensions
Covariance Parameters	4
Columns in X	15
Columns in Z per Subject	2
Subjects	3434
Max Obs per Subject	27

Number of Observations
Number of Observations Read	22721
Number of Observations Used	22518
Number of Observations Not Used	203

Iteration History
Iteration	Evaluations	-2 Res Log Like	Criterion
0	1	178669.59663948
1	3	161030.12170074	0.00065048
2	1	160985.21896392	0.00005265
3	1	160981.89111021	0.00000044
4	1	160981.86465805	0.00000000

Convergence criteria met.

Covariance Parameter Estimates
Cov Parm	Subject	Estimate
UN(1,1)	pseudo_id	137.38
UN(2,1)	pseudo_id	-0.07511
UN(2,2)	pseudo_id	0.000190
Residual		42.5146

Fit Statistics
-2 Res Log Likelihood	160981.9
AIC (Smaller is Better)	160989.9
AICC (Smaller is Better)	160989.9
BIC (Smaller is Better)	161014.4

Null Model Likelihood Ratio Test
DF	Chi-Square	Pr > ChiSq
3	17687.73	<.0001

Solution for Fixed Effects
Effect	køn_20	Estimate	Standard Error	DF	t Value	Pr > \|t\|
Intercept		61.0501	0.3376	3432	180.82	<.0001
spl1		-0.01390	0.001344	16E3	-10.34	<.0001
spl2		0.000203	0.000021	16E3	9.66	<.0001
spl3		-0.00050	0.000054	16E3	-9.33	<.0001
spl4		0.000442	0.000050	16E3	8.78	<.0001
Sex	Female	0.2017	0.5318	16E3	0.38	0.7045
Sex	Male	0	.	.	.	.
spl1*Sex	Female	0.01093	0.002127	16E3	5.14	<.0001
spl1*Sex	Male	0	.	.	.	.
spl2*Sex	Female	-0.00019	0.000033	16E3	-5.56	<.0001
spl2*Sex	Male	0	.	.	.	.
spl3*Sex	Female	0.000450	0.000085	16E3	5.26	<.0001
spl3*Sex	Male	0	.	.	.	.
spl4*Sex	Female	-0.00039	0.000080	16E3	-4.87	<.0001
spl4*Sex	Male	0	.	.	.	.

Type 3 Tests of Fixed Effects
Effect	Num DF	Den DF	F Value	Pr > F
spl1	1	16E3	62.99	<.0001
spl2	1	16E3	93.40	<.0001
spl3	1	16E3	87.02	<.0001
spl4	1	16E3	77.01	<.0001
Sex	1	16E3	0.14	0.7045
spl1*Sex	1	16E3	26.40	<.0001
spl2*Sex	1	16E3	30.88	<.0001
spl3*Sex	1	16E3	27.70	<.0001
spl4*Sex	1	16E3	23.68	<.0001

Least Squares Means
Effect	Sex	spl1	spl2	spl3	spl4	prvtdato_num	Estimate	Standard Error	DF	t Value	Pr > \|t\|
Sex	Female	407.07	105217	52728	19728	407.07	60.1981	0.2895	16E3	207.93	<.0001
Sex	Male	407.07	105217	52728	19728	407.07	Non-est	.	.	.	.

Differences of Least Squares Means
Effect	Sex	_Sex	spl1	spl2	spl3	spl4	prvtdato_num	Estimate	Standard Error	DF	t Value	Pr > \|t\|
Sex	Female	Male	407.07	105217	52728	19728	407.07	Non-est	.	.	.	.

As you can see there is no estimate for males, and thus the differences between the sexes cannot be estimated.

Lsmeans produces an output for males when I only include the interaction: Sex*spl1 (but not for the other interactions).

Can anyone help me solve this issue?

Thank you

StatsMan · Posted 03-03-2023 08:37 AM

Finding the cause of the non-estimable lsmeans will be difficult without the data. You can add the /E option to the LSMEANS statement to see the linear combination used to estimate the problematic lsmean.

You can let GLIMMIX do the dirty work of forming the spline basis using the EFFECT statement as in this example

Fitting random coefficients on top of the spline fit might be more than you can do with this data. If you do want those random coefficients for each subject, do you want those on the spline effect rather than a linear effect of days? That might be more than the data can handle, too. You could try fitting just the random intercept and see if that does better for you.

mgrasmussen · Posted 03-07-2023 03:27 AM

Dear Statsman

I much appreciate your reply.

1. "You can let GLIMMIX do the dirty work of forming the spline basis using the EFFECT statement as in this example"

I am not sure exactly what you mean by this. The EFFECT statement is not supported by Proc Mixed. Moreover, would it make a difference if it was? On another note I am actually using the EFFECT statement in proc glmselect to create the spline basis:

proc glmselect data=dataset_withoutsplines outdesign(addinputvars fullmodel)=dataset_withsplines;
effect spl = spline(hba1c / naturalcubic basis=tpf(noint) knotmethod=percentiles(5)); model hba1c = spl / selection=none;
quit;

2. a) "Fitting random coefficients on top of the spline fit might be more than you can do with this data." and

b) "You could try fitting just the random intercept and see if that does better for you."

Right, I understand that the model might be too complex given the data. Regarding 2b), even when I completely remove the random statement I get the same outcome of "Non-est". This suggest to be that there is an issue with the interactions of sex and spl2-spl4. As I mentioned in my original post I don't get "Non-est" when I only include the interaction spl1*sex. However, I do not know if this is an appropriate model.

3. "If you do want those random coefficients for each subject, do you want those on the spline effect rather than a linear effect of days?"

Are you suggesting that I could include all the spline variables here?

random intercept SPLINE VARIABLES / subject = pseudo_id type=un;

I have not considered this. Could one argue for and against doing such?

Best regards

PaigeMiller · Posted 03-03-2023 09:02 AM

Often, a Least Squares Mean is non-estimable because you have cells in your experiment where there is no data. For example, if it happens that the interaction spl2*sex has no data in one of the cells, then you would get the Non-Est. And as said by @StatsMan , the model you have chosen to fit "might be more than you can do with this data."

--
Paige Miller

mgrasmussen · Posted 03-07-2023 03:37 AM

Dear PaigeMiller

Thank you very much for your reply.

"Often, a Least Squares Mean is non-estimable because you have cells in your experiment where there is no data."

Missing data in general is not an issue in this dataset, but their might be a specific issue relating to the interaction terms. Is there some simple way to check for this? As far as I see it, there are four spline variables and the first spline variable corresponds to the variable Days - from which the spline variables are derived - in a 1:1 fashion. As mentioned, if I create a model that includes only the interaction spl1*sex (and thus I do not include spl2*sex, spl3*sex and spl4*sex) I don't get "Non-est". But only if I remove all three. Removing just one of them does not change the outcome.

"And as said by @StatsMan , the model you have chosen to fit "might be more than you can do with this data."

I realize that there might be an issue here. When I completely remove the random statement I still get the "Non-est" outcome, which suggest to me that there is an issue with the abovementioned interaction terms.

Thank you

Best regards

PaigeMiller · Posted 03-07-2023 04:48 AM

@mgrasmussen wrote:

Dear PaigeMiller

Thank you very much for your reply.

"Often, a Least Squares Mean is non-estimable because you have cells in your experiment where there is no data."

Missing data in general is not an issue in this dataset, but their might be a specific issue relating to the interaction terms. Is there some simple way to check for this? As far as I see it, there are four spline variables and the first spline variable corresponds to the variable Days - from which the spline variables are derived - in a 1:1 fashion. As mentioned, if I create a model that includes only the interaction spl1*sex (and thus I do not include spl2*sex, spl3*sex and spl4*sex) I don't get "Non-est". But only if I remove all three. Removing just one of them does not change the outcome.

Sure sounds like an empty cell in your data — which is not the same as "missing" data (although it can be caused by missing data as well). You can check this via running PROC FREQ to investigate all possible interactions.

proc freq data=whatever;
     tables sex*(spl1 spl2 spl3 spl4);
run;

Even this may not find the missing cell, it may be that you have to look at higher order interactions by using PROC FREQ to find the cause.

--
Paige Miller

mgrasmussen · Posted 03-07-2023 05:30 AM

Right, it is an issue about whether or not both males and females are represented at each level of spl1-spl4?

Thanks for the suggested syntax. I ran it and there are many places with empty cells/cell with 0 observations and this is true for all four spline variables. This is also the case for spl1 and Sex. As mentioned when I include the interaction between the two in the model I don't get the "Non-est". This particular model appears to work.

jiltao · Posted 03-07-2023 09:15 AM

As pointed out by @PaigeMiller , missing data cell is the most common cause of a non-estimable lsmeans. You either need to take out the interaction term(s) with missing data cell, or maybe combine your data categories to get rid of missing data cells.

Thanks,

Jill

mgrasmussen · Posted 03-09-2023 02:38 AM

Dear Jill

Thank you for the feedback.

I will bring all of the feedback I have gotten, including yours most recently, back to my group and we will discuss the analyses.

Best regards

Martin

SteveDenham · Posted 03-09-2023 02:38 PM

In order to understand what you are attempting, I need to know why you are fitting five different spline terms to the same data and then placing all of them in the same analysis (spl spl1 spl2 spl3 spl4). Do the different splines reflect different hypotheses about the time by hba1c relationship? Is there any reason to believe that time as interpreted by the spline function has a fixed effect on the outcome, and you want to see if it differs by sex? As an example for that, look at this in the PROC GLIMMIX documentation: https://documentation.sas.com/doc/en/pgmsascdc/v_036/statug/statug_glimmix_examples09.htm

Or is it possible that there is a single factor response in time, with the variance being what needs a spline fit. As an example look at this in the PROC GLIMMIX documentation: https://documentation.sas.com/doc/en/pgmsascdc/v_036/statug/statug_glimmix_examples20.htm

SteveDenham

mgrasmussen · Posted 03-13-2023 07:10 AM

Dear Steve

Thank you very much for your reply.

To be frank I do not have a straight answer to your question. I am a non-statistician who tried to build a model (only with brief statistical consult), but I realize now that I might not fully understand the working of the splines.

In simple terms what I want to do is to visually illustrate and statistically test for the differences in slopes in hba1c for men and women. More specifically, I want to investigate whether there is a difference in mean hba1c over time (time on the x-axis) between men and women. However, I also want to be able to test means differences at specific time points.

My understanding was that the four spline variables all-in-all represented time and thus that I had to model an interaction term for each splines variable. I sort of "transferred" the logic I would apply to a normal OLS regression to this new model. In OLS - here assumping a linear relationship between days/time and hba1c - I would write the following to test for interaction of sex in the relationship between time and hba1c:

proc reg data=datasetname;

class sex;

model hba1c=sex days sex*days;

run;

For the time being disregard the fact that I am actually using a mixed effect model. The point here is merely an attempt to try to explain the background for how I am applying the splines in the analysis. However incorrectly.

I realize that I may not be using the correct statistical jargon.

I would be happy to receive any feedback and suggestion on moving forward.

I mentioned earlier, when I only include the first spline variable I don't experience an issue with computing LSmeans. But I am not sure about the significance of this.

Thank you

SAS Innovate 2025: Call for Content