Hello there,
I am trying to fit a multilevel random effect model on my data.
The model looks like this:
proc glimmix data=dataname initglm /*abspconv=1E-4*/ method=quad;
model smoking(event="1") = dum60 dum70 dum80 age age*dum60 age*dum70 age*dum80 / dist=binary link=logit cl covb s;
random intercept age / subject=id;
by sex;
run; quit;
Smoking is a dichotomous outcome variable (yes = 1, no = 0), age as a continuous variable is one of the predictors and there is an interaction of age with generation. I made dummies (dum50 (reference category), dum60, dum70, dum80) for each generation. Generations are defined by their age at baseline. So, we have a generation which is age 60-69 at baseline (dum60), etc.
I am using a long dataset and we have 4 observations for the subjects (cohort study). Hence we are using proc glimmix, to correct for repeated measurements.
I have a few problems with this model.
- Note the statement "by sex". I want to analyse men and women separately. When I run the model, everything works perfectly for men. For women, the model fails with this error: ERROR: Infeasible parameter values for evaluation of objective function with 1 quadrature point. I tried to google this, but I couldn't find it anywhere. I already talked to a statistician, but he couldn't really help me, except to suggest changing the convergence criteria. I tried a few things:
1) nloptions gconv = 1E-3 fconv = 1E-3.
2) abspconv=1E-4.
3) changing method=quad to method=laplace.
None of these worked.
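For reference, a few other things are sometimes tried when GLIMMIX reports infeasible parameter values under method=quad. The sketch below is untested on these data and is only meant as a starting point. It centers age so that the intercept (and its random effect) refers to a realistic age rather than age 0, supplies starting values for the two variance components with PARMS, and switches the optimizer with NLOPTIONS. The dataset name dataname_c, the variable age_c, the centering value of 70 and the starting values are all assumptions made for illustration.

```sas
/* Hypothetical: center age at 70 so the intercept is not an extrapolation to age 0 */
data dataname_c;
  set dataname;
  age_c = age - 70;
run;

/* BY-group processing requires sorted data; sorting by id keeps each subject's rows together */
proc sort data=dataname_c;
  by sex id;
run;

proc glimmix data=dataname_c initglm method=quad(qpoints=7);
  by sex;
  model smoking(event="1") = dum60 dum70 dum80 age_c
        age_c*dum60 age_c*dum70 age_c*dum80
        / dist=binary link=logit cl s;
  random intercept age_c / subject=id;
  /* Assumed starting values: intercept variance first, then age_c variance */
  parms (1) (0.01);
  /* Try a different optimizer and allow more iterations */
  nloptions technique=nrridg maxiter=200;
run;
```

If the centered model runs for women, estimate statements need the centered value (for example age_c 0 for age 70) rather than the raw age.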
- Another, totally different problem is that when the model runs with age and intercept as random effects, the prevalence estimates do not correspond to the prevalences I observe in frequency tables.
For example, I ran some estimates together with the model (in my real dataset the variables are called gendum60 and leeftijd, which correspond to dum60 and age above):
estimate 'gen 60-69 at age = 70' intercept 1 gendum60 1 leeftijd 70 leeftijd*gendum60 70/ilink;
It gave me a prevalence at age 70 of about 1%, using the ilink option, which transforms the estimate back to the prevalence scale (right?). When I run frequency tables for this generation, I see that at age 70 they have a prevalence of ~20%. And it's not only this specific generation at this age, it's every estimate I request. The model doesn't represent my data well at all. Sometimes it even gives prevalences of 1E-7, which is of course very weird. Again, I talked to the statistician and we tried to run the model without the random intercept and the random age effect. The prevalences estimated by that model were very close to the real data! But my question is: what happens when you remove the random intercept and age? I understand what you correct for when using them. But when removing them, am I still correcting for repeated measurements within every subject? One thing that I noted is that without random effects, 'Subjects (Blocks in V)' is 1, instead of the ~1000 I usually have for men.
Lots of stuff, I hope you guys can help me
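On the question of what happens without the RANDOM statement: the model then reduces to an ordinary logistic regression that treats all rows as independent, which is consistent with 'Subjects (Blocks in V)' dropping to 1, so the repeated measurements are no longer accounted for. One way to keep population-averaged estimates while still allowing for within-subject correlation is a GEE-type model. The following is a minimal, untested sketch using an R-side compound-symmetry structure and empirical (sandwich) standard errors; the choice of type=cs is an assumption, not something established in this thread, and the data are assumed to be sorted by sex.

```sas
proc glimmix data=dataname empirical;
  by sex;
  class id;
  model smoking(event="1") = dum60 dum70 dum80 age
        age*dum60 age*dum70 age*dum80
        / dist=binary link=logit cl s;
  /* R-side (marginal) within-subject correlation instead of G-side random effects */
  random _residual_ / subject=id type=cs;
  /* Population-averaged prevalence for generation 60-69 at age 70 */
  estimate 'gen 60-69 at age = 70' intercept 1 dum60 1 age 70 age*dum60 70 / ilink cl;
run;
```

Estimates from this kind of model are on the same footing as raw frequency tables, whereas ilink estimates from the random-effects model describe a subject whose random effects are exactly zero.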
See my reply in the SAS Procedures forum. After thinking another 10 minutes, and seeing what is going on when you remove the random effect of age, I am more convinced than ever that you need to include the random effect of age in your estimate statement.
Try:
estimate 'gen 60-69 at age = 70' intercept 1 gendum60 1 leeftijd 70 leeftijd*gendum60 70 | leeftijd 70 /ilink; /* Added leeftijd as a random effect */
This is untested, and I will be curious if it gives you what you might be looking for.
Steve Denham
Smoking full model
The GLIMMIX Procedure
sex respondent=1
Data Set | ** |
---|---|
Response Variable | smoker |
Response Distribution | Binary |
Link Function | Logit |
Variance Function | Default |
Variance Matrix Blocked By | id |
Estimation Technique | Maximum Likelihood |
Likelihood Approximation | Gauss-Hermite Quadrature |
Degrees of Freedom Method | Containment |
Number of Observations Read | 5790 |
---|---|
Number of Observations Used | 3473 |
Ordered Value | smoker | Total Frequency |
---|---|---|
1 | 0 | 2651 |
2 | 1 | 822 |
The GLIMMIX procedure is modeling the probability that smoker='1'.
G-side Cov. Parameters | 2 |
---|---|
Columns in X | 8 |
Columns in Z per Subject | 2 |
Subjects (Blocks in V) | 931 |
Max Obs per Subject | 6 |
Optimization Technique | Dual Quasi-Newton |
---|---|
Parameters in Optimization | 10 |
Lower Boundaries | 2 |
Upper Boundaries | 0 |
Fixed Effects | Not Profiled |
Starting From | GLM estimates |
Quadrature Points | 7 |
Iteration | Restarts | Evaluations | Objective Function | Change | Max Gradient |
---|---|---|---|---|---|
0 | 0 | 4 | 2664.2076557 | . | 247518.4 |
1 | 0 | 15 | 2531.8794796 | 132.32817606 | 10382.23 |
2 | 0 | 3 | 2413.6629589 | 118.21652075 | 12986.84 |
3 | 0 | 4 | 2411.987591 | 1.67536786 | 12781.79 |
4 | 0 | 4 | 2411.7530459 | 0.23454511 | 12714.24 |
5 | 0 | 3 | 2410.6125111 | 1.14053478 | 12417.07 |
6 | 0 | 4 | 2389.3924796 | 21.22003154 | 2125.985 |
7 | 0 | 2 | 2370.5918749 | 18.80060463 | 8539.3 |
8 | 0 | 2 | 2340.5986904 | 29.99318457 | 10467.24 |
9 | 0 | 3 | 2333.1846772 | 7.41401318 | 20957.11 |
10 | 0 | 4 | 2309.6450045 | 23.53967266 | 3675.545 |
11 | 0 | 3 | 2305.5393632 | 4.10564133 | 3792.51 |
12 | 0 | 3 | 2303.766906 | 1.77245715 | 1238.087 |
13 | 0 | 3 | 2303.3039486 | 0.46295749 | 929.9581 |
14 | 0 | 3 | 2303.0939216 | 0.21002696 | 224.6986 |
15 | 0 | 4 | 2299.8431727 | 3.25074887 | 2301.9 |
16 | 0 | 2 | 2297.2327543 | 2.61041844 | 1237.504 |
17 | 0 | 2 | 2293.4149064 | 3.81784791 | 535.4416 |
18 | 0 | 2 | 2288.5668127 | 4.84809373 | 2638.173 |
19 | 0 | 4 | 2287.8921513 | 0.67466138 | 1248.713 |
20 | 0 | 3 | 2287.5832383 | 0.30891300 | 125.6845 |
21 | 0 | 3 | 2287.5658631 | 0.01737515 | 127.9899 |
22 | 0 | 2 | 2287.5444346 | 0.02142856 | 101.2652 |
23 | 0 | 6 | 2287.0016694 | 0.54276516 | 394.4775 |
24 | 0 | 3 | 2286.6363686 | 0.36530084 | 150.4604 |
25 | 0 | 3 | 2286.4653755 | 0.17099304 | 193.7834 |
26 | 0 | 3 | 2286.3540602 | 0.11131532 | 36.6376 |
27 | 0 | 3 | 2286.3430099 | 0.01105026 | 35.43862 |
28 | 0 | 3 | 2286.3427712 | 0.00023876 | 9.727946 |
29 | 0 | 3 | 2286.3427548 | 0.00001641 | 3.908154 |
Convergence criterion (GCONV=1E-8) satisfied. |
-2 Log Likelihood | 2286.34 |
---|---|
AIC (smaller is better) | 2306.34 |
AICC (smaller is better) | 2306.41 |
BIC (smaller is better) | 2354.71 |
CAIC (smaller is better) | 2364.71 |
HQIC (smaller is better) | 2324.79 |
-2 log L(smoker| r. effects) | 690.83 |
---|---|
Pearson Chi-Square | 690.37 |
Pearson Chi-Square / DF | 0.20 |
Covariance Parameter Estimates
Cov Parm | Subject | Estimate | Standard Error |
---|---|---|---|
Intercept | respnr | 24.9367 | 4.9515 |
age | respnr | 0.002104 | . |
Solutions for Fixed Effects
Effect | Estimate | Standard Error | DF | t Value | Pr > |t| | Alpha | Lower | Upper |
---|---|---|---|---|---|---|---|---|
Intercept | 10.7707 | 2.1945 | 927 | 4.91 | <.0001 | 0.05 | 6.4641 | 15.0774 |
dum60 | 1.6981 | 3.0139 | 1610 | 0.56 | 0.5732 | 0.05 | -4.2135 | 7.6096 |
dum70 | 7.3854 | 4.1763 | 1610 | 1.77 | 0.0772 | 0.05 | -0.8062 | 15.5770 |
dum80 | 3.7303 | 9.8024 | 1610 | 0.38 | 0.7036 | 0.05 | -15.4965 | 22.9571 |
age | -0.2081 | 0.03392 | 928 | -6.13 | <.0001 | 0.05 | -0.2747 | -0.1415 |
dum60*age | -0.03188 | 0.04490 | 1610 | -0.71 | 0.4778 | 0.05 | -0.1200 | 0.05619 |
dum70*age | -0.07526 | 0.05662 | 1610 | -1.33 | 0.1840 | 0.05 | -0.1863 | 0.03580 |
dum80*age | -0.02561 | 0.1173 | 1610 | -0.22 | 0.8272 | 0.05 | -0.2557 | 0.2045 |
Type III Tests of Fixed Effects
Effect | Num DF | Den DF | F Value | Pr > F |
---|---|---|---|---|
dum60 | 1 | 1610 | 0.32 | 0.5732 |
dum70 | 1 | 1610 | 3.13 | 0.0772 |
dum80 | 1 | 1610 | 0.14 | 0.7036 |
age | 1 | 928 | 37.62 | <.0001 |
dum60*age | 1 | 1610 | 0.50 | 0.4778 |
dum70*age | 1 | 1610 | 1.77 | 0.1840 |
dum80*age | 1 | 1610 | 0.05 | 0.8272 |
Covariance Matrix for Fixed Effects
Effect | Row | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 |
---|---|---|---|---|---|---|---|---|---|
Intercept | 1 | 4.8156 | -4.6886 | -4.6077 | -4.6707 | -0.07191 | 0.06941 | 0.06868 | 0.06894 |
dum60 | 2 | -4.6886 | 9.0835 | 4.8457 | 4.8053 | 0.06926 | -0.1317 | -0.07167 | -0.07161 |
dum70 | 3 | -4.6077 | 4.8457 | 17.4418 | 4.8857 | 0.06754 | -0.07216 | -0.2319 | -0.07319 |
dum80 | 4 | -4.6707 | 4.8053 | 4.8857 | 96.0872 | 0.06902 | -0.07145 | -0.07225 | -1.1404 |
age | 5 | -0.07191 | 0.06926 | 0.06754 | 0.06902 | 0.001151 | -0.00110 | -0.00108 | -0.00109 |
dum60*age | 6 | 0.06941 | -0.1317 | -0.07216 | -0.07145 | -0.00110 | 0.002016 | 0.001139 | 0.001138 |
dum70*age | 7 | 0.06868 | -0.07167 | -0.2319 | -0.07225 | -0.00108 | 0.001139 | 0.003206 | 0.001154 |
dum80*age | 8 | 0.06894 | -0.07161 | -0.07319 | -1.1404 | -0.00109 | 0.001138 | 0.001154 | 0.01377 |
Estimates used
estimate 'gen 55-59 vs. gen 60-69 at age = 80' dum60 -1 age*dum60 -70 / cl;
estimate 'gen 55-59 at age = 70' intercept 1 age 70 / ilink;
estimate 'gen 60-69 at age = 70' intercept 1 dum60 1 age 70 age*dum60 70 / ilink;
Label | Estimate | Standard Error | DF | t Value | Pr > |t| | Alpha | Lower | Upper | Mean | Standard Error Mean | Lower Mean | Upper Mean |
---|---|---|---|---|---|---|---|---|---|---|---|---|
gen 60-69 vs. gen 70-79 at age = 80 | -2.2174 | 0.7119 | 1610 | -3.11 | 0.0019 | 0.05 | -3.6138 | -0.8210 | Non-est | . | . | . |
gen 60-69 at age = 80 | -6.7284 | 0.5968 | 1610 | -11.27 | <.0001 | 0.05 | -7.8991 | -5.5577 | 0.001195 | 0.000712 | 0.000371 | 0.003843 |
gen 70-79 at age = 80 | -4.5110 | 0.5080 | 1610 | -8.88 | <.0001 | 0.05 | -5.5074 | -3.5147 | 0.01087 | 0.005461 | 0.004040 | 0.02890 |
This is the output of my analysis (as asked for in the other topic, but let's continue here).
I will try adding the random effect in the estimate statement. Do I need to add the random effect in all my estimates? So, also in the first estimate for the difference between two generations (gen 60-69 vs. gen 70-79 at age = 80)? If so, do I put it in the same way as for the other statement? (| leeftijd 70)?
Thnx a lot Steve, I will let you know what happens.
Adding the random effect of age to the estimate statement (I did it for the second and third estimates, not the first) did cause the prevalence to rise, but not nearly as high as it 'should' be, that is, as high as the raw data show.
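One possible explanation for the gap, offered as a sketch rather than a diagnosis: with ilink, the estimate is the probability for a subject whose random effects are exactly zero, while a frequency table averages over all subjects. With a random-intercept variance near 25 those two quantities can differ by an order of magnitude. The data step below takes the linear predictor for 'gen 70-79 at age = 80' and the intercept variance from the output above and averages the inverse logit over a normal random intercept with a simple Riemann sum. It ignores the random age slope (which would add further variance), and the grid limits and step size are arbitrary choices.

```sas
data _null_;
  eta     = -4.5110;   /* linear predictor for 'gen 70-79 at age = 80' from the output */
  var_int = 24.9367;   /* estimated random-intercept variance from the output */
  sd      = sqrt(var_int);

  /* Conditional probability: a subject whose random intercept is exactly zero */
  p_cond = 1 / (1 + exp(-eta));

  /* Population-averaged probability: average the inverse logit over
     b ~ N(0, var_int), using a Riemann sum on a standard normal grid */
  step   = 0.01;
  p_marg = 0;
  do i = -800 to 800;
    z = i * step;
    p_marg = p_marg + (1 / (1 + exp(-(eta + sd*z)))) * pdf('NORMAL', z) * step;
  end;

  put p_cond= p_marg=;
run;
```

Here p_cond reproduces the Mean column of the estimate (about 0.011), while p_marg should come out far larger, on the order of 0.2, which is much closer to what a frequency table would show.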
After seeing the estimates of the random effect, I don't think that will put things in the right order of magnitude. The logit of the response is log (822/3473) - log (2651/3473) = -1.171. When I plug in values, I get values that are off by the magnitude you are reporting.
What happens when you use the LSMEANS statement?
Try
lsmeans gendum60/at means ilink e;
lsmeans gendum60/at age=70 ilink e;
Now I don't expect this to really work, since I think you have correctly specified the estimate statement, but it might trigger something in my dinosaur brain on a Monday morning. If the e option doesn't give back what is going into the estimate statement, then we have what I am looking for.
There is something niggling at my thoughts--Why not make the 4 age cohorts class variables? I think we can get at the comparisons of interest through the LSMESTIMATE statement. It would also enable the use of the noint option on the model statement. Something like:
proc glimmix data=dataname initglm /*abspconv=1E-4*/ method=quad;
class cohort(ref=first);/* Assumes that this is on the dataset already, and sets the age50 cohort as the reference group */
model smoking(event="1") = cohort age age*cohort/ noint dist=binary link=logit cl covb s;
random intercept age / subject=id;
lsmeans cohort/at means ilink e;
lsmeans cohort/at age=70 ilink e;
lsmeans cohort/at age=80 ilink e;
by sex;
run; quit;
See if these give you values that look more appropriate. If they look about right, then we can work on comparisons of interest. Some of these will have values, but be meaningless--for instance, the 80-89 cohort at age=70. If the lsmeans on the ilink scale are still off by several orders of magnitude, then I think some examination of data, etc. is where we need to go.
Steve Denham
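As an illustration of the kind of comparison the LSMESTIMATE statement allows once cohort is a class variable, here is a minimal, untested sketch. It assumes that cohort has four levels sorted in the order 50s, 60s, 70s, 80s, so that the coefficients 1 -1 0 0 contrast the first two levels; the label text and the evaluation age of 70 are illustrative choices.

```sas
proc glimmix data=dataname initglm method=quad;
  by sex;
  class cohort;
  model smoking(event="1") = cohort age age*cohort
        / noint dist=binary link=logit cl s;
  random intercept age / subject=id;
  /* Difference in logits between the first two cohort levels at age 70;
     EXP additionally reports the exponentiated difference (an odds ratio) */
  lsmestimate cohort 'cohort 1 vs cohort 2 at age = 70' 1 -1 0 0
        / at age=70 cl exp;
run;
```

Like the ESTIMATE results, this contrast is on the subject-specific scale of the random-effects model, so the odds ratio need not match one computed from raw frequency tables.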
Hey Steve,
Thanks again.
I am far from an expert in SAS, but I had looked around a bit myself the other day and already thought that it should somehow be possible to model what I want to model, and that lsmeans/lsmestimate looked like a good solution at first sight. I had no clue how to do it though, since I only know the statistical procedures that I have used in the past. Everything else is new to me, as is this.
I talked to my statistician yesterday and we have now decided to go another way (a way I do not really like so much, especially if your suggestion turns out to work), but for now I will go on with what I have already, since time is running out for my project. I expect to have more time near the end of this month or the start of next month (as I will be reviewing my article with the co-authors, I will have plenty of time in between, I think) and that is when I want to try your method. So, I will leave this here for now, but I will get back to it sometime. I just think it's impossible that there is no way in SAS to do what I want to do. Therefore I am interested in experimenting with your suggestion sometime in the near future.
Good luck. I hope that your method works out (just guessing that you will be calculating odds ratios using frequency tables).
Steve Denham
We will just present the difference in prevalences based on the raw data. In order to say whether those differences are significant, we will look at the model.
So, if we see a difference of, let's say, 8% between generation 55-59 (reference) and generation 60-69 at age 70 in the raw data, we will present that number. Then, we will look at this estimate:
estimate 'gen 55-59 vs. gen 60-69 at age = 70' dum60 -1 age*dum60 -70 / cl;
If this is significant ("Pr > |t|" < 0.05) then we will say: 8% difference is significant.
I don't like this method so much....
Me either, since the individual estimates are so poorly aligned with the raw data.
Picking battles that you can win is the key to military success. Also to statistical consulting.
Steve Denham
Here I am again. The results of my analyses were so poor that I decided to try this.
Steve wrote: "After seeing the estimates of the random effect, I don't think that will put things in the right order of magnitude. The logit of the response is log (822/3473) - log (2651/3473) = -1.171. When I plug in values, I get values that are off by the magnitude you are reporting."
I don't know exactly what you are calculating there and if it is important, but when I calculate this, I get -0.50853791?
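The two numbers differ only in the base of the logarithm. The expression simplifies to log(822/2651), the log odds of smoking among the observations used. The natural logarithm gives about -1.171, which is the scale the logit link works on, while the base-10 logarithm gives about -0.5085. A minimal check in a data step, where the SAS LOG function is the natural log and LOG10 is base 10 (the variable names are illustrative):

```sas
data _null_;
  odds      = 822 / 2651;   /* smokers / non-smokers among observations used */
  logit_ln  = log(odds);    /* natural log: about -1.171 */
  logit_l10 = log10(odds);  /* base-10 log: about -0.5085 */
  put logit_ln= logit_l10=;
run;
```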
Steve wrote: "What happens when you use the LSMEANS statement? Try
lsmeans gendum60/at means ilink e;
lsmeans gendum60/at age=70 ilink e;"
This gives:
lsmeans gendum60/at means ilink e;
ERROR: Only class variables allowed in this effect.
Smoking Full |
The GLIMMIX Procedure
sex=1
Data Set | WORK.dataname |
---|---|
Response Variable | smoking |
Response Distribution | Binary |
Link Function | Logit |
Variance Function | Default |
Variance Matrix Blocked By | id |
Estimation Technique | Maximum Likelihood |
Likelihood Approximation | Gauss-Hermite Quadrature |
Degrees of Freedom Method | Containment |
Class | Levels | Values |
---|---|---|
ageclass | 4 | 1 2 3 4 |
Number of Observations Read | 11964 |
---|---|
Number of Observations Used | 8182 |
Ordered Value | smoking | Total Frequency |
---|---|---|
1 | 0 | 5605 |
2 | 1 | 2577 |
G-side Cov. Parameters | 2 |
---|---|
Columns in X | 9 |
Columns in Z per Subject | 2 |
Subjects (Blocks in V) | 2991 |
Max Obs per Subject | 3 |
Optimization Technique | Dual Quasi-Newton |
---|---|
Parameters in Optimization | 10 |
Lower Boundaries | 2 |
Upper Boundaries | 0 |
Fixed Effects | Not Profiled |
Starting From | GLM estimates |
Quadrature Points | 7 |
Iteration | Restarts | Evaluations | Objective Function | Change | Max Gradient |
---|---|---|---|---|---|
0 | 0 | 4 | 7924.0808061 | . | 296043.2 |
1 | 0 | 14 | 7464.3488984 | 459.73190764 | 11431.85 |
2 | 0 | 3 | 7235.5407215 | 228.80817687 | 1259.722 |
3 | 0 | 4 | 7222.9621138 | 12.57860771 | 1491.717 |
4 | 0 | 4 | 7217.0681408 | 5.89397306 | 1635.135 |
5 | 0 | 4 | 7216.2814485 | 0.78669222 | 1605.412 |
6 | 0 | 4 | 7206.613118 | 9.66833050 | 3964.965 |
7 | 0 | 4 | 7142.9150265 | 63.69809154 | 10381.62 |
8 | 0 | 3 | 7138.4727633 | 4.44226324 | 3083.605 |
9 | 0 | 2 | 7131.8228528 | 6.64991052 | 4798.037 |
10 | 0 | 4 | 7111.9430664 | 19.87978634 | 2394.794 |
11 | 0 | 2 | 7101.0562478 | 10.88681862 | 4136.373 |
12 | 0 | 3 | 7096.4972303 | 4.55901754 | 2491.948 |
13 | 0 | 3 | 7094.4543614 | 2.04286881 | 918.8764 |
14 | 0 | 4 | 7086.9710653 | 7.48329611 | 4893.221 |
15 | 0 | 2 | 7082.7288234 | 4.24224195 | 1764.441 |
16 | 0 | 3 | 7080.0529968 | 2.67582655 | 442.8867 |
17 | 0 | 3 | 7079.6473463 | 0.40565057 | 864.7367 |
18 | 0 | 2 | 7079.4226727 | 0.22467353 | 608.6162 |
19 | 0 | 2 | 7079.0440924 | 0.37858034 | 45.46735 |
20 | 0 | 4 | 7078.2098257 | 0.83426673 | 1435.201 |
21 | 0 | 4 | 7076.2935373 | 1.91628834 | 893.0885 |
22 | 0 | 3 | 7075.9743026 | 0.31923472 | 297.7529 |
23 | 0 | 3 | 7075.8962293 | 0.07807334 | 104.861 |
24 | 0 | 4 | 7075.6627013 | 0.23352799 | 1142.108 |
25 | 0 | 4 | 7073.7881887 | 1.87451260 | 1057.395 |
26 | 0 | 3 | 7073.0449224 | 0.74326630 | 340.2352 |
27 | 0 | 3 | 7072.9644057 | 0.08051665 | 157.9887 |
28 | 0 | 3 | 7072.957057 | 0.00734868 | 67.15604 |
29 | 0 | 3 | 7072.9554633 | 0.00159374 | 35.61595 |
30 | 0 | 4 | 7072.9476656 | 0.00779774 | 52.25779 |
31 | 0 | 3 | 7072.9459177 | 0.00174785 | 6.514212 |
32 | 0 | 3 | 7072.9457961 | 0.00012161 | 0.446825 |
33 | 0 | 3 | 7072.945793 | 0.00000312 | 0.040506 |
Convergence criterion (GCONV=1E-8) satisfied. |
-2 Log Likelihood | 7072.95 |
---|---|
AIC (smaller is better) | 7092.95 |
AICC (smaller is better) | 7092.97 |
BIC (smaller is better) | 7152.98 |
CAIC (smaller is better) | 7162.98 |
HQIC (smaller is better) | 7114.54 |
-2 log L(smoking | r. effects) | 2009.74 |
---|---|
Pearson Chi-Square | 1444.48 |
Pearson Chi-Square / DF | 0.18 |
Covariance Parameter Estimates
Cov Parm | Subject | Estimate | Standard Error |
---|---|---|---|
Intercept | id | 23.5344 | 2.4922 |
age | id | 0.004184 | 0.000835 |
Solutions for Fixed Effects
Effect | ageclass | Estimate | Standard Error | DF | t Value | Pr > |t| | Alpha | Lower | Upper |
---|---|---|---|---|---|---|---|---|---|
ageclass | 1 | 0.8444 | 0.7731 | 2209 | 1.09 | 0.2749 | 0.05 | -0.6717 | 2.3605 |
ageclass | 2 | 1.7424 | 0.7304 | 2209 | 2.39 | 0.0171 | 0.05 | 0.3101 | 3.1746 |
ageclass | 3 | 1.0363 | 0.9537 | 2209 | 1.09 | 0.2774 | 0.05 | -0.8340 | 2.9066 |
ageclass | 4 | 9.0983 | 1.6102 | 2209 | 5.65 | <.0001 | 0.05 | 5.9406 | 12.2560 |
age | | -0.2268 | 0.02837 | 2978 | -8.00 | <.0001 | 0.05 | -0.2825 | -0.1712 |
age*ageclass | 1 | 0.1352 | 0.03717 | 2209 | 3.64 | 0.0003 | 0.05 | 0.06233 | 0.2081 |
age*ageclass | 2 | 0.1237 | 0.03288 | 2209 | 3.76 | 0.0002 | 0.05 | 0.05918 | 0.1881 |
age*ageclass | 3 | 0.1466 | 0.03351 | 2209 | 4.37 | <.0001 | 0.05 | 0.08086 | 0.2123 |
age*ageclass | 4 | 0 | . | . | . | . | . | . | . |
Type III Tests of Fixed Effects
Effect | Num DF | Den DF | F Value | Pr > F |
---|---|---|---|---|
ageclass | 4 | 2209 | 9.67 | <.0001 |
age | 1 | 2978 | 107.62 | <.0001 |
age*ageclass | 3 | 2209 | 7.02 | 0.0001 |
Covariance Matrix for Fixed Effects
Effect | ageclass | Row | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |
---|---|---|---|---|---|---|---|---|---|---|---|
ageclass | 1 | 1 | 0.5977 | 0.002968 | 0.004430 | 0.01476 | -0.00031 | -0.01738 | 0.000216 | 0.000188 | |
ageclass | 2 | 2 | 0.002968 | 0.5334 | 0.009498 | 0.03279 | -0.00068 | 0.000594 | -0.01201 | 0.000424 | |
ageclass | 3 | 3 | 0.004430 | 0.009498 | 0.9096 | 0.04726 | -0.00098 | 0.000860 | 0.000693 | -0.01710 | |
ageclass | 4 | 4 | 0.01476 | 0.03279 | 0.04726 | 2.5928 | -0.04467 | 0.04389 | 0.04336 | 0.04314 | |
age | | 5 | -0.00031 | -0.00068 | -0.00098 | -0.04467 | 0.000805 | -0.00079 | -0.00078 | -0.00077 | |
age*ageclass | 1 | 6 | -0.01738 | 0.000594 | 0.000860 | 0.04389 | -0.00079 | 0.001382 | 0.000763 | 0.000759 | |
age*ageclass | 2 | 7 | 0.000216 | -0.01201 | 0.000693 | 0.04336 | -0.00078 | 0.000763 | 0.001081 | 0.000753 | |
age*ageclass | 3 | 8 | 0.000188 | 0.000424 | -0.01710 | 0.04314 | -0.00077 | 0.000759 | 0.000753 | 0.001123 | |
age*ageclass | 4 | 9 | | | | | | | | | |
Coefficients for ageclass Least Squares Means
Effect | ageclass | Row1 | Row2 | Row3 | Row4 |
---|---|---|---|---|---|
ageclass | 1 | 1 | | | |
ageclass | 2 | | 1 | | |
ageclass | 3 | | | 1 | |
ageclass | 4 | | | | 1 |
age | | 45.45 | 45.45 | 45.45 | 45.45 |
age*ageclass | 1 | 45.45 | | | |
age*ageclass | 2 | | 45.45 | | |
age*ageclass | 3 | | | 45.45 | |
age*ageclass | 4 | | | | 45.45 |
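ageclass Least Squares Means
ageclass | age | | Estimate | Standard Error | DF | t Value | Pr > |t| | Mean | Standard Error Mean |
---|---|---|---|---|---|---|---|---|---|
1 | 45.45 | 514520 | -3.3197 | 0.5074 | 2209 | -6.54 | <.0001 | 0.03490 | 0.01709 |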
2 | 45.45 | 514520 | -2.9478 | 0.2642 | 2209 | -11.16 | <.0001 | 0.04984 | 0.01251 |
3 | 45.45 | 514520 | -2.6121 | 0.2471 | 2209 | -10.57 | <.0001 | 0.06837 | 0.01574 |
4 | 45.45 | 514520 | -1.2118 | 0.4404 | 2209 | -2.75 | 0.0060 | 0.2294 | 0.07785 |
Coefficients for ageclass Least Squares Means
Effect | ageclass | Row1 | Row2 | Row3 | Row4 |
---|---|---|---|---|---|
ageclass | 1 | 1 | | | |
ageclass | 2 | | 1 | | |
ageclass | 3 | | | 1 | |
ageclass | 4 | | | | 1 |
age | | 40 | 40 | 40 | 40 |
age*ageclass | 1 | 40 | | | |
age*ageclass | 2 | | 40 | | |
age*ageclass | 3 | | | 40 | |
age*ageclass | 4 | | | | 40 |
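ageclass Least Squares Means
ageclass | age | | Estimate | Standard Error | DF | t Value | Pr > |t| | Mean | Standard Error Mean |
---|---|---|---|---|---|---|---|---|---|
1 | 40.00 | 514520 | -2.8204 | 0.4056 | 2209 | -6.95 | <.0001 | 0.05623 | 0.02153 |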
2 | 40.00 | 514520 | -2.3854 | 0.2292 | 2209 | -10.41 | <.0001 | 0.08429 | 0.01769 |
3 | 40.00 | 514520 | -2.1746 | 0.2810 | 2209 | -7.74 | <.0001 | 0.1021 | 0.02575 |
4 | 40.00 | 514520 | 0.02444 | 0.5534 | 2209 | 0.04 | 0.9648 | 0.5061 | 0.1383 |
Coefficients for ageclass Least Squares Means
Effect | ageclass | Row1 | Row2 | Row3 | Row4 |
---|---|---|---|---|---|
ageclass | 1 | 1 | | | |
ageclass | 2 | | 1 | | |
ageclass | 3 | | | 1 | |
ageclass | 4 | | | | 1 |
leeftijd | | 50 | 50 | 50 | 50 |
age*ageclass | 1 | 50 | | | |
age*ageclass | 2 | | 50 | | |
age*ageclass | 3 | | | 50 | |
age*ageclass | 4 | | | | 50 |
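ageclass Least Squares Means
ageclass | age | | Estimate | Standard Error | DF | t Value | Pr > |t| | Mean | Standard Error Mean |
---|---|---|---|---|---|---|---|---|---|
1 | 50.00 | 514520 | -3.7366 | 0.6025 | 2209 | -6.20 | <.0001 | 0.02328 | 0.01370 |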
2 | 50.00 | 514520 | -3.4173 | 0.3153 | 2209 | -10.84 | <.0001 | 0.03176 | 0.009695 |
3 | 50.00 | 514520 | -2.9773 | 0.2523 | 2209 | -11.80 | <.0001 | 0.04846 | 0.01163 |
4 | 50.00 | 514520 | -2.2440 | 0.3700 | 2209 | -6.07 | <.0001 | 0.09587 | 0.03207 |
Basically I just ran the model you suggested and am presenting the output here.
The ref=first option somehow didn't work, so I left it out. The model is now using category 4 as the reference, right? I don't think it should matter much when it is the estimates/lsmeans that we are interested in? Though it would be nice to still have the youngest ageclass as the reference. I tried order = data, but that didn't change things.
Note: I used other age classes now (20-29, 30-39, 40-49, 50-59), since I am using another dataset now (I have two datasets that I use for this project). But it shouldn't matter, since I had the problem of not being able to estimate the correct prevalences in both datasets. I think the method is incorrect and therefore don't expect the choice of dataset to influence the outcome.
So, what do we get here? We have three lsmeans statements. One that estimates for every ageclass the prevalence at their mean age? And two others which estimate the prevalence at age = 40 and age = 50? Looking at the tables, it looks like it still gives low prevalences? In these age classes, based on the raw data, I expect prevalences of around 30% (ranging from 20% to 40%) depending on the ageclass of interest, but definitely not the 2-10% we are seeing now, am I right?
The mean age lsmeans are at the mean across all groups, not within each group.
The longer this goes on, the more worried I become that age is missing for a lot of these observations, but that your calculation of raw prevalences still includes those subjects with missing age values. Is that a possibility?
Steve Denham
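Both points above can be checked directly. The sketch below is untested and assumes the variable names shown in the output (smoking, age, ageclass, sex); the work dataset name check is made up for illustration. The first step reports the mean age within each age class, which could then be supplied through AT age=... instead of the overall mean of 45.45. The second step recomputes raw prevalences using only records with no missing model variables, since the output shows 11964 observations read but only 8182 used.

```sas
/* BY-group processing requires sorted data */
proc sort data=dataname out=check;
  by sex;
run;

/* Mean age within each age class, plus counts of missing ages */
proc means data=check n nmiss mean min max;
  by sex;
  class ageclass;
  var age;
run;

/* Raw prevalence restricted to records the model can actually use */
proc freq data=check;
  by sex;
  where not missing(smoking) and not missing(age) and not missing(ageclass);
  tables ageclass*smoking / nocol nopercent;
run;
```

If the row percentages from the restricted table still sit near 30%, missing data are unlikely to explain the gap.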
I took a look, but age is not missing. It's actually quite complete.
Any miscodings? Possible high leverage points? I am at a loss for explanations at this point.
Steve Denham
I do not expect anything to be wrong with the data. It has been used for many other projects and it has been around for years. It is also constantly managed by someone who does all the data operations on it. I am only making use of the data as it is, apart from some minor edits, which I checked and am sure are all correct.
If we are not able to find the cause of this, then too bad I'd say. Thanks a lot anyway.