I have a large longitudinal data set: seven years of data for approximately 115,000 students (approximately 500,000 observations) attending approximately 600 schools. I am doing cross-classified individual growth modeling (cross-classified because students change schools).

Because the data set is so large, I am estimating models on small (2 percent) subsamples. With the 2 percent subsamples, however, the data are sparse, and PROC MIXED cannot estimate the covariance parameters. I am therefore using a combination of HPMIXED and MIXED: I estimate the covariance parameters in HPMIXED and pass them along to MIXED, where I can compute sequential F-tests and adjust for multiple comparisons when testing differences between LSMEANS.

The grouping variable of interest is ProFnc (1=proficient, 0=non-proficient). When I put the grouping variable in the model with no interactions, the fixed effects produced by HPMIXED and MIXED are the same. However, HPMIXED estimates a fixed effect for ProFnc=1, whereas MIXED estimates an effect for ProFnc=0. Then, when I add interactions with the grouping variable to the model, the fixed effects seem to behave strangely.

My questions:

- Should I ignore the fixed effects that HPMIXED produces and just go with the fixed effects that MIXED produces?
- If so, can I trust the covariance parameters coming out of HPMIXED?
- Can I trust the fixed effects that MIXED is producing?
- If there is a problem, is there a change I can make in the code to fix it?

Below I have pasted the code and output for the two models described above: first the model with the grouping variable but no interactions, and second the model with the grouping variable and its interactions. (Note: I realize these particular interactions are not significant, but I am trying to understand what SAS is doing here and which numbers I can trust.)

Thank you in advance for any help with this.
/* MODEL 1 - the grouping variable (ProFnc) and NO interactions */

/* HPMIXED */
PROC HPMIXED DATA=sub2pct_long noclprint;
   CLASS stdpseudoid schcode ProFnc;
   MODEL zELA = Timec|Timec|Timec|Timec Female PrEdClGrd PrEdSmClHS FRL ProFnc zCeldt
         / solution;
   RANDOM intercept Timec / subject=stdpseudoid type=un;
   RANDOM intercept / subject=schcode type=un;
   ODS OUTPUT covparms=hpmcov2pct;
RUN;

/* MIXED */
PROC MIXED DATA=sub2pct_long noclprint covtest lognote method=reml;
   CLASS stdpseudoid schcode ProFnc;
   MODEL zELA = Timec|Timec|Timec|Timec Female PrEdClGrd PrEdSmClHS FRL zCeldt ProFnc
         / solution htype=1;
   RANDOM intercept Timec / subject=stdpseudoid type=un;
   RANDOM intercept / subject=schcode type=un;
   PARMS / PDATA=hpmcov2pct hold=1,2,3,4,5 noiter;
RUN;

THE HPMIXED PROCEDURE

Data Set                      WORK.SUB2PCT_LONG
Response Variable             zELA
Estimation Method             Restricted Maximum Likelihood (REML)
Degrees of Freedom Method     Residual

Number of Observations Read   10175
Number of Observations Used   8898

Dimensions
G-side Cov. Parameters        4
R-side Cov. Parameters        1
Columns in X                  12
Columns in Z                  4969
Subjects (Blocks in V)        1

Optimization Information
Optimization Technique        Dual Quasi-Newton
Parameters in Optimization    4
Lower Boundaries              3
Upper Boundaries              0
Residual Variance             Profiled

Iteration History
Iteration  Evaluations  Objective Function  Change      Max Gradient
0          4            18310.276802        .           425.1763
1          4            18306.028084        4.24871820  56.80605
2          3            18305.679162        0.34892203  74.52385
3          3            18305.05507         0.62409210  39.65128
4          4            18305.032432        0.02263798  38.72127
5          4            18304.982517        0.04991456  4.723682
6          3            18304.981792        0.00072531  0.027503
7          3            18304.981792        0.00000002  0.000091

Convergence criterion (GCONV=1E-8) satisfied.
Covariance Parameter Estimates
Cov Parm  Subject      Estimate
UN(1,1)   StdPseudoId  0.6356
UN(2,1)   StdPseudoId  -0.00929
UN(2,2)   StdPseudoId  0.01124
UN(1,1)   SchCode      0.009595
Residual               0.2287

Fit Statistics
-2 Res Log Likelihood       18305
AIC (smaller is better)     18315
AICC (smaller is better)    18315
BIC (smaller is better)     18305
CAIC (smaller is better)    18310
HQIC (smaller is better)    18305

Solution for Fixed Effects
Effect                ProFnc  Estimate  Standard Error  DF    t Value  Pr > |t|
Intercept                     -0.1662   0.04025         8887  -4.13    <.0001
Timec                         -0.02220  0.03404         8887  -0.65    0.5144
Timec*Timec                   -0.00221  0.02929         8887  -0.08    0.9398
Timec*Timec*Timec             0.002277  0.008339        8887  0.27     0.7848
Time*Time*Time*Timec          -0.00020  0.000738        8887  -0.26    0.7913
Female                        0.05291   0.03625         8887  1.46     0.1444
PrEdClGrd                     0.1948    0.09722         8887  2.00     0.0452
PrEdSmClHS                    0.1982    0.04100         8887  4.83     <.0001
FRL                           -0.04520  0.03694         8887  -1.22    0.2212
ProFnc                0       0         .               .     .        .
ProFnc                1       0.2171    0.03673         8887  5.91     <.0001
zCeldt                        0.2361    0.01803         8887  13.10    <.0001

THE MIXED PROCEDURE

Data Set                      WORK.SUB2PCT_LONG
Dependent Variable            zELA
Covariance Structures         Unstructured, Variance Components
Subject Effects               StdPseudoId, SchCode
Estimation Method             REML
Residual Variance Method      Parameter
Fixed Effects SE Method       Model-Based
Degrees of Freedom Method     Containment

Dimensions
Covariance Parameters         5
Columns in X                  20
Columns in Z                  4969
Subjects                      1
Max Obs Per Subject           10175

Number of Observations
Number of Observations Read       10175
Number of Observations Used       8898
Number of Observations Not Used   1277

Parameter Search
CovP1   CovP2     CovP3    CovP4     CovP5   Res Log Like  -2 Res Log Like
0.6356  -0.00929  0.01124  0.009595  0.2287  -9152.4909    18304.9818

Covariance Parameter Estimates
Cov Parm  Subject      Estimate  Standard Error  Z Value  Pr Z
UN(1,1)   StdPseudoId  0.6356    0               .        .
UN(2,1)   StdPseudoId  -0.00929  0               .        .
UN(2,2)   StdPseudoId  0.01124   0               .        .
UN(1,1)   SchCode      0.009595  0               .        .
Residual               0.2287    0               .        .
Fit Statistics
-2 Res Log Likelihood       18305.0
AIC (smaller is better)     18305.0
AICC (smaller is better)    18305.0
BIC (smaller is better)     18305.0

Solution for Fixed Effects
Effect                ProFnc  Estimate  Standard Error  DF    t Value  Pr > |t|
Intercept                     0.05087   0.04413         425   1.15     0.2497
Timec                         -0.02220  0.03404         2040  -0.65    0.5144
Timec*Timec                   -0.00221  0.02929         4190  -0.08    0.9398
Timec*Timec*Timec             0.002277  0.008339        4190  0.27     0.7848
Time*Time*Time*Timec          -0.00020  0.000738        4190  -0.26    0.7913
Female                        0.05291   0.03625         4190  1.46     0.1444
PrEdClGrd                     0.1948    0.09722         4190  2.00     0.0452
PrEdSmClHS                    0.1982    0.04100         4190  4.83     <.0001
FRL                           -0.04520  0.03694         4190  -1.22    0.2212
ProFnc                0       -0.2171   0.03673         4190  -5.91    <.0001
ProFnc                1       0         .               .     .        .
zCeldt                        0.2361    0.01803         4190  13.10    <.0001

Type I Tests of Fixed Effects
Effect                Num DF  Den DF  F Value  Pr > F
Timec                 1       2040    0.66     0.4156
Timec*Timec           1       4190    10.33    0.0013
Timec*Timec*Timec     1       4190    0.03     0.8713
Time*Time*Time*Timec  1       4190    0.01     0.9256
Female                1       4190    5.38     0.0204
PrEdClGrd             1       4190    3.38     0.0659
PrEdSmClHS            1       4190    31.73    <.0001
FRL                   1       4190    0.56     0.4555
ProFnc                1       4190    33.85    <.0001
zCeldt                1       4190    171.49   <.0001

/* MODEL 2 - the grouping variable (ProFnc), AND its interactions */

/* HPMIXED */
PROC HPMIXED DATA=sub2pct_long noclprint;
   CLASS stdpseudoid schcode ProFnc;
   MODEL zELA = Timec|Timec|Timec|Timec Female PrEdClGrd PrEdSmClHS FRL zCeldt ProFnc
                ProFnc*Female ProFnc*PrEdClGrd ProFnc*PrEdSmClHS ProFnc*zCeldt
         / solution;
   RANDOM intercept Timec / subject=stdpseudoid type=un;
   RANDOM intercept / subject=schcode type=un;
   ODS OUTPUT covparms=hpmcov2pct;
RUN;

/* MIXED */
PROC MIXED DATA=sub2pct_long noclprint covtest lognote method=reml;
   CLASS stdpseudoid schcode ProFnc;
   MODEL zELA = Timec|Timec|Timec|Timec Female PrEdClGrd PrEdSmClHS FRL ProFnc zCeldt
                ProFnc*Female ProFnc*PrEdClGrd ProFnc*PrEdSmClHS ProFnc*zCeldt
         / solution htype=1;
   RANDOM intercept Timec / subject=stdpseudoid type=un;
   RANDOM intercept / subject=schcode type=un;
   PARMS / PDATA=hpmcov2pct hold=1,2,3,4,5 noiter;
RUN;

THE HPMIXED PROCEDURE

Data Set                      WORK.SUB2PCT_LONG
Response Variable             zELA
Estimation Method             Restricted Maximum Likelihood (REML)
Degrees of Freedom Method     Residual

Number of Observations Read   10175
Number of Observations Used   8898

Dimensions
G-side Cov. Parameters        4
R-side Cov. Parameters        1
Columns in X                  20
Columns in Z                  4969
Subjects (Blocks in V)        1

Optimization Information
Optimization Technique        Dual Quasi-Newton
Parameters in Optimization    4
Lower Boundaries              3
Upper Boundaries              0
Residual Variance             Profiled

Iteration History
Iteration  Evaluations  Objective Function  Change      Max Gradient
0          4            18320.233278        .           430.4562
1          4            18315.885135        4.34814326  57.74109
2          3            18315.533274        0.35186101  75.8847
3          3            18314.895565        0.63770890  40.89073
4          4            18314.872387        0.02317834  39.85029
5          4            18314.819313        0.05307395  3.502647
6          3            18314.818918        0.00039410  0.020809
7          3            18314.818918        0.00000001  0.000075

Convergence criterion (GCONV=1E-8) satisfied.

Covariance Parameter Estimates
Cov Parm  Subject      Estimate
UN(1,1)   StdPseudoId  0.6363
UN(2,1)   StdPseudoId  -0.00940
UN(2,2)   StdPseudoId  0.01123
UN(1,1)   SchCode      0.009609
Residual               0.2287

Fit Statistics
-2 Res Log Likelihood       18315
AIC (smaller is better)     18325
AICC (smaller is better)    18325
BIC (smaller is better)     18315
CAIC (smaller is better)    18320
HQIC (smaller is better)    18315

Solution for Fixed Effects
Effect                ProFnc  Estimate  Standard Error  DF    t Value  Pr > |t|
Intercept                     -0.1545   0.04406         8883  -3.51    0.0005
Timec                         -0.02224  0.03404         8883  -0.65    0.5136
Timec*Timec                   -0.00219  0.02929         8883  -0.07    0.9404
Timec*Timec*Timec             0.002277  0.008339        8883  0.27     0.7848
Time*Time*Time*Timec          -0.00020  0.000738        8883  -0.27    0.7909
Female                        0         .               .     .        .
PrEdClGrd                     0         .               .     .        .
PrEdSmClHS                    0         .               .     .        .
FRL                           -0.04623  0.03708         8883  -1.25    0.2125
ProFnc                0       0         .               .     .        .
ProFnc                1       0.1951    0.05881         8883  3.32     0.0009
zCeldt                        0         .               .     .        .
Female*ProFnc         0       0.06340   0.04752         8883  1.33     0.1822
Female*ProFnc         1       0.03835   0.05611         8883  0.68     0.4943
PrEdClGrd*ProFnc      0       0.1479    0.1253          8883  1.18     0.2377
PrEdClGrd*ProFnc      1       0.2608    0.1544          8883  1.69     0.0911
PrEdSmClHS*ProFnc     0       0.1508    0.05290         8883  2.85     0.0044
PrEdSmClHS*ProFnc     1       0.2672    0.06477         8883  4.13     <.0001
zCeldt*ProFnc         0       0.2515    0.02414         8883  10.42    <.0001
zCeldt*ProFnc         1       0.2173    0.02710         8883  8.02     <.0001

THE MIXED PROCEDURE

Data Set                      WORK.SUB2PCT_LONG
Dependent Variable            zELA
Covariance Structures         Unstructured, Variance Components
Subject Effects               StdPseudoId, SchCode
Estimation Method             REML
Residual Variance Method      Parameter
Fixed Effects SE Method       Model-Based
Degrees of Freedom Method     Containment

Dimensions
Covariance Parameters         5
Columns in X                  20
Columns in Z                  4969
Subjects                      1
Max Obs Per Subject           10175

Number of Observations Read       10175
Number of Observations Used       8898
Number of Observations Not Used   1277

Parameter Search
CovP1   CovP2     CovP3    CovP4     CovP5   Res Log Like  -2 Res Log Like
0.6363  -0.00940  0.01123  0.009609  0.2287  -9146.8531    18293.7063

Covariance Parameter Estimates
Cov Parm  Subject      Estimate  Standard Error  Z Value  Pr Z
UN(1,1)   StdPseudoId  0.6363    0               .        .
UN(2,1)   StdPseudoId  -0.00940  0               .        .
UN(2,2)   StdPseudoId  0.01123   0               .        .
UN(1,1)   SchCode      0.009609  0               .        .
Residual               0.2287    0               .        .
Fit Statistics
-2 Res Log Likelihood       18293.7
AIC (smaller is better)     18293.7
AICC (smaller is better)    18293.7
BIC (smaller is better)     18293.7

Solution for Fixed Effects
Effect                ProFnc  Estimate  Standard Error  DF    t Value  Pr > |t|
Intercept                     0.03238   15327           425   0.00     1.0000
Timec                         -0.02224  0.03404         2040  -0.65    0.5136
Timec*Timec                   -0.00219  0.02929         4190  -0.07    0.9404
Timec*Timec*Timec             0.002277  0.008339        4190  0.27     0.7848
Time*Time*Time*Timec          -0.00020  0.000738        4190  -0.27    0.7909
Female                        0.03835   0.05611         4190  0.68     0.4943
PrEdClGrd                     0.2608    0.1544          4190  1.69     0.0911
PrEdSmClHS                    0.2672    0.06477         4190  4.13     <.0001
FRL                           -0.04623  0.03708         4190  -1.25    0.2125
ProFnc                0       -0.1869   15327           4190  -0.00    1.0000
ProFnc                1       0.008199  15327           4190  0.00     1.0000
zCeldt                        0.2173    0.02710         4190  8.02     <.0001
Female*ProFnc         0       0.02505   0.07352         4190  0.34     0.7333
Female*ProFnc         1       0         .               .     .        .
PrEdClGrd*ProFnc      0       -0.1129   0.1988          4190  -0.57    0.5701
PrEdClGrd*ProFnc      1       0         .               .     .        .
PrEdSmClHS*ProFnc     0       -0.1165   0.08351         4190  -1.39    0.1631
PrEdSmClHS*ProFnc     1       0         .               .     .        .
zCeldt*ProFnc         0       0.03423   0.03626         4190  0.94     0.3453
zCeldt*ProFnc         1       0         .               .     .        .

Type I Tests of Fixed Effects
Effect                Num DF  Den DF  F Value  Pr > F
Timec                 1       2040    0.63     0.4270
Timec*Timec           1       4190    10.39    0.0013
Timec*Timec*Timec     1       4190    0.01     0.9113
Time*Time*Time*Timec  1       4190    0.01     0.9377
Female                1       4190    5.35     0.0208
PrEdClGrd             1       4190    3.39     0.0657
PrEdSmClHS            1       4190    31.90    <.0001
FRL                   1       4190    0.52     0.4690
ProFnc                1       4190    33.97    <.0001
zCeldt                1       4190    171.07   <.0001
Female*ProFnc         1       4190    0.10     0.7482
PrEdClGrd*ProFnc      1       4190    0.14     0.7094
PrEdSmClHS*ProFnc     1       4190    1.74     0.1877
zCeldt*ProFnc         1       4190    0.89     0.3453
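
A quick arithmetic check on the two sets of fixed effects, using only the numbers pasted above. This is a sketch rather than a diagnosis: it assumes the two procedures fit the same model and differ only in which terms they set to zero. The superscripts HP (HPMIXED) and MX (MIXED) are my own labels, not SAS notation.

```latex
% HP = HPMIXED estimate, MX = MIXED estimate (labels are illustrative, not SAS notation).
\begin{align*}
\text{Model 1, intercept for ProFnc=1:}\quad
  & \text{Intercept}^{HP} + \text{ProFnc}_1^{HP} = -0.1662 + 0.2171 = 0.0509
    \approx \text{Intercept}^{MX} = 0.05087 \\
\text{Model 1, intercept for ProFnc=0:}\quad
  & \text{Intercept}^{MX} + \text{ProFnc}_0^{MX} = 0.05087 - 0.2171 = -0.16623
    \approx \text{Intercept}^{HP} = -0.1662 \\[6pt]
\text{Model 2, Female slope for ProFnc=0:}\quad
  & 0.03835 + 0.02505 = 0.06340 = (\text{Female*ProFnc}_0)^{HP} \\
\text{Model 2, PrEdClGrd slope for ProFnc=0:}\quad
  & 0.2608 - 0.1129 = 0.1479 = (\text{PrEdClGrd*ProFnc}_0)^{HP} \\
\text{Model 2, PrEdSmClHS slope for ProFnc=0:}\quad
  & 0.2672 - 0.1165 = 0.1507 \approx (\text{PrEdSmClHS*ProFnc}_0)^{HP} = 0.1508 \\
\text{Model 2, zCeldt slope for ProFnc=0:}\quad
  & 0.2173 + 0.03423 = 0.25153 \approx (\text{zCeldt*ProFnc}_0)^{HP} = 0.2515 \\[6pt]
\text{Model 2, intercept for ProFnc=0:}\quad
  & \text{Intercept}^{MX} + \text{ProFnc}_0^{MX} = 0.03238 - 0.1869 = -0.15452
    \approx \text{Intercept}^{HP} = -0.1545 \\
\text{Model 2, intercept for ProFnc=1:}\quad
  & \text{Intercept}^{MX} + \text{ProFnc}_1^{MX} = 0.03238 + 0.008199 = 0.040579
    \approx \text{Intercept}^{HP} + \text{ProFnc}_1^{HP} = -0.1545 + 0.1951 = 0.0406
\end{align*}
```

In each Model 2 slope line, the left side is the MIXED main effect plus the MIXED interaction estimate for ProFnc=0. The MIXED main effects themselves match the HPMIXED interaction estimates for ProFnc=1 (0.03835, 0.2608, 0.2672, 0.2173). So, up to rounding, the two procedures appear to report the same group-specific intercepts and slopes under different parameterizations. This check does not explain the standard error of 15327 on the MIXED Intercept and ProFnc rows in Model 2, where both ProFnc levels receive nonzero estimates.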