I've come across this nasty bug in proc surveyreg. Nasty because you get wrong results without any warning. I've reported it to SAS. My summary so far:
Required conditions: PROC SURVEYREG with jackknife or BRR replication, using a CLASS statement with a formatted variable.
Other required conditions: either the FORMAT statement reorders the categories (this is reported in http://support.sas.com/kb/36/068.html),
OR a REF= option is used that changes the reference category from the default (this was my experience).
Outcome: Variance estimates (standard errors, F statistics, etc.) are wrong; the standard errors are too large.
SAS Note 36068 says this was fixed in 9.2 TS2M3 for some systems but not all; no fix release is stated for Windows.
What version of SAS are you on that is still experiencing this error? I'll admit it's odd that a seven-year-old bug wouldn't be fixed...
I'm using SAS 9.4 TS1M3 on Windows (X64_7PRO).
Please describe how you know the results are wrong.
Can you provide a non-sensitive data set and code that replicates the error and the correct values?
It might take some time to put together a dataset, but this is why I conclude the variance estimates are wrong:
1) I run the estimation twice; the only difference is that the second run uses (ref=first) on the CLASS statement. The class variable YOB has two categories (after formatting). The output shown below has different F values for the YOB effect. The F value shouldn't change just because you change the reference category.
2) This dataset has both replicate weights and survey design information, and the two usually give very similar results. Here, using ref=first does not change the Taylor series estimates (apart from the obvious reparameterisation) but dramatically changes the jackknife estimates.
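Point 1 can be made precise. With only two YOB levels, the YOB coefficient is the difference between the two weighted group means, so switching the reference category only flips its sign, in the full sample and in every replicate. Assuming the usual replicate-weight jackknife form (with the α_r being the jackknife coefficients and R = 45 replicates here), the squared deviations, and therefore the variance and F = t², cannot change:

```latex
\begin{aligned}
\hat\beta   &= \hat\mu_{1935\text{-}44} - \hat\mu_{1945\text{-}54}
            && \text{(default: last level is the reference)}\\
\hat\beta'  &= \hat\mu_{1945\text{-}54} - \hat\mu_{1935\text{-}44} = -\hat\beta
            && \text{(REF=FIRST)}\\
\hat\beta'_r &= -\hat\beta_r && \text{for every replicate } r = 1,\dots,R\\
\hat V(\hat\beta') &= \sum_{r=1}^{R} \alpha_r \,(\hat\beta'_r - \hat\beta')^2
                    = \sum_{r=1}^{R} \alpha_r \,(\hat\beta_r - \hat\beta)^2
                    = \hat V(\hat\beta)
\end{aligned}
```

The two runs agree with this for the point estimates (-0.0447309 vs 0.04473095, and the intercept shifts by that same amount), so only the standard error (0.0202 vs 0.5932) moves.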
proc format;
  value yob10ybb
    1905-1914 = "1905-14"
    1915-1924 = "1915-24"
    1925-1934 = "1925-34"
    1935-1944 = "1935-44"
    1945-1954 = "1945-54"
    1955-1964 = "1955-64"
    1965-1974 = "1965-74"
    1975-1984 = "1975-84"
    1985-1994 = "1985-94"
  ;
run;

title "(Apparently) correct estimates";
proc surveyreg data=hildadem.srprob plots=none;
  class yob;
  format yob yob10ybb.;
  weight _hhwtrp;
  repweights _rwrp1--_rwrp45;
  model single = yob / solution;
run;

title "Estimates with different reference category. Note different F for YOB";
proc surveyreg data=hildadem.srprob plots=none;
  class yob (ref=first);
  format yob yob10ybb.;
  weight _hhwtrp;
  repweights _rwrp1--_rwrp45;
  model single = yob / solution;
run;
(Apparently) correct estimates    14:55 Friday, September 30, 2016

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Single

Data Summary
  Number of Observations     7076
  Sum of Weights             7947616.5
  Weighted Mean of Single    0.17554
  Weighted Sum of Single     1395160.1

Fit Statistics
  R-Square                   0.003363
  Root MSE                   0.3798
  Denominator DF             45

Variance Estimation
  Method                     Jackknife
  Replicate Weights          SRPROB
  Number of Replicates       45

Class Level Information
  CLASS Variable   Levels   Values
  YOB              2        1935-44 1945-54

Tests of Model Effects
  Effect      Num DF   F Value   Pr > F
  Model       1           4.90   0.0319
  Intercept   1         378.78   <.0001
  YOB         1           4.90   0.0319

NOTE: The denominator degrees of freedom for the F tests is 45.

Estimated Regression Coefficients
  Parameter     Estimate     Standard Error   t Value   Pr > |t|
  Intercept      0.1942445   0.01116659         17.40   <.0001
  YOB 1935-44   -0.0447309   0.02020202         -2.21   0.0319
  YOB 1945-54    0.0000000   0.00000000             .   .

NOTE: The degrees of freedom for the t tests is 45.
Matrix X'WX is singular and a generalized inverse was used to solve the normal equations. Estimates are not unique.
************************** Problematic results below *****************************************
Estimates with different reference category. Note different F for YOB    14:55 Friday, September 30, 2016

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Single

Data Summary
  Number of Observations     7076
  Sum of Weights             7947616.5
  Weighted Mean of Single    0.17554
  Weighted Sum of Single     1395160.1

Fit Statistics
  R-Square                   0.003363
  Root MSE                   0.3798
  Denominator DF             45

Variance Estimation
  Method                     Jackknife
  Replicate Weights          SRPROB
  Number of Replicates       45

Class Level Information
  CLASS Variable   Levels   Values
  YOB              2        1945-54 1935-44

Tests of Model Effects
  Effect      Num DF   F Value   Pr > F
  Model       1           0.01   0.9402
  Intercept   1         378.78   <.0001
  YOB         1           0.01   0.9402

NOTE: The denominator degrees of freedom for the F tests is 45.

Estimated Regression Coefficients
  Parameter     Estimate     Standard Error   t Value   Pr > |t|
  Intercept     0.14951351   0.29650288          0.50   0.6165
  YOB 1945-54   0.04473095   0.59317890          0.08   0.9402
  YOB 1935-44   0.00000000   0.00000000             .   .

NOTE: The degrees of freedom for the t tests is 45.
Matrix X'WX is singular and a generalized inverse was used to solve the normal equations. Estimates are not unique.
SAS support have provided me with more information:
- The bug described in alert note 36068 has in fact been fixed (in 9.2 TS2M3). The note was in error in saying that it hadn't been fixed for Windows.
- My problem is actually a separate bug. They have found that it also occurs with unformatted data. An alert note will be coming out soon.
My recommendation: Do not use the REF= option when using replicate weights in SURVEYREG until this is fixed.
And SAS support has supplied the following code to replicate the problem.
data a;
  input x RepWt_1 RepWt_2 RepWt_3 RepWt_4;
  w=1;
  y=x+rannor(23456);
  datalines;
1 0 1 1 1
1 1 0 1 1
2 1 1 0 1
2 1 1 1 0
;

title 'JK Repweights, Default';
proc surveyreg data=a varmethod=jk;
  ods select ParameterEstimates;
  class x;
  model y=x/solution;
  weight w;
  repweights RepWt_1 - RepWt_4;
run;

title 'BUG: JK Repweights, REF=FIRST';
title2 'Standard Error for X level 2 should be the same as for X level 1 in the previous run';
proc surveyreg data=a varmethod=jk;
  ods select ParameterEstimates;
  class x(ref=first);
  model y=x/solution;
  weight w;
  repweights RepWt_1 - RepWt_4;
run;
title;
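To see what the second run's title2 expects, here is a minimal hand computation (in Python, not SAS) of the replicate-weight jackknife for the same four-observation design. Assumptions, all mine rather than SAS's: the y values are hypothetical because SAS's RANNOR(23456) stream can't be reproduced outside SAS; the jackknife coefficient is taken as (R-1)/R for every replicate, which I believe is what SURVEYREG uses when replicate weights are given without JKCOEFS=; and the helper names are made up for illustration. With one two-level CLASS variable the regression coefficient is just the difference of weighted group means.

```python
import math

# Design copied from the SAS support DATA step: (x, [RepWt_1..RepWt_4]).
DESIGN = [
    (1, [0, 1, 1, 1]),
    (1, [1, 0, 1, 1]),
    (2, [1, 1, 0, 1]),
    (2, [1, 1, 1, 0]),
]
# Hypothetical responses (assumption): RANNOR(23456) output is not reproducible here.
Y = [0.8, 1.3, 2.4, 1.7]


def weighted_mean(level, weights):
    """Weighted mean of y over observations with x == level."""
    num = sum(w * y for (x, _), w, y in zip(DESIGN, weights, Y) if x == level)
    den = sum(w for (x, _), w in zip(DESIGN, weights) if x == level)
    return num / den


def level_effect(weights, ref):
    """Coefficient of the non-reference level for a two-level CLASS effect.

    The saturated one-factor model reproduces the weighted group means, so the
    coefficient is mean(non-reference level) - mean(reference level).
    """
    other = 2 if ref == 1 else 1
    return weighted_mean(other, weights) - weighted_mean(ref, weights)


def jackknife_se(ref):
    """Replicate-weight jackknife SE, assuming coefficient (R-1)/R per replicate."""
    full = level_effect([1.0] * len(DESIGN), ref)  # full-sample weight w = 1
    n_reps = len(DESIGN[0][1])
    alpha = (n_reps - 1) / n_reps
    reps = [level_effect([row[1][r] for row in DESIGN], ref) for r in range(n_reps)]
    return math.sqrt(alpha * sum((b - full) ** 2 for b in reps))


se_ref_last = jackknife_se(ref=2)   # default: last level (x=2) is the reference
se_ref_first = jackknife_se(ref=1)  # REF=FIRST: x=1 is the reference
print(f"SE, reference = last : {se_ref_last:.6f}")
print(f"SE, reference = first: {se_ref_first:.6f}")
```

Both lines print the same standard error, because every replicate estimate simply changes sign; that equality is what SAS support's title2 says the REF=FIRST run should reproduce.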