I've come across this nasty bug in proc surveyreg. Nasty because you get wrong results without any warning. I've reported it to SAS. My summary so far:
Required conditions: PROC SURVEYREG with jackknife or BRR replication, using a CLASS statement with a formatted variable.
Plus one of the following: the FORMAT statement reorders the categories (this is reported in http://support.sas.com/kb/36/068.html),
OR a REF= option is used which changes the reference category from the default (this was my experience).
Outcome: Variance estimates (std errors, F statistics, etc.) are wrong. (Std errors are too large.)
The SAS note 36068 says this was fixed in 9.2 TS2M3 for some systems, but not for all. No fix release stated for Windows versions.
What version of SAS are you on that is still experiencing this error? I'll admit it's weird that a 7-year-old bug wouldn't be fixed...
I'm using 9.4 TS1M3 on Windows (X64_7PRO).
Please describe how you know the results are wrong.
Can you provide a non-sensitive data set and code that replicates the error and the correct values?
Might take some time to put together a dataset. But this is why I conclude the variance estimates are wrong:
1) I run the estimation twice; the only difference is that the second run uses (ref=first) on the class statement. The class variable YOB has two categories (after formatting). The output below shows different F values for the YOB effect in the two runs. The F value shouldn't change just because you change the reference category.
2) This dataset has both replicate weights and survey design information. They usually give very similar results. In this case, using ref=first does not change the Taylor series estimates (except for the obvious reparameterisation) but does dramatically change the jackknife estimates.
proc format;
  value yob10ybb
    1905-1914 = "1905-14"
    1915-1924 = "1915-24"
    1925-1934 = "1925-34"
    1935-1944 = "1935-44"
    1945-1954 = "1945-54"
    1955-1964 = "1955-64"
    1965-1974 = "1965-74"
    1975-1984 = "1975-84"
    1985-1994 = "1985-94"
  ;
run;

title "(Apparently) correct estimates";
proc surveyreg data=hildadem.srprob plots=none;
  class yob;
  format yob yob10ybb.;
  weight _hhwtrp;
  repweights _rwrp1--_rwrp45;
  model single = yob / solution;
run;

title "Estimates with different reference category. Note different F for YOB";
proc surveyreg data=hildadem.srprob plots=none;
  class yob (ref=first);
  format yob yob10ybb.;
  weight _hhwtrp;
  repweights _rwrp1--_rwrp45;
  model single = yob / solution;
run;
(Apparently) correct estimates 14:55 Friday, September 30, 2016 1
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Single
Data Summary
Number of Observations 7076
Sum of Weights 7947616.5
Weighted Mean of Single 0.17554
Weighted Sum of Single 1395160.1
Fit Statistics
R-Square 0.003363
Root MSE 0.3798
Denominator DF 45
Variance Estimation
Method Jackknife
Replicate Weights SRPROB
Number of Replicates 45
Class Level Information
CLASS
Variable Levels Values
YOB 2 1935-44 1945-54
Tests of Model Effects
Effect Num DF F Value Pr > F
Model 1 4.90 0.0319
Intercept 1 378.78 <.0001
(Apparently) correct estimates 14:55 Friday, September 30, 2016 2
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Single
Tests of Model Effects
Effect Num DF F Value Pr > F
YOB 1 4.90 0.0319
NOTE: The denominator degrees of freedom for the F tests is 45.
Estimated Regression Coefficients
Parameter      Estimate      Standard Error   t Value   Pr > |t|
Intercept       0.1942445    0.01116659        17.40    <.0001
YOB 1935-44    -0.0447309    0.02020202        -2.21    0.0319
YOB 1945-54     0.0000000    0.00000000          .      .
NOTE: The degrees of freedom for the t tests is 45.
Matrix X'WX is singular and a generalized inverse was used to solve the normal equations. Estimates are not unique.
************************** Problematic results below *****************************************
Estimates with different reference category. Note different F for YOB 3
14:55 Friday, September 30, 2016
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Single
Data Summary
Number of Observations 7076
Sum of Weights 7947616.5
Weighted Mean of Single 0.17554
Weighted Sum of Single 1395160.1
Fit Statistics
R-Square 0.003363
Root MSE 0.3798
Denominator DF 45
Variance Estimation
Method Jackknife
Replicate Weights SRPROB
Number of Replicates 45
Class Level Information
CLASS
Variable Levels Values
YOB 2 1945-54 1935-44
Estimates with different reference category. Note different F for YOB 4
14:55 Friday, September 30, 2016
The SURVEYREG Procedure
Regression Analysis for Dependent Variable Single
Tests of Model Effects
Effect Num DF F Value Pr > F
Model 1 0.01 0.9402
Intercept 1 378.78 <.0001
YOB 1 0.01 0.9402
NOTE: The denominator degrees of freedom for the F tests is 45.
Estimated Regression Coefficients
Parameter      Estimate      Standard Error   t Value   Pr > |t|
Intercept      0.14951351    0.29650288         0.50    0.6165
YOB 1945-54    0.04473095    0.59317890         0.08    0.9402
YOB 1935-44    0.00000000    0.00000000          .      .
NOTE: The degrees of freedom for the t tests is 45.
Matrix X'WX is singular and a generalized inverse was used to solve the normal equations. Estimates are not unique.
SAS support have provided me with more information:
- The bug described in alert note 36068 has in fact been fixed (in 9.2 TS2M3). The note was in error in saying that it hadn't been fixed for Windows.
- My problem is actually a separate bug. They have found that it also occurs with unformatted data. An alert note will be coming out soon.
My recommendation: Do not use the REF= option when using replicate weights in SURVEYREG until this is fixed.
And SAS support has supplied the following code to replicate the problem.
data a;
  input x RepWt_1 RepWt_2 RepWt_3 RepWt_4;
  w = 1;
  y = x + rannor(23456);
  datalines;
1 0 1 1 1
1 1 0 1 1
2 1 1 0 1
2 1 1 1 0
;

title 'JK Repweights, Default';
proc surveyreg data=a varmethod=jk;
  ods select ParameterEstimates;
  class x;
  model y = x / solution;
  weight w;
  repweights RepWt_1 - RepWt_4;
run;

title 'BUG: JK Repweights, REF=FIRST';
title2 'Standard Error for X level 2 should be the same as for X level 1 in the previous run';
proc surveyreg data=a varmethod=jk;
  ods select ParameterEstimates;
  class x(ref=first);
  model y = x / solution;
  weight w;
  repweights RepWt_1 - RepWt_4;
run;
title;
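For anyone who wants an independent check of which standard error is right, here is a rough sketch that computes the jackknife standard error by hand for the test data above. It is my own illustration, not something SAS support supplied. It assumes the data set a from the code above already exists, and it assumes SURVEYREG's default jackknife coefficient of (R-1)/R when replicate weights are given without JKCOEFS (3/4 here, with R=4). The names a_long, est, x2, wt, replicate and se_x2 are made up for this sketch.

```sas
/* Stack the full-sample weights (replicate 0) and each set of      */
/* replicate weights (replicates 1-4) into one long data set.       */
data a_long;
  set a;
  x2 = (x = 2);                /* dummy: 1 for level 2, 0 for reference level 1 */
  array rw{4} RepWt_1 - RepWt_4;
  replicate = 0;
  wt = w;
  output;                      /* full-sample row */
  do replicate = 1 to 4;
    wt = rw{replicate};
    output;                    /* one row per replicate weight */
  end;
  keep replicate y x2 wt;
run;

proc sort data=a_long;
  by replicate;
run;

/* Weighted least squares once per replicate. Observations with a   */
/* zero weight are dropped from that replicate's fit.               */
proc reg data=a_long outest=est noprint;
  by replicate;
  weight wt;
  model y = x2;
run;
quit;

/* Jackknife variance: sum over replicates of                       */
/* coefficient * (replicate estimate - full-sample estimate)**2.    */
/* The coefficient 3/4 = (R-1)/R is an assumed default.             */
proc sql;
  select sqrt((3/4) * sum((r.x2 - f.x2)**2)) as se_x2
  from est(where=(replicate > 0)) as r,
       est(where=(replicate = 0)) as f;
quit;
```

The coefficient on x2 here is just the negative of the X level 1 coefficient from the default run, so each replicate's deviation has the same size in both parameterisations. If the assumptions above hold, se_x2 should agree with the standard error for X level 1 in the default run, and a REF=FIRST standard error that differs from it points to the bug.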