BruceBrad
Lapis Lazuli | Level 10

I've come across this nasty bug in proc surveyreg. Nasty because you get wrong results without any warning. I've reported it to SAS. My summary so far:

 

Required conditions: PROC SURVEYREG with jackknife or BRR replication, using a CLASS statement with a formatted variable.

In addition, at least one of the following must hold:

- The FORMAT statement reorders the categories. This is reported in http://support.sas.com/kb/36/068.html.

- A REF= option is used which changes the reference category from the default (this was my experience).

 

Outcome: Variance estimates (standard errors, F statistics, etc.) are wrong. In my case the standard errors are too large.

 

SAS note 36068 says this was fixed in 9.2 TS2M3 for some systems, but not for all. No fix release is stated for Windows versions.

 

6 REPLIES
Reeza
Super User

What version of SAS are you on that is still experiencing this error? I'll admit it's weird a 7 year old bug wouldn't be fixed....

BruceBrad
Lapis Lazuli | Level 10

I'm using 9.4 TS1M3 on Windows (X64_7PRO).

ballardw
Super User

Please describe how you know the results are wrong.

Can you provide a non-sensitive data set and code that replicates the error and the correct values?

BruceBrad
Lapis Lazuli | Level 10

Might take some time to put together a dataset. But this is why I conclude the variance estimates are wrong:

1) I run the estimation twice. The only difference is that the second run uses (ref=first) on the class statement. The class variable YOB has two categories (after formatting). The output shown below has different F values for the YOB variable. This shouldn't change just because you change the reference category.

2) This dataset has both replicate weights and survey design information. They usually give very similar results. In this case, using ref=first does not change the Taylor series estimates (except for the obvious reparameterisation) but does dramatically change the jackknife estimates.
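The reasoning behind point 1 can be written out for the two-level case. This is only a sketch: the symbols (d for the 0/1 indicator of one YOB level, a and b for the fitted intercept and slope, a subscript r for replicate r, and alpha_r for the jackknife coefficients) are notation introduced here, not anything printed by SAS.

```latex
\begin{align*}
% Switching the reference level replaces the indicator d by 1 - d
y &= a + b\,d = (a + b) + (-b)(1 - d)
  \quad\Longrightarrow\quad b' = -b, \qquad b'_r = -b_r \ \text{for every replicate } r \\
% The jackknife variance depends only on squared deviations, so the sign drops out
\widehat{V}(b') &= \sum_{r=1}^{R} \alpha_r\,(b'_r - b')^2
                 = \sum_{r=1}^{R} \alpha_r\,(b_r - b)^2 = \widehat{V}(b) \\
% Hence the one-degree-of-freedom F statistic is unchanged
F' &= \frac{b'^2}{\widehat{V}(b')} = \frac{b^2}{\widehat{V}(b)} = F
\end{align*}
```

So the YOB estimate should only change sign between the two runs, while its standard error and F value should stay the same.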

 

proc format;
  value yob10ybb 1905-1914 = "1905-14" 1915-1924 = "1915-24" 1925-1934 = "1925-34" 1935-1944 = "1935-44" 1945-1954 = "1945-54"
1955-1964 = "1955-64" 1965-1974 = "1965-74" 1975-1984 = "1975-84" 1985-1994 = "1985-94"
; 
run;

title "(Apparently) correct estimates";
proc surveyreg    data=hildadem.srprob  plots=none ;
  class yob ;
  format yob yob10ybb.;
  weight _hhwtrp;
  repweights _rwrp1--_rwrp45;
  model single = yob /solution ;
  run;

title "Estimates with different reference category. Note different F for YOB";
proc surveyreg    data=hildadem.srprob  plots=none ;
  class yob (ref=first);
  format yob yob10ybb.;
  weight _hhwtrp;
  repweights _rwrp1--_rwrp45;
  model single = yob /solution ;
  run;

   

                                                      (Apparently) correct estimates                  14:55 Friday, September 30, 2016   1

                                                         The SURVEYREG Procedure
 
                                            Regression Analysis for Dependent Variable Single

                                                              Data Summary

                                                  Number of Observations           7076
                                                  Sum of Weights              7947616.5
                                                  Weighted Mean of Single       0.17554
                                                  Weighted Sum of Single      1395160.1


                                                              Fit Statistics

                                                        R-Square          0.003363
                                                        Root MSE            0.3798
                                                        Denominator DF          45


                                                           Variance Estimation

                                                    Method                   Jackknife
                                                    Replicate Weights           SRPROB
                                                    Number of Replicates            45


                                                         Class Level Information
 
                                                 CLASS
                                                 Variable      Levels    Values

                                                 YOB                2    1935-44 1945-54 


                                                          Tests of Model Effects
 
                                                 Effect       Num DF    F Value    Pr > F

                                                 Model             1       4.90    0.0319
                                                 Intercept         1     378.78    <.0001
                                                      (Apparently) correct estimates                  14:55 Friday, September 30, 2016   2

                                                         The SURVEYREG Procedure
 
                                            Regression Analysis for Dependent Variable Single

                                                          Tests of Model Effects
 
                                                 Effect       Num DF    F Value    Pr > F

                                                 YOB               1       4.90    0.0319

                                     NOTE: The denominator degrees of freedom for the F tests is 45.


                                                    Estimated Regression Coefficients
 
                                                                     Standard
                                      Parameter        Estimate         Error    t Value    Pr > |t|

                                      Intercept       0.1942445    0.01116659      17.40      <.0001
                                      YOB 1935-44    -0.0447309    0.02020202      -2.21      0.0319
                                      YOB 1945-54     0.0000000    0.00000000        .         .    

NOTE: The degrees of freedom for the t tests is 45.
      Matrix X'WX is singular and a generalized inverse was used to solve the normal equations.  Estimates are not unique.
                                 
************************** Problematic results below *****************************************

Estimates with different reference category. Note different F for YOB          14:55 Friday, September 30, 2016   3

                         The SURVEYREG Procedure

            Regression Analysis for Dependent Variable Single

                              Data Summary

                  Number of Observations           7076
                  Sum of Weights              7947616.5
                  Weighted Mean of Single       0.17554
                  Weighted Sum of Single      1395160.1


                              Fit Statistics

                        R-Square          0.003363
                        Root MSE            0.3798
                        Denominator DF          45


                           Variance Estimation

                    Method                   Jackknife
                    Replicate Weights           SRPROB
                    Number of Replicates            45


                         Class Level Information

                 CLASS
                 Variable      Levels    Values

                 YOB                2    1945-54 1935-44

Estimates with different reference category. Note different F for YOB          14:55 Friday, September 30, 2016   4

                         The SURVEYREG Procedure

            Regression Analysis for Dependent Variable Single

                          Tests of Model Effects

                 Effect       Num DF    F Value    Pr > F

                 Model             1       0.01    0.9402
                 Intercept         1     378.78    <.0001
                 YOB               1       0.01    0.9402

     NOTE: The denominator degrees of freedom for the F tests is 45.


                    Estimated Regression Coefficients

                                     Standard
      Parameter        Estimate         Error    t Value    Pr > |t|

      Intercept      0.14951351    0.29650288       0.50      0.6165
      YOB 1945-54    0.04473095    0.59317890       0.08      0.9402
      YOB 1935-44    0.00000000    0.00000000        .         .

NOTE: The degrees of freedom for the t tests is 45.
      Matrix X'WX is singular and a generalized inverse was used to solve the normal equations.  Estimates are not unique.

 

BruceBrad
Lapis Lazuli | Level 10

SAS support have provided me with more information:

- The bug described in alert note 36068 has in fact been fixed (in 9.2 TS2M3). The note was in error in saying that it hadn't been fixed for Windows.

- My problem is actually a separate bug. They have found that it also occurs with unformatted data. An alert note will be coming out soon.

 

My recommendation: Do not use the REF= option when using replicate weights in SURVEYREG until this is fixed.
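One possible way to follow this recommendation and still choose the reference category is to recode the class variable so that the level you want as the reference sorts last, because SURVEYREG uses the last level as the reference by default. Below is a minimal sketch for a two-level numeric class variable. The dataset toy, the variable x_rev and all the data values are made up for illustration. Nothing in this thread confirms that every run without REF= is free of the problem, so treat this as something to test rather than a guaranteed fix.

```sas
/* Made-up toy data (illustration only): x is the class variable,
   RepWt_1-RepWt_4 are delete-one jackknife replicate weights. */
data toy;
  input x y RepWt_1 RepWt_2 RepWt_3 RepWt_4;
  w = 1;
  /* Reverse the coding so that x=1 becomes the last level (x_rev=2) */
  x_rev = 3 - x;
  datalines;
1 0.8 0 1 1 1
1 1.3 1 0 1 1
2 2.4 1 1 0 1
2 1.7 1 1 1 0
;

/* No REF= option: the default reference is the last level of x_rev,
   which corresponds to x=1, the same parameterisation as x(ref=first). */
proc surveyreg data=toy varmethod=jk;
  class x_rev;
  model y = x_rev / solution;
  weight w;
  repweights RepWt_1-RepWt_4;
run;
```

In the output, the row labelled x_rev 1 is the effect of x=2 relative to x=1.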

BruceBrad
Lapis Lazuli | Level 10

And SAS support has supplied the following code to replicate the problem.

data a;
  input x RepWt_1 RepWt_2 RepWt_3 RepWt_4;
  w=1;
  y=x+rannor(23456);
  datalines;
1  0 1 1 1
1  1 0 1 1 
2  1 1 0 1
2  1 1 1 0
;
title 'JK Repweights, Default';
proc surveyreg data=a varmethod=jk;
ods select ParameterEstimates;
  class x;
  model y=x/solution;
  weight w; 
  repweights RepWt_1 - RepWt_4;
run;
title 'BUG: JK Repweights, REF=FIRST';
title2 'Standard Error for X level 2 should be the same as for X level 1 in the previous run';
proc surveyreg data=a varmethod=jk;
ods select ParameterEstimates;
  class x(ref=first);
  model y=x/solution;
  weight w; 
  repweights RepWt_1 - RepWt_4;
run;
title;
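
To see what the standard error should be, the jackknife can also be computed by hand from the replicate fits. This is a rough sketch, assuming dataset a from the code above is still in the WORK library. The names a2, x2, est_full, est_rep1 to est_rep4, est_reps and the macro jkfits are hypothetical. It also assumes that SURVEYREG uses (R-1)/R as the jackknife coefficient for every replicate when no coefficients are supplied, which is worth checking against the documentation for your release.

```sas
/* x2 is a 0/1 indicator for x=2, matching the REF=FIRST parameterisation */
data a2;
  set a;
  x2 = (x = 2);
run;

/* Full-sample fit using the analysis weight w */
proc reg data=a2 outest=est_full noprint;
  model y = x2;
  weight w;
run;
quit;

/* One fit per replicate weight; PROC REG drops rows with zero weight */
%macro jkfits(nrep=4);
  %do r = 1 %to &nrep;
    proc reg data=a2 outest=est_rep&r noprint;
      model y = x2;
      weight RepWt_&r;
    run;
    quit;
  %end;
  /* Stack the replicate estimates into one dataset */
  data est_reps;
    set est_rep1-est_rep&nrep;
  run;
%mend jkfits;
%jkfits(nrep=4)

/* Jackknife SE with assumed coefficient (R-1)/R = 3/4 for each replicate */
proc sql;
  select sqrt(0.75 * sum((r.x2 - f.x2)**2)) as jk_se_x2
  from est_reps as r, est_full as f;
quit;
```

If the assumption about the coefficient holds, jk_se_x2 should agree with the standard error for X level 1 in the default run, and the REF=FIRST run should report the same value for X level 2.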
