
For the SURVEY procedures in SAS (e.g. PROC SURVEYMEANS, PROC SURVEYFREQ, ...), the documentation says: "If an observation has a weight that is nonpositive or missing, then the procedure omits that observation from the analysis."

 

However, this can give incorrect standard errors when enough PSUs and strata from the sampling design are removed because they contain only persons with zero weight. The DOMAIN statement exists for precisely this purpose: it keeps all observations in the analysis, rather than subsetting with a WHERE statement and discarding information needed for standard error estimation. But persons with zero weight are removed from the dataset before DOMAIN is applied, so using DOMAIN in this case is moot.

 

Currently, the only workarounds I've come across are either to change the weight from zero to a very small positive number (e.g. 1E-60), or to set it to a positive number (e.g. 1) and flag this group with an indicator variable, which is then used in the DOMAIN statement.
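A minimal sketch of the second workaround with PROC SURVEYMEANS follows. The names mydata, wt, stratum, psu, and y are hypothetical placeholders, not taken from any real dataset. The placeholder weight of 1 only keeps the record (and its stratum and PSU) in the design; the estimates of interest are the ones reported for in_domain = 1.

```sas
/* Hypothetical names: mydata, wt, stratum, psu, y */
data mydata_flagged;
    set mydata;
    /* 1 = observation has a usable (positive) weight, 0 = zero, negative, or missing */
    in_domain = (wt > 0);
    /* Give excluded records a placeholder weight so the procedure keeps them */
    if not (wt > 0) then wt = 1;
run;

proc surveymeans data = mydata_flagged mean stderr;
    strata stratum;
    cluster psu;
    weight wt;
    domain in_domain;   /* read the results for in_domain = 1 */
    var y;
run;
```

The overall (non-domain) estimates from this run mix real and placeholder weights and should be ignored.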

 

It would be nice to have a simple option in the SURVEY procedures which wouldn't require editing the original dataset in order to get the correct standard errors.

9 Comments
ballardw
Super User

 

I am curious how you get so many weights of 0.

mitcheem
Fluorite | Level 6

The data are from the Medical Expenditure Panel Survey (MEPS), which is nationally representative (and there are not too many zero weights for the main person weight).

 

The survey also includes supplemental surveys on a sub-population -- for instance, persons with diabetes. For that analysis, anyone that didn't take the Diabetes Care Survey has a secondary weight of 0.

SAS_Rob
SAS Employee

In this situation, you should use the NOMCAR option on the PROC SURVEYFREQ statement, which will automatically treat the missing/zero weights as a separate domain and thus use them to derive correct standard errors.

mitcheem
Fluorite | Level 6

@SAS_Rob, I can't get NOMCAR to make a difference. Here's an example, using 2015 data from MEPS (can download SAS transport file here):

 

 

/* Option 1: default version -- 33250 observations dropped */
proc surveyfreq data = h181; 
	FORMAT DSEY1553 diab_eye. ;
	STRATA VARSTR;
	CLUSTER VARPSU;
	WEIGHT DIABW15F; 
	TABLES DSEY1553 / row;
run;

/* Option 2: Using nomcar -- no difference in SEs */
proc surveyfreq data = h181 nomcar; 
	FORMAT DSEY1553 diab_eye. ;
	STRATA VARSTR;
	CLUSTER VARPSU;
	WEIGHT DIABW15F; 
	TABLES DSEY1553 / row;
run;

/* Option 3: Changing weight by hand */
data alt; 
	set h181;
	if DIABW15F = 0 then do;
		DIABW15F = 1;
		domain = 2;
	end;
	else do;
		domain = 1;
	end;
run;

proc surveyfreq data = alt; 
	FORMAT DSEY1553 diab_eye. ;
	STRATA VARSTR;
	CLUSTER VARPSU;
	WEIGHT DIABW15F; 
	TABLES domain*DSEY1553 / row;
run;

Options 1 and 2 both drop 33,250 observations before analysis, and both give row standard errors for 'Eye exam in past year' of 1.2847.

 

 

Option 3, which alters the weights in the dataset, doesn't drop any observations, and gives a row standard error of 1.2927.
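To see how much of the design goes away with the zero-weight records, one option is to count distinct strata and clusters with and without them. This is only a sketch using the variable names from the code above. It assumes VARPSU codes repeat across strata, so clusters are counted as stratum-PSU combinations.

```sas
/* Design counts over all observations in the file */
title 'All observations';
proc sql;
    select count(distinct VARSTR) as n_strata,
           count(distinct catx('_', VARSTR, VARPSU)) as n_clusters
    from h181;
quit;

/* Design counts over the observations the procedure keeps by default */
title 'Positive DIABW15F only';
proc sql;
    select count(distinct VARSTR) as n_strata,
           count(distinct catx('_', VARSTR, VARPSU)) as n_clusters
    from h181
    where DIABW15F > 0;
quit;
title;
```

The difference between the two sets of counts is the part of the design that is dropped along with the zero-weight records.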

SAS_Rob
SAS Employee

I think there may be more going on here. When you add those observations in manually, you are also introducing 30 additional clusters that would not have been there otherwise (see the Data Summary table). This also changes the variance calculation, not just because of the addition of those 30 clusters, but also because of the following note in the log:

NOTE: There is at least one stratum that contains only a single cluster for the table of DSEY1553. Single-cluster strata are not included in the variance estimates.

 

I think that the alternate code might actually be giving you incorrect standard errors.

mitcheem
Fluorite | Level 6

Thanks for the suggestions, @SAS_Rob, but I don't think that's the issue. The problem is that over 30,000 observations are dropped from the dataset before SURVEYFREQ runs, meaning that some PSUs and Strata are dropped as well, which are needed for correct SE estimation (this is similar to the reason why we have to use the DOMAIN statement, instead of a 'WHERE = ' subset, to keep all observations in the dataset in order to calculate correct SEs).

 

But since you brought it up, what's the best way to deal with lonely PSUs (aka 'single-cluster strata') in SAS?

SAS_Rob
SAS Employee

There is no one best way to deal with singleton strata, which is why SAS doesn't do anything automatically. The most common approach is to collapse them into other similar strata prior to running the procedure.
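As a rough sketch of that approach, using the variable names from earlier in the thread: first list the strata left with a single cluster, then recode each into a similar stratum. The stratum codes 1001 and 1002 below are hypothetical placeholders; which strata count as "similar" is a judgment call for the analyst. The sketch also assumes VARPSU codes repeat across strata, so it builds a combined cluster ID to keep PSUs from different original strata distinct after collapsing.

```sas
/* Strata with only one cluster among the positive-weight observations */
proc sql;
    create table single_psu_strata as
    select VARSTR, count(distinct VARPSU) as n_psu
    from h181
    where DIABW15F > 0
    group by VARSTR
    having count(distinct VARPSU) = 1;
quit;

proc print data = single_psu_strata;
run;

data h181_collapsed;
    set h181;
    /* Cluster ID that stays unique after strata are merged */
    length PSU_C $ 32;
    PSU_C = catx('_', VARSTR, VARPSU);
    /* Collapsed stratum: hypothetical mapping, replace with your own */
    VARSTR_C = VARSTR;
    if VARSTR = 1001 then VARSTR_C = 1002;
run;
```

The survey procedure would then use STRATA VARSTR_C and CLUSTER PSU_C in place of the original design variables.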

NCANT033
Calcite | Level 5

Hi @SAS_Rob.

Aside from collapsing singleton strata into a similar stratum, does SAS have any other way to deal with the lonely PSU?

I know in R you can use the following command: 

options(survey.lonely.psu = "adjust")

 

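With this setting, the data for the 'lonely' stratum are centered at the sample grand mean rather than the stratum mean. (The related "average" setting instead takes the 'lonely' stratum's contribution to the variance as the average over all the strata with more than one PSU.)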

Any comments you could provide would be greatly appreciated!

Thanks

hbercaw
Calcite | Level 5

If a MEC weight is equal to zero, it means that the person is missing a MEC examination. That person should not be included in the analysis, and MEC weight = 0 should be considered an exclusion criterion for your sample. This is where the DOMAIN statement comes in.

if MECwt = 0 then eligible = 0;

else eligible=1;

and use the statement (DOMAIN eligible) to interpret results where eligible=1.