MichaelMcG
Calcite | Level 5

I am working with the NHANES 2011-2018 data and I am trying to exclude the "Don't Know", "Refused", and missing responses from the analysis. Following the suggestions in this thread, I created a variable, ANALYSIS, that flags the observations I want to include in my analysis:

DATA data.DSdec_cd;
    SET data.DSdec;
    KEEP RIAGENDR RIDAGEYR DMDYRSUS
         DMDEDUC2 INDFMPIR ACD040
         SDMVSTRA SDMVPSU WTINT10YR
         FSDHH ANALYSIS;
    IF (DMDYRSUS IN (1:9)
        AND
        DMDEDUC2 IN (1:5)
        AND
        INDFMPIR ge 0
        AND
        ACD040 IN (1:5)
        AND
        FSDHH IN (1:4))
      THEN ANALYSIS = 1;
    ELSE ANALYSIS = 0;
RUN;

However, the following SURVEYLOGISTIC procedure:

PROC SURVEYLOGISTIC data = data.DSdec_cd order = data NOMCAR;
DOMAIN ANALYSIS ("1");
STRATA SDMVSTRA;
CLUSTER SDMVPSU;
WEIGHT WTINT10YR;
CLASS RIAGENDR (REF = LAST)
DMDYRSUS (REF = FIRST)
DMDEDUC2 (REF = FIRST)
ACD040 (REF = FIRST)
FSDHH (REF = FIRST) / PARAM = REF;
MODEL FSDHH = RIAGENDR
RIDAGEYR
ACD040 / CLODDS;
ODS SELECT Domain2.CLOdds;
RUN;

The output still includes the "Refused" (7) and "Don't Know" (9) categories. I don't understand why they are still included, or whether it is possible to exclude them from the analysis or at least from the output. I have used WHERE statements to subset data before, but I understand that this is not the correct way to restrict a survey analysis to a subset of the data.

I am also wondering how to get Wald confidence limits with SURVEYLOGISTIC. The documentation says this is possible, but the procedure does not produce them by default.

2 REPLIES
ballardw
Super User

Which variables have the unwanted "Don't Know" or "Refused" categories?

Your ANALYSIS variable only segregates observations into groups. It does not remove any "Don't Know" or "Refused" values.

From the documentation for the DOMAIN statement in PROC SURVEYLOGISTIC (emphasis added):

The DOMAIN statement requests analysis for domains (subpopulations) in addition to analysis for the entire study population.

So you get an analysis of the subpopulation where ANALYSIS = 1 and an analysis of all of the observations.

Typically you either set those values to missing, which excludes the observations from the model, or recode them to another category that you do want to include.

Setting to missing would be something like:

If var in (7, 9) then call missing(var);
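
Applied to the variables in the question, that idea might look like the sketch below. It is only an illustration under stated assumptions: the output data set name data.DSdec_miss is hypothetical, and the special codes (7/9 for DMDEDUC2 and ACD040, 77/99 for DMDYRSUS, where 7 and 9 are ordinary categories) should be checked against the NHANES codebook for each cycle before use.

```sas
/* Sketch: recode "Refused" / "Don't know" responses to missing.      */
/* data.DSdec_miss is a hypothetical name for the recoded data set.   */
DATA data.DSdec_miss;
    SET data.DSdec;

    /* Assumed one-digit special codes: 7 and 9 */
    ARRAY onedigit {*} DMDEDUC2 ACD040;
    DO i = 1 TO DIM(onedigit);
        IF onedigit{i} IN (7, 9) THEN CALL MISSING(onedigit{i});
    END;

    /* Assumed two-digit special codes: 77 and 99                     */
    /* (7 and 9 are treated as valid DMDYRSUS categories here)        */
    IF DMDYRSUS IN (77, 99) THEN CALL MISSING(DMDYRSUS);

    DROP i;   /* loop counter is not needed in the output */
RUN;
```

Because the recoded values are missing rather than deleted, the observations stay in the data set and the procedure decides how to treat them, for example through the NOMCAR option already used in the question.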

 

mkeintz
PROC Star

You could use the WHERE= data set option on the input data set to PROC SURVEYLOGISTIC, as in:

PROC SURVEYLOGISTIC data = data.DSdec_cd (where=(analysis=1)) order = data NOMCAR;

And then you wouldn't need the DOMAIN statement.
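
Put together with the statements from the original post, the full step might look like the sketch below. It assumes the ANALYSIS flag created by the DATA step in the question and changes nothing else except dropping the DOMAIN and ODS SELECT statements.

```sas
/* Sketch: same model as in the question, filtered with WHERE=        */
/* instead of a DOMAIN statement. Assumes ANALYSIS already exists.    */
PROC SURVEYLOGISTIC DATA = data.DSdec_cd (WHERE = (ANALYSIS = 1))
                    ORDER = DATA NOMCAR;
    STRATA  SDMVSTRA;
    CLUSTER SDMVPSU;
    WEIGHT  WTINT10YR;
    CLASS RIAGENDR (REF = LAST)
          DMDYRSUS (REF = FIRST)
          DMDEDUC2 (REF = FIRST)
          ACD040   (REF = FIRST)
          FSDHH    (REF = FIRST) / PARAM = REF;
    MODEL FSDHH = RIAGENDR RIDAGEYR ACD040 / CLODDS;
RUN;
```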
