@buder wrote:
While working on weighted descriptive statistics, I started with PROC SURVEYFREQ because I want to analyze one variable within a subsample of the population: specifically, among those who are physically active, how many are White, Black, Hispanic, and Asian. Adding a WHERE statement produces this note: "The input data set is subset by WHERE, OBS, or FIRSTOBS. This provides a completely separate analysis of the subset. It does not provide a statistically valid subpopulation or domain analysis, where the total number of units in the subpopulation is not known with certainty. If you want a domain analysis, you should include the domain variables in the TABLES request."
I then switched to PROC SURVEYMEANS and used a DOMAIN statement instead of the WHERE statement. The analysis variables were ACTIVE_CAT, INSUFFICIENT_CAT, and INACTIVE_CAT, each coded 1 if the respondent falls in that category of the main outcome variable and 0 if not, so their means are proportions.
For example, the code for White non-Hispanic respondents is below:
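PROC SURVEYMEANS DATA = FINAL;
    DOMAIN WHITE_MEPS;   /* domain indicator: 1 = White non-Hispanic, 0 = otherwise */
    VAR ACTIVE_CAT INSUFFICIENT_CAT INACTIVE_CAT;
    WEIGHT WEIGHT;       /* survey weight variable; the name must be unquoted */
RUN;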
ACTIVE_CAT: 1 = active; 0 = inactive or insufficient
INSUFFICIENT_CAT: 1 = insufficient; 0 = active or inactive
INACTIVE_CAT: 1 = inactive; 0 = active or insufficient
I ran the same code with the DOMAIN variable for Black non-Hispanic, Asian non-Hispanic, and Hispanic respondents.
Output is below:
WHITE_NH  Variable          N      Mean      Std Error of Mean  95% CL for Mean
0         ACTIVE_CAT        31055  0.426972  0.003908           0.41931346  0.43463105
          INSUFFICIENT_CAT  31055  0.191087  0.003140           0.18493214  0.19724246
          INACTIVE_CAT      31055  0.381940  0.003847           0.37440091  0.38947999
1         ACTIVE_CAT        38705  0.435237  0.003391           0.42859149  0.44188345
          INSUFFICIENT_CAT  38705  0.1923
          INACTIVE_CAT      38705  0.3724
Is it correct that the bolded rows (WHITE_NH = 1) give the proper distribution within that subsample?
If a value of 1 in your domain variable indicates membership in the category, then the mean of 0.435237 indicates that an estimated 43.52 percent of that domain is "active" (ACTIVE_CAT = 1).
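As a cross-check, the same within-domain percentage can be requested from PROC SURVEYFREQ by crossing the domain variable with the outcome in the TABLES request, which is what the log note recommends. A minimal sketch, assuming WEIGHT_VAR stands in for your actual weight variable (it is a placeholder, not a name from your post):

```sas
/* Sketch only: WEIGHT_VAR is a hypothetical placeholder name. */
proc surveyfreq data=final;
   tables WHITE_MEPS * ACTIVE_CAT / row;   /* ROW requests row percentages */
   weight WEIGHT_VAR;                      /* survey weight variable */
run;
```

The row percent for ACTIVE_CAT = 1 in the WHITE_MEPS = 1 row should agree with the domain mean from PROC SURVEYMEANS multiplied by 100.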
You may need to consider whether you are giving the procedure all of the appropriate sample design information, though. Is your sample stratified, possibly by geographic region? Then you should have a STRATA statement.
If these data come from the BRFSS, as seems possible from the category descriptions, you likely also need a CLUSTER statement, because the household is the primary sampling unit and acts as a cluster (does "one adult selected from the household" sound familiar?). If the data are BRFSS, you may have a variable _psu for that purpose.
For data points that may be missing because of skip patterns in the survey, you may also want the NOMCAR option on the PROC statement, which treats missing values as not missing completely at random.
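Putting these suggestions together, the call might look something like the sketch below. STRATA_VAR, PSU_VAR, and WEIGHT_VAR are placeholder names, not variables from your post; substitute whatever your survey's documentation specifies (for BRFSS the PSU variable may be _psu, as mentioned above).

```sas
/* Sketch only: STRATA_VAR, PSU_VAR and WEIGHT_VAR are hypothetical names. */
proc surveymeans data=final nomcar mean stderr clm;
   strata  STRATA_VAR;    /* design strata, if the sample is stratified */
   cluster PSU_VAR;       /* primary sampling unit */
   weight  WEIGHT_VAR;    /* survey weight */
   domain  WHITE_MEPS;    /* subpopulation of interest (1 = in domain) */
   var ACTIVE_CAT INSUFFICIENT_CAT INACTIVE_CAT;
run;
```

The means themselves should not change when you add STRATA and CLUSTER, since they depend only on the weights; it is the standard errors and confidence limits that depend on the design statements.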