buder
Fluorite | Level 6

While working on weighted descriptive statistics, I started with PROC SURVEYFREQ because I am interested in analyzing one variable within a subsample of the population: specifically, among those who are physically active, how many are White, Black, Hispanic, and Asian. Adding a WHERE statement produces this note: "The input data set is subset by WHERE, OBS, or FIRSTOBS. This provides a completely separate analysis of the subset. It does not provide a statistically valid subpopulation or domain analysis, where the total number of units in the subpopulation is not known with certainty. If you want a domain analysis, you should include the domain variables in the TABLES request."

 

I switched to PROC SURVEYMEANS and used the DOMAIN statement instead of the WHERE statement. The variables whose means I requested were ACTIVE_CAT, INSUFFICIENT_CAT, and INACTIVE_CAT, each coded 1 if the respondent falls in that category of the main outcome variable and 0 if not.

 

For example, the code for those who are White non-Hispanic is below:

 

PROC SURVEYMEANS DATA = FINAL;
         DOMAIN WHITE_MEPS;
         VAR ACTIVE_CAT INSUFFICIENT_CAT INACTIVE_CAT;
         WEIGHT WEIGHT;   /* unquoted; WEIGHT stands in for the actual survey weight variable */
         RUN;

ACTIVE_CAT: 1 is active; 0 is either inactive or insufficient

INSUFFICIENT_CAT: 1 is insufficient; 0 is either active or inactive

INACTIVE_CAT: 1 is inactive; 0 is either active or insufficient

 

I ran the same code with the DOMAIN variables for Black non-Hispanic, Asian non-Hispanic, and Hispanic.

 

Output is below: 

 

 

WHITE_NH  Variable              N       Mean   Std Error of Mean   95% CL for Mean
0         ACTIVE_CAT        31055   0.426972            0.003908   0.41931346  0.43463105
          INSUFFICIENT_CAT  31055   0.191087            0.003140   0.18493214  0.19724246
          INACTIVE_CAT      31055   0.381940            0.003847   0.37440091  0.38947999
1         ACTIVE_CAT        38705   0.435237            0.003391   0.42859149  0.44188345
          INSUFFICIENT_CAT  38705   0.1923
          INACTIVE_CAT      38705   0.3724

 

Is it correct that the bolded rows (WHITE_NH = 1) give the proper distribution among the subsample?

 

ballardw
Super User

If a value of 1 in your domain variable indicates membership in the category, then the mean of 0.435237 should indicate that an estimated 43.52 percent of that domain (WHITE_NH = 1) have ACTIVE_CAT = 1.
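The note quoted in the question suggests putting the domain variables in the TABLES request of PROC SURVEYFREQ. A minimal sketch of that approach follows, assuming a single race/ethnicity variable and a weight variable; RACE_CAT and WT_VAR are hypothetical names that do not appear in the original post and would need to be replaced with the real ones.

```sas
/* Sketch only: RACE_CAT and WT_VAR are hypothetical variable names. */
proc surveyfreq data=final;
   /* ACTIVE_CAT is the row variable, so the ROW option reports the
      race/ethnicity percentages within each level of ACTIVE_CAT.   */
   tables active_cat*race_cat / row;
   weight wt_var;
run;
```

With this layout, the row percentages for ACTIVE_CAT = 1 describe the race/ethnicity distribution among those who are active, which is the reverse conditioning of the SURVEYMEANS domain output (percent active within a race/ethnicity group).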

 

 

You may also need to consider whether you are giving the procedure all of the relevant sample design information. Is your sample stratified, possibly by geographic region? If so, you should have a STRATA statement.

If this data comes from the BRFSS, as seems possible from the category descriptions, you likely need a CLUSTER statement, because the household is the primary sampling unit and is a cluster (does "selected from adults in the household" sound familiar?). If the data is BRFSS, you may have a variable _psu for that purpose.
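A rough sketch of what a fuller design specification could look like is below. The names STRATUM_VAR, PSU_VAR, and WT_VAR are placeholders, not variables known to exist in the poster's data set, and WHITE_NH is taken from the output shown in the question; check the survey's documentation for the actual design variables.

```sas
/* Sketch only: STRATUM_VAR, PSU_VAR, and WT_VAR are placeholder names. */
proc surveymeans data=final;
   strata  stratum_var;   /* stratification variable(s), if the design is stratified */
   cluster psu_var;       /* primary sampling unit, for example _psu in BRFSS        */
   weight  wt_var;        /* survey weight                                           */
   domain  white_nh;      /* subpopulation indicator                                 */
   var     active_cat insufficient_cat inactive_cat;
run;
```

Adding STRATA and CLUSTER does not change the weighted means; it changes how the standard errors and confidence limits are computed.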

 

And for data points that may be missing because of skip patterns in the survey, you may want the NOMCAR option on the PROC statement (it treats missing values as not missing completely at random).
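As a sketch, the option goes on the PROC statement itself. WT_VAR is a placeholder for the real weight variable, and WHITE_NH is taken from the output in the question.

```sas
/* Sketch only: WT_VAR is a placeholder name. */
proc surveymeans data=final nomcar;   /* NOMCAR affects the Taylor series variance estimation */
   weight wt_var;
   domain white_nh;
   var active_cat insufficient_cat inactive_cat;
run;
```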
