_maldini_
Barite | Level 11

How should I create subgroups in PROC SURVEYFREQ if I can't use WHERE or DOMAIN?

 

I'm working w/ complex survey data (i.e., NHANES) and following these instructions (Survey Data Analysis Made Easy With SAS). According to this guidance, the DOMAIN statement should be used in survey procedures instead of the WHERE or BY statements to provide analyses of subgroups. However, the DOMAIN statement is not available with the SURVEYFREQ procedure. 

 

Normally I would use WHERE statements to create subgroups of respondents by excluding respondents who selected "Don't know" or "Refused" (see below).

 

How should I create subgroups in PROC SURVEYFREQ if I can't use WHERE or DOMAIN?

 

proc freq data=&dataset;

    /* DOMAIN: used instead of the WHERE or BY statement to provide analyses of subgroups. */
    /* The DOMAIN statement is not available with the SURVEYFREQ procedure. */

    /* 7 = "Refused", 9 = "Don't know" */
    where ever_told_mi not in (7,9);
    where same and educ_gtet_20 not in (7,9);
    /* 77 = "Refused", 99 = "Don't know" */
    where same and hh_income not in (77,99);
    where same and mar_status not in (77,99);

    /* NHANES variables */
    TABLE
        (
        age_yrs
        gender
        race_all
        mar_status
        hh_income
        educ_gtet_20
        smk_status
        )*ever_told_mi / EXPECTED CHISQ;

    /* NHANES variables */
    FORMAT
        ever_told_mi        yes_no_fmt.
        cann_use_status     cann_use_statusfmt.
        gender              genderfmt.
        age_yrs             age_yrsfmt.
        race_all            race_allfmt.
        mar_status          mar_status2fmt.
        hh_income           hh_income2fmt.
        educ_gtet_20        educ_adults_20_2fmt.
        smk_status          smk_statusfmt.
        ;
run;

My goal in using PROC SURVEYFREQ is to generate a chi-square analysis to determine if these variables are significantly associated w/ the outcome (i.e., ever being told they had an MI, ever_told_mi).

 

When I can't create the subgroups this way, I get rows that I want to exclude (i.e., the subpopulation is not created).

[Attached screenshot: Screen Shot 2022-06-08 at 10.11.11 AM.png]

I guess there is also a broader question about how to include/exclude observations with certain values (e.g.,  7 = "Refused", 9 = "Don't know"). Please advise if there are other ways of handling this situation w/o deleting them.

 

As always, thanks for any guidance here.

3 REPLIES
ballardw
Super User

Values like your 7, 77, 9 and 99, which I am guessing mean "Don't know" and "Refused", should be recoded into new variables so that those values are missing. Missing values are then excluded from the summary by default, and you do not lose entire records as you would with WHERE.
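
A minimal sketch of that recoding in a DATA step. The output data set name work.nhanes_recoded and the _r suffix are hypothetical; the variable names and codes are the ones given in the question, and &dataset is assumed to be defined as in the original post.

```sas
/* Sketch: copy each variable, then set "Refused"/"Don't know" codes to missing. */
/* work.nhanes_recoded and the _r suffix are hypothetical names.                 */
data work.nhanes_recoded;
    set &dataset;

    /* 7 = "Refused", 9 = "Don't know" */
    ever_told_mi_r = ever_told_mi;
    if ever_told_mi_r in (7, 9) then ever_told_mi_r = .;

    educ_gtet_20_r = educ_gtet_20;
    if educ_gtet_20_r in (7, 9) then educ_gtet_20_r = .;

    /* 77 = "Refused", 99 = "Don't know" */
    hh_income_r = hh_income;
    if hh_income_r in (77, 99) then hh_income_r = .;

    mar_status_r = mar_status;
    if mar_status_r in (77, 99) then mar_status_r = .;
run;
```

The _r variables would then go on the TABLES statement (with the same formats attached), while every record stays in the data set so the strata, cluster and weight information remains complete.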

 

If by "domain" you mean another grouping variable such as demographics like gender, race, or ethnicity, then you are apparently doing it correctly on the TABLE statement.

For some uses you may want to NEST things such as

 

tables gender*race_all* (other variables) to get gender and race sub-groups.

 

You may need to play around with the order of the nesting. You will probably also want to use ODS OUTPUT to create data sets for manipulation, because the output for such nestings is not conducive to easy reporting.

Typically my reports out of the survey procedures look like 1) ods output set(s) 2) reshape and possibly select/combine/reformat variables 3) use a reporting procedure like Tabulate, Report or Print to display the desired results in a "nice" appearance.
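
A rough sketch of that three-step workflow, using the design variables that appear elsewhere in this thread. The data set names work.ct and work.chi are hypothetical, and &dataset and &weight are assumed to be defined as in the other posts. CrossTabs and ChiSq are the ODS table names I would expect for a two-way table requested with the CHISQ option; running ODS TRACE ON first will confirm the names for your own request.

```sas
/* Sketch: capture SURVEYFREQ results as data sets, then print them. */
/* work.ct and work.chi are hypothetical data set names.             */
ods output CrossTabs=work.ct ChiSq=work.chi;
proc surveyfreq data=&dataset;
    stratum sdmvstra;
    cluster sdmvpsu;
    weight &weight;
    tables gender*ever_told_mi / row chisq;
run;

/* Step 2 (reshape, subset, or reformat work.ct) would go here. */

/* Step 3: display the results with a reporting procedure. */
proc print data=work.chi noobs;
run;
```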

 

SAS_Rob
SAS Employee

To do a sub-group or domain analysis in SURVEYFREQ you should place the domain variable itself on the TABLES statement.  There are specific details related to this in the SURVEYFREQ documentation.

SAS Help Center: Domain Analysis
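
A sketch of what this could look like with variables from this thread, offered as an illustration rather than the documented example. It assumes flag_2 is a 0/1 indicator for the subgroup of interest, that smk_status is the row variable, and that &dataset and &weight are defined as in the other posts. Listing the domain variable first in a multiway request makes it the layer, so a smk_status by ever_told_mi table should be produced for each level of flag_2 while all observations still contribute to the variance estimation.

```sas
/* Sketch: domain analysis via the TABLES statement (no DOMAIN statement). */
/* flag_2 is assumed to be a 0/1 subgroup indicator.                       */
proc surveyfreq data=&dataset nomcar;
    stratum sdmvstra;
    cluster sdmvpsu;
    weight &weight;
    /* flag_2 is the layer; smk_status by ever_told_mi is tabulated within each level */
    tables flag_2*smk_status*ever_told_mi / row cl chisq;
run;
```

Only the layer for the subgroup of interest (for example, flag_2 = 1) would then be reported.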

_maldini_
Barite | Level 11

@SAS_Rob Thanks for the response. 

2 clarifying questions:

<place the domain variable itself on the TABLES statement>

1. Could you please clarify? 

For example, using PROC SURVEYLOGISTIC I would add the following (see below): domain flag_2;

proc surveylogistic data=&dataset nomcar;
 	stratum sdmvstra;
 	cluster sdmvpsu;
 	weight &weight;
 	/* DOMAIN: used instead of the WHERE or BY statement to provide analyses of subgroups.  */
 	domain flag_2;
 	class &exp (REF="Never")/param=ref; 
 	model ever_told_mi (descending) = &exp;
 	format ever_told_mi yes_no_fmt. &exp cann_use_statusfmt.;
run; 

Are you saying that for PROC SURVEYFREQ you just include the variable w/o the DOMAIN statement, like this? 

proc surveyfreq data=&dataset nomcar;
 	STRATUM sdmvstra;
 	CLUSTER sdmvpsu;
 	WEIGHT &weight;
 	table flag_2*&exp  /cl col chisq;
	format &exp cann_use_statusfmt.;
run; 

2. Are you saying that the DOMAIN statement IS available with the SURVEYFREQ procedure, or are you just saying you can mimic its functionality w/o specifying the actual statement by adding the DOMAIN variable to the TABLES statement? 

Thanks!
