DATA Step, Macro, Functions and more

Contingency tables and output

Reply
Occasional Learner
Posts: 1

Contingency tables and output

Hi I need help with generating contingency tables, and then using ODS to output everything. I have included the question and also what I have so far, Thanks

 

 

Problem:

The SAS data set Houston.sas7bdat was obtained from the Behavioral Risk Factor Surveillance System (BRFSS) in 2012. The samples are people ≥ 18 year-old in Houston. This data set contains the following categorical variables:

asthma = (0, 1) = (never had asthma, ever had asthma)
adult = (1, 2) = (adults, elders)
sex = (1, 2) = (male, female)
race = (1, 2, 3, 4) = (White, Black, Hispanic, Others)
bmicat = (1, 2, 3, 4) = (underweight, normal, overweight, obese)
smoking = (1, 2, 3) = (non-smoker, former smoker, current smoker)
insurance = (1, 2) = (have insurance, no insurance)

  1. Need to generate a data set which can exactly fill out the table below.

 

 

Never had asthma

Ever had asthma

 

 

 

 

N

%†

N

%†

χ2 value

P-value

Adult

Adult

 

 

 

 

 

 

 

Elder

 

 

 

 

 

 

Sex

Male

 

 

 

 

 

 

 

Female

 

 

 

 

 

 

Race

White

 

 

 

 

 

 

 

Black

 

 

 

 

 

 

 

Hispanic

 

 

 

 

 

 

 

Others

 

 

 

 

 

 

† % = Row percentage

            Step 1) Use a PROC FREQ to generate contingency tables with asthma in columns and adult, sex, and race in rows. Each cell only shows counts and row percentages. Have the Chi-square test for each variable, and then use ODS to generate two data sets (named freq and chisq) for cross-tables and statistics separately.

            Step 2) Use a data step to clean the freq data set. Remove total frequencies and percentages (Hint: use _TYPE_ variable). Then use the IF-THEN-ELSE statement to output the value of frequency and row percentage into three different sub-datasets for adult, sex, and race, respectively.

Step 3) Use an appropriate procedure to transpose frequency and row percentage by adult/sex/race in those sub-datasets.

            Step 4) Use the MERGE statement to combine the same transposed data set twice. One data set only contains frequency and another one only contains row percentage. Rearrange the position of variables as shown in the table.

            Step 5) Use a data step to exact the chi-square value and p-value for adult. (Hint: Use the scan function on the variable Table to keep wanted rows)

            Step 6) Combine the merged data set and chi-square data set for adult, which looks like:

            Step 7) Repeat step 3~6 for sex and race.

            Step 8) Combine three clean data sets.

            Step 9) Export the data set to a csv file and directly copy all values from the excel file to the table, like:

 

 

Never had asthma

Ever had asthma

 

 

 

 

N

%†

N

%†

χ2 value

P-value

Adult

Adult

695

87.09

103

12.91

0.1339

0.7144

 

Elder

262

87.92

36

12.08

.

.

Sex

Male

 

 

 

 

 

 

 

Female

 

 

 

 

 

 

Race

White

 

 

 

 

 

 

 

Black

 

 

 

 

 

 

 

Hispanic

 

 

 

 

 

 

 

Others

 

 

 

 

 

 

 

  1. b)  Use PROC FORMAT to create customized formats for variables asthma, bmicat, smoking, and insurance according to the description on page 3. Draw frequency plots for bmicat, smoking and insurance with grouped bars by asthma

 

 

 

 

THIS IS THE CODE I HAVE SO FAR, BUT GETTING STUCK AT OUTPUTTING THE FIRST STEP:

 

ODS LISTING;
DATA houston ;
set 'C:\Data\houston.sas7bdat' ;
RUN;

PROC FORMAT ;
value asthmalbl 1 = 'never had asthma' 0 = 'ever had asthma';
value adultlbl 1 = 'adults' 2 = 'elders' ;
value sexlbl 1 = 'male' 2 = 'female' ;
value racelbl 1 = 'White' 2 = 'Black' 3 = 'Hispanic' 4 = 'Others';
value bmicatlbl 1 = 'underweight' 2 = 'normal' 3 = 'overweight' 4 = 'obese' ;
value smokinglbl 1 = 'non-smoker' 2 = 'former smoker' 3 = 'current smoker' ;
value insurancelbl 1 = 'have insurance' 2 = 'no insurance' ;
RUN;
PROC PRINT DATA = houston ;
format asthma asthmalbl. adult adultlbl. sex sexlbl. race racelbl. bmicat bmicatlbl.
smoking smokinglbl. insurance insurancelbl. ;
RUN;

PROC FREQ DATA = houston ;
TABLES race adult sex * Asthma / NOROW NOCOL NOFREQ CHISQ CL;
ODS OUTPUT CrosstabFreqs = houston2
TITLE ;
format asthma asthmalbl. adult adultlbl. sex sexlbl. race racelbl. bmicat bmicatlbl.
smoking smokinglbl. insurance insurancelbl. ;
RUN;

 

 

This is a sample of the data in case you do not want to use the attachment

 

Obs asthma adult sex race bmicat educat smoking insurance

1 1 1 2 1 3 3 3 1
2 0 1 2 1 2 3 3 1
3 0 1 1 3 2 2 2 2
4 0 2 2 4 2 3 1 1
5 0 2 2 3 3 1 3 1
6 0 2 1 1 3 3 3 1
7 0 2 1 1 2 2 2 1
8 0 1 2 2 2 2 3 1
9 0 1 1 1 3 3 3 1
10 0 1 1 1 2 3 1 1
11 0 2 2 1 3 3 2 1
12 0 2 2 1 4 2 2 1
13 0 1 1 1 2 3 3 1
14 0 1 1 1 2 3 2 1
15 0 2 2 1 3 3 3 1
16 0 1 1 1 2 3 1 1
17 0 2 2 1 2 3 3 1
18 0 1 1 1 4 3 2 1
19 0 1 2 2 2 2 3 1
20 0 1 2 1 4 3 3 1

 

I appreciate any help, thank you!

Attachment
Super User
Posts: 10,469

Re: Contingency tables and output

This sounds so much like a classroom exercise I'm going ask if it is.

 

The main reason I want to know if it is a school exercise is, having worked with BRFSS data for approaching 20 years, the complex weighting schemes generally mean that procedures that do not account for the sample design generally are not suitable for analysis.

 

As a minimum you should account for the weights associated. That is what the Weight statement is for.

 

Easiest is to have an ODS OUTPUT statement for each table that you want to output, if that's the assignment.

This line:

TABLES race adult sex * Asthma / NOROW NOCOL NOFREQ CHISQ CL;

 

possibly should be

TABLES (race adult sex) * Asthma / NOROW NOCOL NOFREQ CHISQ CL;

if you want to cross the first 3 variables with the asthma variable. Since you have arequirement to include a row percent you might want to consider dropping that NOROW. That generates 3 tables of output. (HINT)

 

Ask a Question
Discussion stats
  • 1 reply
  • 212 views
  • 1 like
  • 2 in conversation