Hi I need help with generating contingency tables, and then using ODS to output everything. I have included the question and also what I have so far, Thanks
Problem:
The SAS data set Houston.sas7bdat was obtained from the Behavioral Risk Factor Surveillance System (BRFSS) in 2012. The samples are people ≥ 18 year-old in Houston. This data set contains the following categorical variables:
asthma = (0, 1) = (never had asthma, ever had asthma)
adult = (1, 2) = (adults, elders)
sex = (1, 2) = (male, female)
race = (1, 2, 3, 4) = (White, Black, Hispanic, Others)
bmicat = (1, 2, 3, 4) = (underweight, normal, overweight, obese)
smoking = (1, 2, 3) = (non-smoker, former smoker, current smoker)
insurance = (1, 2) = (have insurance, no insurance)
|
| Never had asthma | Ever had asthma |
|
| ||
|
| N | %† | N | %† | χ2 value | P-value |
Adult | Adult |
|
|
|
|
|
|
| Elder |
|
|
|
|
|
|
Sex | Male |
|
|
|
|
|
|
| Female |
|
|
|
|
|
|
Race | White |
|
|
|
|
|
|
| Black |
|
|
|
|
|
|
| Hispanic |
|
|
|
|
|
|
| Others |
|
|
|
|
|
|
† % = Row percentage
Step 1) Use a PROC FREQ to generate contingency tables with asthma in columns and adult, sex, and race in rows. Each cell only shows counts and row percentages. Have the Chi-square test for each variable, and then use ODS to generate two data sets (named freq and chisq) for cross-tables and statistics separately.
Step 2) Use a data step to clean the freq data set. Remove total frequencies and percentages (Hint: use _TYPE_ variable). Then use the IF-THEN-ELSE statement to output the value of frequency and row percentage into three different sub-datasets for adult, sex, and race, respectively.
Step 3) Use an appropriate procedure to transpose frequency and row percentage by adult/sex/race in those sub-datasets.
Step 4) Use the MERGE statement to combine the same transposed data set twice. One data set only contains frequency and another one only contains row percentage. Rearrange the position of variables as shown in the table.
Step 5) Use a data step to exact the chi-square value and p-value for adult. (Hint: Use the scan function on the variable Table to keep wanted rows)
Step 6) Combine the merged data set and chi-square data set for adult, which looks like:
Step 7) Repeat step 3~6 for sex and race.
Step 😎 Combine three clean data sets.
Step 9) Export the data set to a csv file and directly copy all values from the excel file to the table, like:
|
| Never had asthma | Ever had asthma |
|
| ||
|
| N | %† | N | %† | χ2 value | P-value |
Adult | Adult | 695 | 87.09 | 103 | 12.91 | 0.1339 | 0.7144 |
| Elder | 262 | 87.92 | 36 | 12.08 | . | . |
Sex | Male |
|
|
|
|
|
|
| Female |
|
|
|
|
|
|
Race | White |
|
|
|
|
|
|
| Black |
|
|
|
|
|
|
| Hispanic |
|
|
|
|
|
|
| Others |
|
|
|
|
|
|
THIS IS THE CODE I HAVE SO FAR, BUT GETTING STUCK AT OUTPUTTING THE FIRST STEP:
ODS LISTING;
DATA houston ;
set 'C:\Data\houston.sas7bdat' ;
RUN;
PROC FORMAT ;
value asthmalbl 1 = 'never had asthma' 0 = 'ever had asthma';
value adultlbl 1 = 'adults' 2 = 'elders' ;
value sexlbl 1 = 'male' 2 = 'female' ;
value racelbl 1 = 'White' 2 = 'Black' 3 = 'Hispanic' 4 = 'Others';
value bmicatlbl 1 = 'underweight' 2 = 'normal' 3 = 'overweight' 4 = 'obese' ;
value smokinglbl 1 = 'non-smoker' 2 = 'former smoker' 3 = 'current smoker' ;
value insurancelbl 1 = 'have insurance' 2 = 'no insurance' ;
RUN;
PROC PRINT DATA = houston ;
format asthma asthmalbl. adult adultlbl. sex sexlbl. race racelbl. bmicat bmicatlbl.
smoking smokinglbl. insurance insurancelbl. ;
RUN;
PROC FREQ DATA = houston ;
TABLES race adult sex * Asthma / NOROW NOCOL NOFREQ CHISQ CL;
ODS OUTPUT CrosstabFreqs = houston2
TITLE ;
format asthma asthmalbl. adult adultlbl. sex sexlbl. race racelbl. bmicat bmicatlbl.
smoking smokinglbl. insurance insurancelbl. ;
RUN;
This is a sample of the data in case you do not want to use the attachment
Obs asthma adult sex race bmicat educat smoking insurance
1 1 1 2 1 3 3 3 1
2 0 1 2 1 2 3 3 1
3 0 1 1 3 2 2 2 2
4 0 2 2 4 2 3 1 1
5 0 2 2 3 3 1 3 1
6 0 2 1 1 3 3 3 1
7 0 2 1 1 2 2 2 1
8 0 1 2 2 2 2 3 1
9 0 1 1 1 3 3 3 1
10 0 1 1 1 2 3 1 1
11 0 2 2 1 3 3 2 1
12 0 2 2 1 4 2 2 1
13 0 1 1 1 2 3 3 1
14 0 1 1 1 2 3 2 1
15 0 2 2 1 3 3 3 1
16 0 1 1 1 2 3 1 1
17 0 2 2 1 2 3 3 1
18 0 1 1 1 4 3 2 1
19 0 1 2 2 2 2 3 1
20 0 1 2 1 4 3 3 1
I appreciate any help, thank you!
This sounds so much like a classroom exercise I'm going ask if it is.
The main reason I want to know if it is a school exercise is, having worked with BRFSS data for approaching 20 years, the complex weighting schemes generally mean that procedures that do not account for the sample design generally are not suitable for analysis.
As a minimum you should account for the weights associated. That is what the Weight statement is for.
Easiest is to have an ODS OUTPUT statement for each table that you want to output, if that's the assignment.
This line:
TABLES race adult sex * Asthma / NOROW NOCOL NOFREQ CHISQ CL;
possibly should be
TABLES (race adult sex) * Asthma / NOROW NOCOL NOFREQ CHISQ CL;
if you want to cross the first 3 variables with the asthma variable. Since you have arequirement to include a row percent you might want to consider dropping that NOROW. That generates 3 tables of output. (HINT)
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.