## Proc Freq comparing multiple ethnicities

Occasional Contributor

Analysis of the data I am working with requires finding and comparing the prevalence of certain diseases between different ethnic groups. To make this easier, I thought I could create subsets of the data by ethnicity. For example, all the Germans are now in their own data set, the Swedish are in their own data set, and so on.

For example:

```
data primhd.german;
   set primhd.merged;
   if (german=1) then output;
run;
```

Using proc freq, I can look at the prevalence of, say, heart disease for each individual group.

Is there a way to create a single frequency table (using proc freq) from these multiple ethnicity data sets for a given condition?

If not, is there a way to structure my original dataset (with all ethnicities) so that I can carry this out?

Accepted Solution
11-05-2017 05:44 PM
Super User

## Re: Proc Freq comparing multiple ethnicities

From your example code (`german=1`), it appears that you have multiple ethnicity variables. You may be better off with a single ethnicity variable whose values indicate each ethnicity (unless your subjects are being analyzed as belonging to multiple ethnicities).

Then in proc freq use a tables statement like:

```
tables ethnicity * disease;
```

possibly with the CHISQ option to see whether the distribution differs across ethnicities.
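A minimal sketch of this approach, assuming one row per subject with a single character variable `ethnicity` and a 0/1 variable `disease`. The data set `work.subjects`, its values, and the output data set `work.counts` are hypothetical. With NOCOL and NOPERCENT, the remaining row percentages in the crosstab are the prevalence of the disease within each ethnicity.

```sas
/* Hypothetical example data: one row per subject, one ethnicity variable */
data work.subjects;
   input id ethnicity :$10. disease;
   datalines;
1 German 1
2 German 0
3 Swedish 1
4 Swedish 0
5 Norwegian 0
6 Norwegian 1
;
run;

/* One cross-tabulation covering every group:
   ROW percentages = prevalence within each ethnicity,
   CHISQ tests whether the disease distribution differs across groups,
   OUT= saves the cell counts to a data set */
proc freq data=work.subjects;
   tables ethnicity * disease / chisq nocol nopercent out=work.counts;
run;
```

Adding another ethnic group then means adding a new value of `ethnicity`, not a new variable or a new data set.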

All Replies
Super User

## Re: Proc Freq comparing multiple ethnicities

> Is there a way to create a single frequency table (using proc freq) from these multiple ethnicity data sets for a given condition?

Not easily.

> If not, is there a way to structure my original dataset (with all ethnicities) so that I can carry this out?

Yes, but we would first have to understand what your original data structure was. This would likely be the more efficient solution; splitting a data set is rarely a good idea.

Occasional Contributor

## Re: Proc Freq comparing multiple ethnicities

Below is an example of the step I ran before subsetting the data. In this data, individuals can identify with up to three ethnicities, and I am only interested in a few ethnic groups.

```
data primhd.client2;
   set primhd.client1;

   /* Each flag is 1 if any of the three ethnicity slots holds the code */
   german = 0;
   if ethnicg1='31' or ethnicg2='31' or ethnicg3='31' then german = 1;

   swedish = 0;
   if ethnicg1='32' or ethnicg2='32' or ethnicg3='32' then swedish = 1;

   norwegian = 0;
   if ethnicg1='33' or ethnicg2='33' or ethnicg3='33' then norwegian = 1;

   aggother = 0;
   if ethnicg1='30' or ethnicg2='30' or ethnicg3='30' then aggother = 1;
   if ethnicg1='34' or ethnicg2='34' or ethnicg3='34' then aggother = 1;
   if ethnicg1='35' or ethnicg2='35' or ethnicg3='35' then aggother = 1;
   if ethnicg1='36' or ethnicg2='36' or ethnicg3='36' then aggother = 1;
   if ethnicg1='37' or ethnicg2='37' or ethnicg3='37' then aggother = 1;
run;
```

I'm not too keen on the result, which is a whole bunch of new variables. Other than the Boolean expressions above, what could I use to create a single ethnicity variable?

Super User

## Re: Proc Freq comparing multiple ethnicities

How do you want to deal with multiples, though? What if someone is both Norwegian and German?

```
German    = whichc('31', of ethnicg1-ethnicg3) > 0;
Swedish   = whichc('32', of ethnicg1-ethnicg3) > 0;
Norwegian = whichc('33', of ethnicg1-ethnicg3) > 0;
/* AggOther: a bunch of WHICHC statements or similar IF statements */
```

PS. This would have been easier if you had numeric variables.

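
For illustration, a self-contained version of these WHICHC flags run on a few hypothetical records. The data set names `work.client1`/`work.client2`, the `id` variable, and the sample values are made up; the codes 30 through 37 come from the IF-based step earlier in the thread. Writing AggOther as an OR of WHICHC calls is just one possible choice.

```sas
/* Hypothetical sample: up to three character ethnicity codes per client */
data work.client1;
   input id ethnicg1 $ ethnicg2 $ ethnicg3 $;
   datalines;
1 31 33 .
2 32 . .
3 35 31 .
4 21 . .
;
run;

data work.client2;
   set work.client1;
   /* WHICHC returns the position of the first argument that matches
      the search string (0 if none), so "> 0" yields a 0/1 flag */
   German    = whichc('31', of ethnicg1-ethnicg3) > 0;
   Swedish   = whichc('32', of ethnicg1-ethnicg3) > 0;
   Norwegian = whichc('33', of ethnicg1-ethnicg3) > 0;
   /* AggOther: 1 if any slot holds code 30, 34, 35, 36 or 37 */
   AggOther  = whichc('30', of ethnicg1-ethnicg3) > 0
            or whichc('34', of ethnicg1-ethnicg3) > 0
            or whichc('35', of ethnicg1-ethnicg3) > 0
            or whichc('36', of ethnicg1-ethnicg3) > 0
            or whichc('37', of ethnicg1-ethnicg3) > 0;
run;
```

Client 3 (codes 35 and 31) gets both German=1 and AggOther=1, which matches the IF-based step.
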
Occasional Contributor

## Re: Proc Freq comparing multiple ethnicities

I want to use a total response grouping method, so an individual who is both Norwegian and Spanish would be included in both ethnic groups.

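
One way to get a single ethnicity variable under total response grouping is to reshape to long form: one output row per client per group of interest, so a client coded both 31 and 33 contributes one row to German and one to Norwegian. This sketch assumes character codes in `ethnicg1`-`ethnicg3` and a 0/1 `disease` variable; the data set names, `id`, `disease`, and the sample values are hypothetical. Because the same client can appear in more than one row, the rows are no longer independent, so a chi-square test on this table should be interpreted with caution.

```sas
/* Hypothetical sample: up to three character ethnicity codes per client */
data work.client1;
   input id ethnicg1 $ ethnicg2 $ ethnicg3 $ disease;
   datalines;
1 31 33 . 1
2 32 . . 0
3 35 31 . 1
4 21 . . 0
;
run;

/* Long form: one row per client per ethnic group they belong to;
   clients in none of the groups of interest produce no rows */
data work.client_long;
   set work.client1;
   length ethnicity $10;
   array eth{3} ethnicg1-ethnicg3;

   if whichc('31', of ethnicg1-ethnicg3) then do; ethnicity = 'German';    output; end;
   if whichc('32', of ethnicg1-ethnicg3) then do; ethnicity = 'Swedish';   output; end;
   if whichc('33', of ethnicg1-ethnicg3) then do; ethnicity = 'Norwegian'; output; end;

   /* AggOther: any slot holding one of the codes 30, 34, 35, 36, 37 */
   agg = 0;
   do i = 1 to dim(eth);
      if eth{i} in ('30', '34', '35', '36', '37') then agg = 1;
   end;
   if agg then do; ethnicity = 'AggOther'; output; end;

   keep id ethnicity disease;
run;

/* A single table for all groups; row percentages are within-group prevalence */
proc freq data=work.client_long;
   tables ethnicity * disease / nocol nopercent;
run;
```
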
Super User

## Re: Proc Freq comparing multiple ethnicities

Did you run the code? It should generate exactly the same results, except for AggOther, which you could build either way (WHICHC or IF statements), but it still requires checking multiple values.

```
German    = whichc('31', of ethnicg1-ethnicg3) > 0;
Swedish   = whichc('32', of ethnicg1-ethnicg3) > 0;
Norwegian = whichc('33', of ethnicg1-ethnicg3) > 0;
/* AggOther: a bunch of WHICHC statements or similar IF statements */
```