Solved: Proc Freq comparing multiple ethnicities

senac255 · Posted 10-30-2017 05:25 PM

Please help this SAS newbie!

Analysis of the data that I am working with requires finding and comparing the prevalence of certain diseases between different ethnic groups. I thought that, to make it easier, I could just create subsets of the data by ethnicity. So, for example, all the Germans are now in their own data set, the Swedish are assigned their own data set, etc.

i.e.

data primhd.german;
set primhd.merged;
if (german=1) then output;
run;

Using proc freq I am able to look at the prevalence of, say, heart disease for each individual group.

Is there a way to create a single frequency table (using proc freq) from these multiple ethnicity data sets for a given condition?

If not, is there a way to structure my original dataset (with all ethnicities) so that I can carry this out?

Reeza · Posted 10-30-2017 06:50 PM

Is there a way to create a single frequency table (using proc freq) from these multiple ethnicity data sets for a given condition?

Not easily.

If not, is there a way to structure my original dataset (with all ethnicities) so that I can carry this out?

Yes, but we would have to understand what your original data structure was in the first place. This would likely be the more efficient solution; splitting the data set is rarely a good idea.

ballardw · Posted 10-31-2017 11:36 AM

From your example code of "german=1" it appears that you have multiple ethnicity variables. You may be better off with a single ethnicity variable whose values indicate each different ethnicity (unless your subjects are being analyzed as belonging to multiple ethnicities).

Then in proc freq use a tables statement like:

tables ethnicity * disease;

Possibly with the CHISQ option to see if the distribution differs across ethnicities.
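
As a minimal sketch of that suggestion: the step below assumes primhd.merged already holds one ethnicity value per subject in a variable called ethnicity and a 0/1 flag called disease. Both variable names are placeholders taken from the tables statement above, not from the poster's data.

```sas
/* Sketch only: ETHNICITY and DISEASE are assumed variable names. */
proc freq data=primhd.merged;
   /* Rows = ethnicity, columns = disease status.                  */
   /* NOCOL and NOPERCENT leave the row percentages, which read as */
   /* prevalence within each ethnicity. CHISQ requests the         */
   /* chi-square test of association.                              */
   tables ethnicity * disease / chisq nocol nopercent;
run;
```

This only applies when each subject falls into exactly one ethnicity category.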

senac255 · Posted 10-31-2017 06:57 PM

Below is an example of the step I made prior to subsetting the data. In this data individuals can identify with up to three ethnicities. I am only interested in a few ethnic groups.

data primhd.client2;
set primhd.client1;

/* Each flag is 1 if the code appears in any of the three ethnicity fields */
german=0;
if ethnicg1='31'
or ethnicg2='31'
or ethnicg3='31' then german=1;

swedish=0;
if ethnicg1='32'
or ethnicg2='32'
or ethnicg3='32' then swedish=1;

norwegian=0;
if ethnicg1='33'
or ethnicg2='33'
or ethnicg3='33' then norwegian=1;

/* Codes 30 and 34-37 are aggregated into a single "other" flag */
aggother=0;
if ethnicg1='30'
or ethnicg2='30'
or ethnicg3='30' then aggother=1;

if ethnicg1='34'
or ethnicg2='34'
or ethnicg3='34' then aggother=1;

if ethnicg1='35'
or ethnicg2='35'
or ethnicg3='35' then aggother=1;

if ethnicg1='36'
or ethnicg2='36'
or ethnicg3='36' then aggother=1;

if ethnicg1='37'
or ethnicg2='37'
or ethnicg3='37' then aggother=1;

run;

I'm not too taken with the resulting output of a whole bunch of new variables. Other than the boolean expressions above, what expression could I use to create a single ethnicity variable?

Reeza · Posted 10-31-2017 09:05 PM

How do you want to deal with multiples though? If someone is both Norwegian and German?

German = whichc('31', of ethnicg1-ethnicg3) > 0;
Swedish = whichc('32', of ethnicg1-ethnicg3) > 0;
Norwegian = whichc('33', of ethnicg1-ethnicg3) > 0;
AggOther = a bunch of WHICHC statements or similar IF statements.

PS. This would have been easier if you had numeric variables.
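
One possible way to fill in the AggOther step is to loop over the three ethnicity fields and test each against the list of codes with the IN operator. This is only a sketch: it reuses the data set names and codes (30 and 34-37) from the earlier post, and the array name eth and index i are made up for illustration.

```sas
data primhd.client2;
   set primhd.client1;

   /* Group the three existing character fields so they can be looped over */
   array eth {3} ethnicg1-ethnicg3;

   /* AggOther = 1 if any field holds one of the aggregated codes */
   AggOther = 0;
   do i = 1 to dim(eth);
      if eth{i} in ('30', '34', '35', '36', '37') then AggOther = 1;
   end;

   drop i;   /* the loop index is not needed in the output */
run;
```

The German, Swedish and Norwegian assignments can sit in the same data step alongside this loop.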

senac255 · Posted 11-05-2017 03:13 PM

I wanted to use a total response grouping method, so an individual who was both Norwegian and Spanish would be included in both ethnic groups.

Reeza · Posted 11-05-2017 03:19 PM

Did you run the code? It should generate exactly the same results except for AggOther, which you could do the same way or this way, though it still requires checking multiple values.

German = whichc('31', of ethnicg1-ethnicg3) > 0; 
Swedish = whichc('32', of ethnicg1-ethnicg3) > 0; 
Norwegian = whichc('33', of ethnicg1-ethnicg3) > 0; 
AggOther = a bunch of WHICHC statements or similar IF statements.
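
For the total response approach described in this thread, one option is to stack the flags into a long data set, with one row per person per ethnic group they identify with, and run a single proc freq on that. The sketch below assumes primhd.client2 contains the four 0/1 flags plus a disease indicator. The names work.eth_long, ethnicity and disease are hypothetical.

```sas
/* Sketch only: DISEASE is an assumed 0/1 variable in primhd.client2. */
data work.eth_long;
   set primhd.client2;
   length ethnicity $ 12;

   /* Write one row for every group the person belongs to, so someone */
   /* who is both German and Norwegian contributes two rows.          */
   if german    then do; ethnicity = 'German';    output; end;
   if swedish   then do; ethnicity = 'Swedish';   output; end;
   if norwegian then do; ethnicity = 'Norwegian'; output; end;
   if aggother  then do; ethnicity = 'Other';     output; end;

   keep ethnicity disease;
run;

/* Single table: row percentages give prevalence within each group */
proc freq data=work.eth_long;
   tables ethnicity * disease / nocol nopercent;
run;
```

Because the same person can appear in more than one row, the rows are not independent, so a chi-square test on this stacked table should be interpreted with caution.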
