Please help this SAS newbie!
The analysis I am working on requires finding and comparing the prevalence of certain diseases between different ethnic groups. I thought it would be easier to create a subset of the data for each ethnicity. So, for example, all the Germans are now in their own data set, the Swedish are in their own data set, etc.
i.e.
data primhd.german;
set primhd.merged;
if (german=1) then output;
run;
Using proc freq I am able to look at the prevalence of, say, heart disease for each individual group.
Is there a way to create a single frequency table (using proc freq) from these multiple ethnicity data sets for a given condition?
If not, is there a way to structure my original dataset (with all ethnicities) so that I can carry this out?
From your example code of "german=1" it appears that you have multiple ethnicity variables. You may be better off with a single ethnicity variable whose values indicate each different ethnicity (unless your subjects are being analyzed as belonging to multiple ethnicities).
Then in proc freq use a tables statement like:
tables ethnicity * disease;
Possibly with the CHISQ option to see if the distribution differs across ethnicities.
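As a minimal sketch of that suggestion, the full step might look like the following. The data set name primhd.merged comes from the question; the variable names ethnicity and disease are placeholders for whatever the real variables are called, and the sketch assumes each subject has exactly one ethnicity value.

```sas
/* Cross-tabulate a single ethnicity variable against a disease flag. */
/* "ethnicity" and "disease" are assumed names, not known variables.  */
proc freq data=primhd.merged;
  tables ethnicity * disease / chisq;  /* CHISQ adds chi-square tests of association */
run;
```

The resulting table has one row per ethnicity value and one column per disease value, so the prevalence in each group can be read from the row percentages.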
Is there a way to create a single frequency table (using proc freq) from these multiple data sets of ethnicity for a said condition?
Not easily.
If not, is there a way to structure my original dataset (with all ethnicities) so that I can carry this out?
Yes, but we would have to understand what your original data structure was in the first place. This would likely be the more efficient solution; splitting the data set is rarely a good idea.
Below is an example of the step I made prior to subsetting the data. In this data individuals can identify with up to three ethnicities. I am only interested in a few ethnic groups.
data primhd.client2;
set primhd.client1;
german=0;
if ethnicg1='31'
or ethnicg2='31'
or ethnicg3='31' then german=1;
swedish=0;
if ethnicg1='32'
or ethnicg2='32'
or ethnicg3='32' then swedish=1;
norwegian=0;
if ethnicg1='33'
or ethnicg2='33'
or ethnicg3='33' then norwegian =1;
aggother=0;
if ethnicg1='30'
or ethnicg2='30'
or ethnicg3='30'
then aggother =1;
if ethnicg1='34'
or ethnicg2='34'
or ethnicg3='34'
then aggother =1;
if ethnicg1='35'
or ethnicg2='35'
or ethnicg3='35'
then aggother =1;
if ethnicg1='36'
or ethnicg2='36'
or ethnicg3='36'
then aggother =1;
if ethnicg1='37'
or ethnicg2='37'
or ethnicg3='37'
then aggother =1;
run;
I'm not too taken with the resulting output of a whole bunch of new variables. Other than the boolean expressions above, what expression could I use to create a single ethnicity variable?
Did you run the code? It should generate exactly the same results. The exception is AggOther, which you could build the same way or in the way shown here, but either way it still requires checking multiple values.
German = whichc('31', of ethnicg1-ethnicg3) > 0;
Swedish = whichc('32', of ethnicg1-ethnicg3) > 0;
Norwegian = whichc('33', of ethnicg1-ethnicg3) > 0;
AggOther = a bunch of WHICHC statements or similar IF statements.
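One possible way to fill in the AggOther step, sketched under assumptions: the data set and ethnicg1-ethnicg3 names and the codes '30' and '34' to '37' are taken from the earlier post, while the array name eth, the loop index i, and the outcome variable disease are placeholders introduced here for illustration.

```sas
data primhd.client2;
  set primhd.client1;

  /* 1 if the code appears in any of the three ethnicity fields, else 0 */
  german    = whichc('31', of ethnicg1-ethnicg3) > 0;
  swedish   = whichc('32', of ethnicg1-ethnicg3) > 0;
  norwegian = whichc('33', of ethnicg1-ethnicg3) > 0;

  /* AggOther: 1 if any field holds one of the grouped codes */
  array eth{3} $ ethnicg1-ethnicg3;   /* "eth" is an assumed array name */
  aggother = 0;
  do i = 1 to dim(eth);
    if eth{i} in ('30', '34', '35', '36', '37') then aggother = 1;
  end;
  drop i;                             /* loop index is not needed in the output */
run;

/* One table per indicator, all from the same data set.   */
/* "disease" is an assumed name for the condition flag.   */
proc freq data=primhd.client2;
  tables (german swedish norwegian aggother) * disease;
run;
```

Because a person can report up to three ethnicities, the indicators overlap, so this produces a separate two-way table for each group rather than one table with mutually exclusive rows. It does avoid splitting the data into one data set per ethnicity.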