senac255
Fluorite | Level 6

Please help this SAS newbie!

 

The analysis I am working on requires finding and comparing the prevalence of certain diseases between different ethnic groups. To make this easier, I thought I could create a subset of the data for each ethnic group. So, for example, all the Germans are now in their own data set, the Swedish are in their own data set, etc.

 

i.e.

data primhd.german;
set primhd.merged;
if (german=1) then output;
run;

 

Using proc freq I am able to look at the prevalence of, say, heart disease for each individual group.

Is there a way to create a single frequency table (using proc freq) from these multiple ethnicity data sets for a given condition?

If not, is there a way to structure my original dataset (with all ethnicities) so that I can carry this out?

1 ACCEPTED SOLUTION

ballardw
Super User

 

 

From your example code of "german=1" it appears that you have multiple ethnicity variables. You may be better off with a single ethnicity variable whose values indicate each different ethnicity (unless your subjects are being analyzed as belonging to multiple ethnicities).

Then in proc freq use a tables statement like:

tables ethnicity * disease;

Possibly with the CHISQ option to see if the distribution differs across ethnicities.
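
As a minimal sketch of what that might look like in full, assuming primhd.merged already holds one character variable ETHNICITY and one 0/1 variable DISEASE (both names are placeholders, not taken from the original data):

```sas
/* Sketch only: ETHNICITY and DISEASE are assumed variable names. */
proc freq data=primhd.merged;
    /* One crosstab of ethnicity by disease, with a chi-square test */
    tables ethnicity * disease / chisq;
run;
```

The chi-square test treats each subject as contributing to exactly one cell, so it is probably only appropriate here if every subject is assigned a single ethnicity.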


6 REPLIES
Reeza
Super User

Is there a way to create a single frequency table (using proc freq) from these multiple ethnicity data sets for a given condition?

Not easily.

 

If not, is there a way to structure my original dataset (with all ethnicities) so that I can carry this out?

Yes, but we would have to understand what your original data structure was in the first place. This would likely be the more efficient solution; splitting the data set is rarely a good idea.

 

 

senac255
Fluorite | Level 6

Below is the step I ran prior to subsetting the data. In this data, individuals can identify with up to three ethnicities. I am only interested in a few ethnic groups.

 

 

data primhd.client2;
set primhd.client1;

/* 1 if any of the three ethnicity fields holds the group's code */
german=0;
if ethnicg1='31'
or ethnicg2='31'
or ethnicg3='31' then german=1;

swedish=0;
if ethnicg1='32'
or ethnicg2='32'
or ethnicg3='32' then swedish=1;

norwegian=0;
if ethnicg1='33'
or ethnicg2='33'
or ethnicg3='33' then norwegian=1;

/* aggregated "other" group: codes 30 and 34 to 37 */
aggother=0;
if ethnicg1='30'
or ethnicg2='30'
or ethnicg3='30'
then aggother=1;

if ethnicg1='34'
or ethnicg2='34'
or ethnicg3='34'
then aggother=1;

if ethnicg1='35'
or ethnicg2='35'
or ethnicg3='35'
then aggother=1;

if ethnicg1='36'
or ethnicg2='36'
or ethnicg3='36'
then aggother=1;

if ethnicg1='37'
or ethnicg2='37'
or ethnicg3='37'
then aggother=1;

run;

 

I'm not too taken with the resulting output of a whole bunch of new variables. Other than the Boolean expressions above, what expression could I use to create a single ethnicity variable?

Reeza
Super User

How do you want to deal with multiples though? If someone is both Norwegian and German?

German = whichc('31', of ethnicg1-ethnicg3) > 0;
Swedish = whichc('32', of ethnicg1-ethnicg3) > 0;
Norwegian = whichc('33', of ethnicg1-ethnicg3) > 0;
AggOther = a bunch of WHICHC statements or similar IF statements.

PS. This would have been easier if you had numeric variables.

senac255
Fluorite | Level 6

I wanted to use a total response grouping method, so an individual who is both Norwegian and Spanish would be included in both ethnic groups.

Reeza
Super User

Did you run the code? It should generate exactly the same results, except for AggOther, which you could do the same way or with IF statements, but either approach still has to check multiple values.

 

German = whichc('31', of ethnicg1-ethnicg3) > 0; 
Swedish = whichc('32', of ethnicg1-ethnicg3) > 0; 
Norwegian = whichc('33', of ethnicg1-ethnicg3) > 0; 
AggOther = a bunch of WHICHC statements or similar IF statements.
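
For the total response approach mentioned above, one possible way to get a single table is to write one row per person per ethnic group and then cross that against the condition. This is only a sketch: ETHNIC_GROUP and the output data set work.eth_long are made-up names, DISEASE is an assumed 0/1 variable in primhd.client1, and AggOther is written with the IN operator rather than WHICHC.

```sas
/* Sketch only: one output row per person per matching ethnic group. */
data work.eth_long;
    set primhd.client1;
    length ethnic_group $ 10;   /* hypothetical variable name */

    /* WHICHC returns the position of the first match, or 0 if none */
    if whichc('31', of ethnicg1-ethnicg3) > 0 then do;
        ethnic_group = 'German';
        output;
    end;
    if whichc('32', of ethnicg1-ethnicg3) > 0 then do;
        ethnic_group = 'Swedish';
        output;
    end;
    if whichc('33', of ethnicg1-ethnicg3) > 0 then do;
        ethnic_group = 'Norwegian';
        output;
    end;

    /* AggOther: code 30 or 34 to 37 in any of the three fields */
    if ethnicg1 in ('30','34','35','36','37')
       or ethnicg2 in ('30','34','35','36','37')
       or ethnicg3 in ('30','34','35','36','37') then do;
        ethnic_group = 'AggOther';
        output;
    end;
run;

/* DISEASE is an assumed variable; row percentages give prevalence per group */
proc freq data=work.eth_long;
    tables ethnic_group * disease / nocol nopercent;
run;
```

Because a person can appear in more than one row, the rows are not independent, so a CHISQ test on this table would likely not be valid; the row percentages are the part of interest.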
