BookmarkSubscribeRSS Feed
SJ12
Calcite | Level 5

Hello, 

I have a dataset that ressembles the one below (table A): each person (id) could tick to which ethnicity they identified themselves, so they could answer yes to multiple ones (each ethnicity was a variable with the answer being yes, no, or not stated). I am looking to have ONE variable that tells me the ethnicity for each id (if people answered yes to multiple ethnicities, they would be reclassified into ''multiple ethnicities'' and if a value was missing or not stated, it would be reclassified as ''not stated'' in ethnicities or deleted from the dataset. See example below of result wanted (table B). I wanted to use the ARRAY function, but not sure how to do this specifically. Thanks!

 

Table A

ID Ethnicity_2Ethnicity_2Ethnicity_3Ethnicity_4Ethnicity_5
100001
201000
301100
410000
5990011
600001
701010
801000
900100
1000010
      
1= YES      
0= NO     
99= NOT STATED      

 

Table B

ID Ethnicity_1Ethnicity_2Ethnicity_3Ethnicity_4Ethnicity_5Ethnicity_final
100001Ethnicity_5
201000Ethnicity_2
301100Mutiple ethnicities
410000Ethnicity_1
5990011Not stated 
600001Ethnicity_5
701010Mutiple ethnicities
801000Ethnicity_2
900100Ethnicity_3
1000010Ethnicity_4

 

* Ethnicity_final could also be numeric to make it easier:  1-5 and 6 = 'multiple ethnicities' and 7= 'not stated' for example
6 REPLIES 6
Reeza
Super User
data want;
set have;

array _eth (5) ethnicity_1-ethnicity_5;

*check for multiple ethnicities;
if sum(of _eth(*)) >1 then ethnicity_final = 'Multiple Ethnicity';
*check for none specified;
else if sum(of _eth(*))=0 then ethnicity_final = 'Not Stated';

*else only 1 is specified, map to ethnicity value;
else do;
*find which value in array is 1;
index_ethnicity = whichn(1, of _eth(*));
*get variable name of ethnicity;
ethnicity_final = vname(eth(index_ethnicity));
end;

run;

Untested because it's lunch time 🙂

 


@SJ12 wrote:

Hello, 

I have a dataset that ressembles the one below (table A): each person (id) could tick to which ethnicity they identified themselves, so they could answer yes to multiple ones (each ethnicity was a variable with the answer being yes, no, or not stated). I am looking to have ONE variable that tells me the ethnicity for each id (if people answered yes to multiple ethnicities, they would be reclassified into ''multiple ethnicities'' and if a value was missing or not stated, it would be reclassified as ''not stated'' in ethnicities or deleted from the dataset. See example below of result wanted (table B). I wanted to use the ARRAY function, but not sure how to do this specifically. Thanks!

 

Table A

ID  Ethnicity_2 Ethnicity_2 Ethnicity_3 Ethnicity_4 Ethnicity_5
1 0 0 0 0 1
2 0 1 0 0 0
3 0 1 1 0 0
4 1 0 0 0 0
5 99 0 0 1 1
6 0 0 0 0 1
7 0 1 0 1 0
8 0 1 0 0 0
9 0 0 1 0 0
10 0 0 0 1 0
           
1= YES           
0= NO          
99= NOT STATED           

 

Table B

ID  Ethnicity_1 Ethnicity_2 Ethnicity_3 Ethnicity_4 Ethnicity_5 Ethnicity_final
1 0 0 0 0 1 Ethnicity_5
2 0 1 0 0 0 Ethnicity_2
3 0 1 1 0 0 Mutiple ethnicities
4 1 0 0 0 0 Ethnicity_1
5 99 0 0 1 1 Not stated 
6 0 0 0 0 1 Ethnicity_5
7 0 1 0 1 0 Mutiple ethnicities
8 0 1 0 0 0 Ethnicity_2
9 0 0 1 0 0 Ethnicity_3
10 0 0 0 1 0 Ethnicity_4

 

* Ethnicity_final could also be numeric to make it easier:  1-5 and 6 = 'multiple ethnicities' and 7= 'not stated' for example

 

ballardw
Super User

@Reeza wrote:
data want;
set have;

array _eth (5) ethnicity_1-ethnicity_5;

*check for multiple ethnicities;
if sum(of _eth(*)) >1 then ethnicity_final = 'Multiple Ethnicity';
*check for none specified;
else if sum(of _eth(*))=0 then ethnicity_final = 'Not Stated';

*else only 1 is specified, map to ethnicity value;
else do;
*find which value in array is 1;
index_ethnicity = whichn(1, of _eth(*));
*get variable name of ethnicity;
ethnicity_final = vname(eth(index_ethnicity));
end;

run;

Untested because it's lunch time 🙂

Set the 99 to missing before checking any values.

Otherwise this will set "Multiple" when an "unstated" 99 occurs.

if sum(of _eth(*)) >1 then ethnicity_final = 'Multiple Ethnicity';
Reeza
Super User
I think it's more complex than that.

That record has 99 and then two 1's as well for different ethnicities specified, so it is stated, even though not stated was identified. Sadly, this is probably realistic data so I think the OP will have to clarify requirements of how to do deal with the 'not stated' classification.
SJ12
Calcite | Level 5

Thank you! 

PaigeMiller
Diamond | Level 26

Arrays aren't really needed here, as there are many functions that can work across an entire row.

 

proc format;
    value ethf 2='Multiple Ethnicities' 99='Not Stated';
run;
data want;
    set have;
    if sum(of ethnicity:)=1 then ethnicity_final=whichn(1,of ethnicity:);
    else if whichn(99,of ethnicity:) then ethnicity_final=99;
    else if sum(of ethnicity:)>1 then ethnicity_final=2;
    format ethnicity_final ethf.;
run;

 

 

If you really need to have 'Ethnicity 1' instead of plain old '1' in the ethnicity_final variable, that's a little more work. You can use

 

proc format;
    value ethf 1='Ethnicity 1' 2='Ethnicity 2' 3='Ethnicity 3' 4='Ethnicity 4' 5='Ethnicity 5' 
        98='Multiple Ethnicities' 99='Not Stated';
run;
data want;
    set have;
    if sum(of ethnicity:)=1 then ethnicity_final=whichn(1,of ethnicity:);
    else if whichn(99,of ethnicity:) then ethnicity_final=99;
    else if sum(of ethnicity:)>1 then ethnicity_final=98;
    format ethnicity_final ethf.;
run;

 

Now let me throw out an opinion and some advice ... it may be the case that you have greatly simplified the problem so as you can present it to us. That is fine. The problem I have is that when you create text for an output, such as the text 'Ethinicity_1', this is likely difficult for others to understand because they don't know what the 5 ethnicities are by ethnicity number. Instead, provide text that means something, rather than coded numbers which your audience may not understand. If your data is set up such that the ethnicity is Caucausian when ethnicity_1=1 and Hispanic when when ethnicity_2=1 and so on, then the output should not say "Ethnicity_1", it should say "Caucasian", which your audience will understand better. Don't produce cryptic output, produce understandable output.

--
Paige Miller
SJ12
Calcite | Level 5

Sounds good, thanks for your help! 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 770 views
  • 4 likes
  • 4 in conversation