BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
benewaa
Fluorite | Level 6

Hi all,

 

I have a dataset that contains 9 variables for ethnic group, all binary where 1 means respondent belongs to that ethnic group and 0 means respondent doesn't.

so I have ethnic1 ethnic2 to ethnic9, each has value 1 or 0.

I want to create a single ethnic value where each respondent has a value from 1-9 indicating which ethnic group they belong to.

I have done a lot of reading around and tried a lot of things from arrays to catx but I cannot seem to make things work. can anyone help? I am using SAS university edition,

 

this is the last thing I tried (new to sas so I find this so confusing)

 


data Bios.prjtnew;
set Bios.prjtc;
ethnic = catx(ethnic_group___1, best8.),put(ethnic_group___2, best8.),
put(ethnic_group___3, best8.), put(ethnic_group___4, best8.),put(ethnic_group___5, best8.),
put(ethnic_group___6, best8.), put(ethnic_group___7, best8.),put(ethnic_group___8, best8.) and
put (ethnic_group___9, best8.));

run;

 

I am hoping to have a single ethnic variable where the value will be say 6 if the respondent belongs to ethnic group 6

1 ACCEPTED SOLUTION

Accepted Solutions
benewaa
Fluorite | Level 6

Hi Jimbarbour,

 

thanks so much for the fast reply. I do understand your syntax and it makes perfect sense. I am, however, still having challenges- when I run the exact code, I have . for all values of ethnic (ie missing). I am not sure why this is so. checked a couple of times and copied and pasted your codes as well but I still have missing for all of ethnic. Any ideas what I may be doing wrong?

View solution in original post

8 REPLIES 8
jimbarbour
Meteorite | Level 14

This sounds like a good time to use an array.

 

Just quickly on my iPhone:

Data ethnic_designations;

SET Ethnic_Data; DROP ethnic:; DROP _:; ARRAY Ethnicity [*] ethnic1 - ethnic9; DO _I = 1 TO DIM(Ethnicity); IF Ethnicity[_i] THEN Ethnic_Code =_I; END; RUN;

You may need to correct some of my syntax.

 

Jim

jimbarbour
Meteorite | Level 14

OK, I'm home now and can actually run a SAS session.  What I laid out above is essentially accurate, but here it is again, but this time with your variable and dataset names.  I created some test data too, and I can see that it's working.

 

In the code you first posted, you had PUT statements.  The PUT statement is a great command, but it's for putting things into a file, not a SAS data set.  SAS assumes that whatever variables you prepare are meant to be in whatever SAS dataset is designated in the DATA statement at the top of your DATA step, so you don't have to PUT or anything else.  SAS takes care of adding them for you. 

 

I don't want things in my output SAS dataset like _i (which is the index I used to go through the array), so I added a DROP command.  I also don't want Ethnic_Group__1 - 9 because they have been replaced by the single variable, Ethnic.  You'll notice in both cases I used the ":" modifier which acts as a wildcard.  With DROP _:, any variable that begins with an underscore will be dropped.  With DROP Ethnic_Group:, all nine Ethnic_Group variables will be dropped.

 

One caution:  I'm assuming here that only one Ethnic Group per record is a 1.  If multiple 1's are coded for a single record, the way the code is structured now, the last Ethnic Group will be used.   If multiple 1's are coded, you'd have to come up with some kind of tie-breaking logic or come up with more codes, codes that indicate multi-ethnic designations.

 

Here's my revised code, below.  Below the code are the results.

 

DATA	Bios.prjtc;
	INPUT	Ethnic_Group__1 - Ethnic_Group__9;
DATALINES;
0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1
;
RUN;

Data Bios.prjtnew;    
    DROP  Ethnic_Group:;
    DROP _:;

	SET Bios.prjtc;

    ARRAY Ethnicity [*] Ethnic_Group__1 - Ethnic_Group__9;

    DO  _I = 1 TO  DIM(Ethnicity);
        IF Ethnicity[_i] THEN
            Ethnic =	_i;
    END;
RUN;

Results:

jimbarbour_0-1601786075802.png

 

Jim

benewaa
Fluorite | Level 6

Hi Jimbarbour,

 

thanks so much for the fast reply. I do understand your syntax and it makes perfect sense. I am, however, still having challenges- when I run the exact code, I have . for all values of ethnic (ie missing). I am not sure why this is so. checked a couple of times and copied and pasted your codes as well but I still have missing for all of ethnic. Any ideas what I may be doing wrong?

benewaa
Fluorite | Level 6

Problem solved!!!

 

I was missing an underscore.

 

I appreciate your assistance, Jimbarbour

jimbarbour
Meteorite | Level 14

Well, I believe it is customary to credit the solution to the person who suggested it, but I'm glad it's working, and I'm glad I was on point with my suggestion.  

 

I should mention one more thing:  You may already know this, but the construct I coded, 

 

IF Ethnicity[_i] THEN
            Ethnic =	_i;

works because Ethnic is numeric.  Normally, one would need an operator (=, >, <, etc.) and a value, but in the case of a numeric, it's handled as a Boolean true/false.  Unlike some languages, it's not just 0 or 1 for Boolean values in SAS.  SAS treats non-missing numbers as true (negative values are treated as true).  Zero and missing are treated as false.  

 

Jim

 

benewaa
Fluorite | Level 6

Hi Jim,

As you may have noticed, I am new to SAS and this community. The ticking of my comment as a solution was made in error and I could not figure out how to correct it.

I really do appreciate the help you offer and I am still learning. Your explanations make it all the more easier. Thanks again.

 

Benewaa

ballardw
Super User

Another approach:

data want;
   set have;
   ARRAY Ethnicity [*] Ethnic_Group__1 - Ethnic_Group__9;
   Ethnic = whichn(1, of Ethnicity(*));
run;

The Whichn, and the corresponding Whichc for character values, returns the position of the first parameter, 1 in this case, of the first occurrence in the list of values following. An array reference like "of arrayname(*)" uses the values of the array in order. If for some reason your Ethnic_group variables were character the only change needed would be to change the N in Whichn to a C.

benewaa
Fluorite | Level 6

Hi Ballardw,

Thanks for the help. I used this code and it worked great. Thanks for explanation the syntax as well, it makes for easy understanding.

 

Benewaa

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 828 views
  • 5 likes
  • 3 in conversation