BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
plantprion
Fluorite | Level 6

I am trying to make a bit of code that will take a string from my dataset in excel and turn it into a number. However, there are some observations that have multiple groups and my code is only grabbing the first one it sees and I would like it to somehow recognize that there are multiple different groups in the same observations and classify it as other. Here is my example:

Have:

Where I want White to be coded as 1, African American to be coded as 2 etc and if there are two different races then I want 6 for other. 

Here is my code for attempting to do this:

data race;
   input ID $ Race $;
   datalines;
1 White 
2 AfricanAmerican 
3 White/AfricanAmerican
4 Hispanic
5 Hispanic/AfricanAmerican
6 NativeAmerican
7 PacificIslander
8 Other
9 PacificIslander/NativeAmerican
10 Hispanic/White ;

data race;
if find (Race, "White") ge 1 then code = 1;
else if find (Race, "AfricanAmerican") ge 1 then code = 2;
else if find (Race, "Hispanic") ge 1 then code = 3;
else if find (Race, "NativeAmerican") ge 1 then code = 4;
else if find (Race, "PacificIslander") ge 1 then code = 5;
else if find (Race, "Other") ge 1 then code = 6;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

So do the combinations belong in code=6, or do they start a new category such as code=7?  I'll use 7 for the sample code.

 

Starting with the first ELSE statement, turn each into two possibilities.  I'll illustrate just one set of changes for one ELSE statement:

 

else if find (Race, "AfricanAmerican") then do;

   if code=. then code=2;

   else code=7;

end;

 

So each time you find a match, check whether CODE was already assigned by an earlier match.

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

This works:

data race;
  length race $200;
  input id $ race $;
  datalines;
1 White 
2 AfricanAmerican 
3 White/AfricanAmerican 
;
run;

data race;
  set race;
  if index(race, "White") then code = 1;
  if index(race, "AfricanAmerican") then code=2;
  if index(race,"White") and index(race,"AfricanAmerican") then code=3;
run;

Am using a couple of simplifications, index returns the first position of the word in the string or 0, so if its found then true.  Also I use greatest wins, i.e. White would first become 1, then become 3 later on due to position of the if.  

 

TBH I wouldn't do it this way.  First step would be to split out the race value into component parts, e.g.:

array r{2} $200;
do i=1 to countw(race,"/");
  r{i}=scan(race,i,"/");
end;

Then you would apply a format to the races using a proc format.  Its just easier to standardise that way.

plantprion
Fluorite | Level 6

Thank you for your help, the issue is that I have many more race variables then just the two so I did not want to just break up the two parts for every possible combination of races. For your second solution, I actually have no idea how that works so if you have any reading or can explain what you are doing in that second step that would be very helpful. Thanks again. 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Present some accurate test data then, and show what you want the output to look like.

plantprion
Fluorite | Level 6
I edited my original post to be more clear in what I am working with and what I actually want.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

So:

data race;
  length id race $200;
   input id $ race $;
   datalines;
1 White 
2 AfricanAmerican 
3 White/AfricanAmerican
4 Hispanic
5 Hispanic/AfricanAmerican
6 NativeAmerican
7 PacificIslander
8 Other
9 PacificIslander/NativeAmerican
10 Hispanic/White 
;
run;

data race;
  set race;
   if index(race,"White") then code = 1;
   if index(race,"AfricanAmerican") then code = 2;
   if index(race,"Hispanic") then code = 3;
   if index(race,"NativeAmerican") then code = 4;
   if index(race,"PacificIslander") then code = 5;
   if index(race,"Other") then code = 6;
run;
Astounding
PROC Star

So do the combinations belong in code=6, or do they start a new category such as code=7?  I'll use 7 for the sample code.

 

Starting with the first ELSE statement, turn each into two possibilities.  I'll illustrate just one set of changes for one ELSE statement:

 

else if find (Race, "AfricanAmerican") then do;

   if code=. then code=2;

   else code=7;

end;

 

So each time you find a match, check whether CODE was already assigned by an earlier match.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 989 views
  • 0 likes
  • 3 in conversation