find multiple occurrences of a value in a text string

Hi!

I have a character variable, RACE, that can hold more than one race value per observation:

RACE

Asian, White

White

Black, White

Asian, Black

How can I find all occurrences of, say, Asian, and assign a numeric value?  I know 'contains' is not valid here, but this is an example of what I'm trying to do.  Maybe INDEX?

data want;

set have;

if c then do;
ord=1;
if race contains ('White') then sord=6;
else if race contains ('Black') then sord=4;
else if race contains ('Asian') then sord=3;

end;

run;

Accepted Solutions
Solution
‎02-20-2018 12:15 PM
Super User
Posts: 6,908

Re: find multiple occurrences of a value in a text string

Yes, INDEX is a good tool for the job:

if index(race, 'White') then ...

However, note that you can only have one value for SORD per observation.  So your logic is selecting which RACE value takes priority.
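As a minimal sketch of that suggestion, here is the IF/ELSE IF chain from the question rewritten with INDEX. The HAVE data set below is made up from the four RACE values in the question, and the unexplained "if c then do" wrapper is left out, so treat this as an illustration rather than the original poster's full program.

```sas
/* Hypothetical sample data built from the RACE values in the question */
data have;
    length race $20;
    infile datalines truncover;
    input race $20.;
    datalines;
Asian, White
White
Black, White
Asian, Black
;
run;

data want;
    set have;
    /* INDEX returns the position of the substring, or 0 when it is absent. */
    /* The first condition that is true wins, so the order sets the priority. */
    if index(race, 'White') > 0 then sord = 6;
    else if index(race, 'Black') > 0 then sord = 4;
    else if index(race, 'Asian') > 0 then sord = 3;
run;
```

With this ordering, any observation containing White gets SORD=6, and 'Asian, Black' gets SORD=4 because Black is tested before Asian. Note that INDEX is case-sensitive, so 'white' would not match 'White'.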

All Replies
Super User
Posts: 13,898

Re: find multiple occurrences of a value in a text string

And perhaps anything with a comma could/should be treated as "more than one race"?

For one project where I have stuff like this I actually create a series of dichotomous variables such as

rw = index(Race,'White')>0;

rb = index(Race,'Black')>0;

ra = index(Race,'Asian')>0;

/* and for those who think Hispanic is a race*/

rh = index(Race,'Hispanic')>0;

I do this because I have to report on multiracial respondents, on those who report only one race, and on those who report particular combinations.

Sums or maximums of these variables are then easy ways to pick out the specific cases.

For instance, HB = sum(rb,rh)=2; creates a dichotomous variable indicating Black Hispanics. Once you get used to it, that is much easier than: IF rb and rh then HB=1; else HB=0;
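Here is a rough sketch of how the flag approach could look end to end. The HAVE data set is made up from the RACE values in the question, and the names N_RACES, MULTIRACIAL, and ANY_BW are hypothetical, chosen only for this illustration.

```sas
/* Hypothetical sample data built from the RACE values in the question */
data have;
    length race $20;
    infile datalines truncover;
    input race $20.;
    datalines;
Asian, White
White
Black, White
Asian, Black
;
run;

data flags;
    set have;
    /* A comparison evaluates to 1 (true) or 0 (false), giving 0/1 flags */
    rw = index(race, 'White') > 0;
    rb = index(race, 'Black') > 0;
    ra = index(race, 'Asian') > 0;

    /* Number of races reported on this observation */
    n_races = sum(rw, rb, ra);

    /* 1 when more than one race is reported */
    multiracial = (n_races > 1);

    /* 1 when Black or White (or both) is reported */
    any_bw = max(rb, rw);
run;
```

Unlike the single SORD value, the flags keep every race on the observation, so 'Asian, Black' ends up with RA=1, RB=1, RW=0 and MULTIRACIAL=1.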

☑ This topic is solved.