Solved: need to create race 1-5 multiracial where I have 15 race categories fo...

Epi · Posted 06-14-2024 10:29 AM

Edited: added SAS table file as well as the output needed. Thank you to those who have already responded. I have not had a chance to try the code in responses at this time. I'm new to this posting questions. I'm usually able to recycle code from other programs or code I find online.

Hi everyone,

I have a variable where is tells me if the race is just alone or "Two or more races". For the ones where race is 'race, alone', simple coding:

if SINGLE_TOM_RACE NE 'Two or more races' THEN DO;
 if SINGLE_TOM_RACE = 'American Indian/Alaskan Native, Alone' then race1 = 'R1';
	else if SINGLE_TOM_RACE = 'Asian, Alone' then race1 = 'R2';
	else if SINGLE_TOM_RACE = = 'Black/African American, Alone' then race1 = 'R3';
	else if SINGLE_TOM_RACE = 'Native Hawaiian or other Pacific Islander, Alone' then race1 = 'R4';
	else if SINGLE_TOM_RACE = 'White, Alone' then race1 = 'R5';
	else if SINGLE_TOM_RACE = 'Other/Unknown' then race1= 'UNK';
	else race1='';
end;

I'm having brain freeze on when SINGLE_TOM_RACE is 'Two or more races'. I know I need an array. Each observation is unique. Race categories run from Race1a -- Race15a. Data dictionary is in the table below:

column	code	Decedent's race literal	IMPORT RESULT NEEDED
RACE1a	Y, N, U	White	R5
RACE2a	Y, N, U	Black or African American	R3
RACE3a	Y, N, U	American Indian or Alaska Native	R1
RACE4a	Y, N, U	Asian Indian	R2.01
RACE5a	Y, N, U	Chinese	R2.06
RACE6a	Y, N, U	Filipino	R2.08
RACE7a	Y, N, U	Japanese	R2.11
RACE8a	Y, N, U	Korean	R2.12
RACE9a	Y, N, U	Vietnamese	R2.19
RACE10a	Y, N, U	Other Asian	R2
RACE11a	Y, N, U	Native Hawaiian	R4.01.001
RACE12a	Y, N, U	Guamanian or Chamorro	R4.02.001
RACE13a	Y, N, U	Samoan	R4.01.002
RACE14a	Y, N, U	Other Pacific Islander	R4
RACE15a	Y, N, U	Other	UNK

IMPORT FILE HAS RACE VARIABLES RACE1-RACE5 AVAILABLE FOR MULTIRACIAL CASES

It's also in the attached excel file.

During my searches for similar answers, I've only found responses for duplicate observations and not unique/ distinct observations. I know that I need the first - n 'Y' in each row, but I am completely blanking on the way to do that array. From the searches I have found, this code below seems to be on the right track, but I'm not sure.

if SINGLE_TOM_RACE = 'Two or more races' then do;
 array race{15} race1a-race15a;
  do i ????????(first) then first=race(i);
    else if race(i) ^= first then second = race(i);
  end;
  drop i;
run;

The question marks are there to replace the 'missing' from the original code. There are no missing values in the observations.

Any help is greatly appreciated. I'm usually recycling code rather than writing it myself because I know someone else has already written the code or it's similar enough that I can edit it to fit my needs.

Needed Output:

id	race1	race2	race3	race4	race5
434	R5	UNK
600	R5	R2.08
682	R3	R1
786	R5	R1
1036	R5	R3
1632	R5	R3	R2.11
1911	R3	R2
2215	R5	R2.08
2391	R5	R3
3087	R3	R1	UNK
3314	R5	R3
4026	R5	R3	R1	R2	R4

Thanks!

ballardw · Posted 06-14-2024 03:45 PM

Still not a very clear description.

Here is a small example of what I think you are asking for. I am assuming that you are attempting to assign a value to variables Race1 through Race 5 based on which of the 15 variables have a Y in their column.

This places the text of the desired result for each of the 15 (my example uses 3 because I'm lazy) variables in a temporary array. Temporary means the variables created do not get written to the data set. This is one way to "look up" values based on position. X are the input variables with the Y N or whatever values.

This counts how many Y are encountered and as long as there are 3 (too lazy for 5) or fewer it updates the Race1 to 3.

Note that if you don't want to deal with multiple subcodes such as the Race4A through Race9A you could enter the R2 instead of the R2.01 R2.06 etc. in the temporary array.

data example;
   input a $ b $ C $;
   array x (*) a b c;
   array t (3) $ 10 _temporary_('R1','R2','R2.3');
   array r (3) $ 10 Race1-Race3;
   Ycount=0;
   do i=1 to dim(x);
      if x[i]='Y' then do;
         ycount=ycount+1;
         IF ycount le dim(r) then r[ycount] = t[i];
      end;
   end;
   drop i ycount;
datalines;
Y N Y
N Y Y
Y Y Y
Y Y N
Y N N
N Y N
N N Y
;

Of note: you shouldn't need separate coding for the two or more as a single value gets set as well.

View solution in original post

ballardw · Posted 06-14-2024 11:13 AM

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software or organization policies. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

Only include the variable(s) needed for exercise and just enough rows to illustrate the different cases.

I am not sure I can tell what you expect as a result. You don't show any actual input values and the expected output for those.

Epi · Posted 06-14-2024 01:38 PM

Thanks for letting me know about those issues. This is my first post. I've added the SAS table as well as the needed output.

Best!

Epi · Posted 06-14-2024 02:39 PM

Sorry, I think this is what you had in mind than the sas table:

data work.race (Label= 'Multitace Data');
infile datalines dsd dlm='|' truncover;
input id:$12. RACE1a:$1. RACE2a:$1. RACE3a:$1. RACE4a:$1. RACE5a:$1. RACE6a:$1. RACE7a:$1. RACE8a:$1. RACE9a:$1.
RACE10a:$1. RACE11a:$1. RACE12a:$1. RACE13a:$1. RACE14a:$1. RACE15a:$1. SINGLE_TOM_RACE:$17.;

datalines;
434|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
600|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
682|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
786|Y|N|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1036|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1632|Y|Y|N|N|N|N|Y|N|N|N|N|N|N|N|Y|Two or more races
1911|N|Y|N|N|N|N|N|N|N|Y|N|N|N|N|N|Two or more races
2215|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
2391|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
3087|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
3314|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
4026|Y|Y|Y|N|N|N|N|N|N|Y|N|N|N|Y|N|Two or more races
;
run;

proc print data=work.race;
run;
quit;

Thanks again!

sbxkoenk · Posted 06-14-2024 11:59 AM

You want all 105 pairwise combinations?

PROC IML;
n = 15; 
k = 2; 
c = allcomb(n, k);
call sort(c, 1:k);
print c;
r = nrow(c);
print r;
QUIT;

Koen

Epi · Posted 06-14-2024 01:45 PM

Thanks for answering. I'm just looking for the 'Y'es response is in the race1a-race15a race columns in order to have the needed output below. The SAS table has been uploaded into the original post.

id	race1	race2	race3	race4	race5
434	R5	UNK
600	R5	R2.08
682	R3	R1
786	R5	R1
1036	R5	R3
1632	R5	R3	R2.11
1911	R3	R2
2215	R5	R2.08
2391	R5	R3
3087	R3	R1	UNK
3314	R5	R3
4026	R5	R3	R1	R2	R4

ballardw · Posted 06-14-2024 03:25 PM

What if more than 5 or the race values are Y?

Epi · Posted 06-14-2024 04:03 PM

I'm not sure. I've never run into that issue before. The majority of the individuals in my dataset are single race. The multiracial individuals have the Y or yes option in two or more of the columns labeled race1a-race15a. I've never seen more than 5 races being determined for an individual. The database I'm preparing the data for only has options to import 5 race classifications. For example, my import uses R4 for those of Hawaiian/ Pacific Islander decent. I can further differentiate based on columns Race11a-Race14a. If the person is Native Hawaiian, a Yes response in Column race11a, R4.01.001 can be used. As I'm typing this, it might be easier not to get into specifics for race where R2 (Asian decent) and R4 are concerned in my data.

Sorry for rambling, it's the end of my work day. I hope that answered the question.

Have a great weekend!

ballardw · Posted 06-14-2024 03:45 PM

Still not a very clear description.

Here is a small example of what I think you are asking for. I am assuming that you are attempting to assign a value to variables Race1 through Race 5 based on which of the 15 variables have a Y in their column.

This places the text of the desired result for each of the 15 (my example uses 3 because I'm lazy) variables in a temporary array. Temporary means the variables created do not get written to the data set. This is one way to "look up" values based on position. X are the input variables with the Y N or whatever values.

This counts how many Y are encountered and as long as there are 3 (too lazy for 5) or fewer it updates the Race1 to 3.

Note that if you don't want to deal with multiple subcodes such as the Race4A through Race9A you could enter the R2 instead of the R2.01 R2.06 etc. in the temporary array.

data example;
   input a $ b $ C $;
   array x (*) a b c;
   array t (3) $ 10 _temporary_('R1','R2','R2.3');
   array r (3) $ 10 Race1-Race3;
   Ycount=0;
   do i=1 to dim(x);
      if x[i]='Y' then do;
         ycount=ycount+1;
         IF ycount le dim(r) then r[ycount] = t[i];
      end;
   end;
   drop i ycount;
datalines;
Y N Y
N Y Y
Y Y Y
Y Y N
Y N N
N Y N
N N Y
;

Of note: you shouldn't need separate coding for the two or more as a single value gets set as well.

Epi · Posted 06-14-2024 04:10 PM

Thank you! That is exactly what I am looking to do. I'm glad you were able to figure out what i was trying to describe. I'll try the code when I next at work. Have a great weekend!

Epi · Posted 06-14-2024 02:35 PM

I think this might be what was asked for. Sorry for all the updates.

data work.race (Label= 'Multitrace Data');
infile datalines dsd dlm='|' truncover;
input id:$12. RACE1a:$1. RACE2a:$1. RACE3a:$1. RACE4a:$1. RACE5a:$1. RACE6a:$1. RACE7a:$1. RACE8a:$1. RACE9a:$1. RACE10a:$1. RACE11a:$1. RACE12a:$1. RACE13a:$1. RACE14a:$1. RACE15a:$1. SINGLE_TOM_RACE:$17.;

datalines;
434|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
600|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
682|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
786|Y|N|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1036|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1632|Y|Y|N|N|N|N|Y|N|N|N|N|N|N|N|Y|Two or more races
1911|N|Y|N|N|N|N|N|N|N|Y|N|N|N|N|N|Two or more races
2215|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
2391|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
3087|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
3314|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
4026|Y|Y|Y|N|N|N|N|N|N|Y|N|N|N|Y|N|Two or more races
;
run;

proc print data=work.race;
run;
quit;

Tom · Posted 06-14-2024 03:17 PM

You can use COUNTC() to find how many Y's there are. Then use SELECT to decide what action to take.

First let's simplify your example. But also provide more different types of cases. To make my life easier I remove the spurious letter A from the end of the variable names so that I an reference them with a variable list.

data RACE (Label= 'Multitrace Data');
  infile datalines dsd dlm='|' truncover;
  input id :$12. (RACE1-RACE4) (:$1.) Expect :$17.;
datalines;
434|Y|N|N|N|RACE1
600|Y|N|N|Y|Two or more races
682|N|N|N|N|None specified
786|N|Y|N|N|RACE2
;

Now let's try processing it.

data want;
  set race ;
  array races race1-race4;
  length race_single $17;
  select (countc(cats(of races[*]),'Y'));
    when(0) race_single='None specified';
    when(1) race_single=vname(races[whichc('Y',of races[*])]);
    otherwise race_single='Two or more races';
  end;
run;

Result:

If your variables have Labels attached to them you could use VLABEL() instead of VNAME() function and get a result more like:

need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

Re: need to create race 1-5 multiracial where I have 15 race categories for each unique observations

SAS Innovate 2025: Call for Content

Classroom Training Available!