Edited: added SAS table file as well as the output needed. Thank you to those who have already responded. I have not had a chance to try the code in responses at this time. I'm new to this posting questions. I'm usually able to recycle code from other programs or code I find online.
Hi everyone,
I have a variable where is tells me if the race is just alone or "Two or more races". For the ones where race is 'race, alone', simple coding:
if SINGLE_TOM_RACE NE 'Two or more races' THEN DO;
if SINGLE_TOM_RACE = 'American Indian/Alaskan Native, Alone' then race1 = 'R1';
else if SINGLE_TOM_RACE = 'Asian, Alone' then race1 = 'R2';
else if SINGLE_TOM_RACE = = 'Black/African American, Alone' then race1 = 'R3';
else if SINGLE_TOM_RACE = 'Native Hawaiian or other Pacific Islander, Alone' then race1 = 'R4';
else if SINGLE_TOM_RACE = 'White, Alone' then race1 = 'R5';
else if SINGLE_TOM_RACE = 'Other/Unknown' then race1= 'UNK';
else race1='';
end;
I'm having brain freeze on when SINGLE_TOM_RACE is 'Two or more races'. I know I need an array. Each observation is unique. Race categories run from Race1a -- Race15a. Data dictionary is in the table below:
column | code | Decedent's race literal | IMPORT RESULT NEEDED | |
RACE1a | Y, N, U | White | R5 | |
RACE2a | Y, N, U | Black or African American | R3 | |
RACE3a | Y, N, U | American Indian or Alaska Native | R1 | |
RACE4a | Y, N, U | Asian Indian | R2.01 | |
RACE5a | Y, N, U | Chinese | R2.06 | |
RACE6a | Y, N, U | Filipino | R2.08 | |
RACE7a | Y, N, U | Japanese | R2.11 | |
RACE8a | Y, N, U | Korean | R2.12 | |
RACE9a | Y, N, U | Vietnamese | R2.19 | |
RACE10a | Y, N, U | Other Asian | R2 | |
RACE11a | Y, N, U | Native Hawaiian | R4.01.001 | |
RACE12a | Y, N, U | Guamanian or Chamorro | R4.02.001 | |
RACE13a | Y, N, U | Samoan | R4.01.002 | |
RACE14a | Y, N, U | Other Pacific Islander | R4 | |
RACE15a | Y, N, U | Other | UNK | |
IMPORT FILE HAS RACE VARIABLES RACE1-RACE5 AVAILABLE FOR MULTIRACIAL CASES |
It's also in the attached excel file.
During my searches for similar answers, I've only found responses for duplicate observations and not unique/ distinct observations. I know that I need the first - n 'Y' in each row, but I am completely blanking on the way to do that array. From the searches I have found, this code below seems to be on the right track, but I'm not sure.
if SINGLE_TOM_RACE = 'Two or more races' then do;
array race{15} race1a-race15a;
do i ????????(first) then first=race(i);
else if race(i) ^= first then second = race(i);
end;
drop i;
run;
The question marks are there to replace the 'missing' from the original code. There are no missing values in the observations.
Any help is greatly appreciated. I'm usually recycling code rather than writing it myself because I know someone else has already written the code or it's similar enough that I can edit it to fit my needs.
Needed Output:
id | race1 | race2 | race3 | race4 | race5 |
434 | R5 | UNK | |||
600 | R5 | R2.08 | |||
682 | R3 | R1 | |||
786 | R5 | R1 | |||
1036 | R5 | R3 | |||
1632 | R5 | R3 | R2.11 | ||
1911 | R3 | R2 | |||
2215 | R5 | R2.08 | |||
2391 | R5 | R3 | |||
3087 | R3 | R1 | UNK | ||
3314 | R5 | R3 | |||
4026 | R5 | R3 | R1 | R2 | R4 |
Thanks!
Still not a very clear description.
Here is a small example of what I think you are asking for. I am assuming that you are attempting to assign a value to variables Race1 through Race 5 based on which of the 15 variables have a Y in their column.
This places the text of the desired result for each of the 15 (my example uses 3 because I'm lazy) variables in a temporary array. Temporary means the variables created do not get written to the data set. This is one way to "look up" values based on position. X are the input variables with the Y N or whatever values.
This counts how many Y are encountered and as long as there are 3 (too lazy for 5) or fewer it updates the Race1 to 3.
Note that if you don't want to deal with multiple subcodes such as the Race4A through Race9A you could enter the R2 instead of the R2.01 R2.06 etc. in the temporary array.
data example; input a $ b $ C $; array x (*) a b c; array t (3) $ 10 _temporary_('R1','R2','R2.3'); array r (3) $ 10 Race1-Race3; Ycount=0; do i=1 to dim(x); if x[i]='Y' then do; ycount=ycount+1; IF ycount le dim(r) then r[ycount] = t[i]; end; end; drop i ycount; datalines; Y N Y N Y Y Y Y Y Y Y N Y N N N Y N N N Y ;
Of note: you shouldn't need separate coding for the two or more as a single value gets set as well.
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software or organization policies. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Only include the variable(s) needed for exercise and just enough rows to illustrate the different cases.
I am not sure I can tell what you expect as a result. You don't show any actual input values and the expected output for those.
Thanks for letting me know about those issues. This is my first post. I've added the SAS table as well as the needed output.
Best!
Sorry, I think this is what you had in mind than the sas table:
data work.race (Label= 'Multitace Data');
infile datalines dsd dlm='|' truncover;
input id:$12. RACE1a:$1. RACE2a:$1. RACE3a:$1. RACE4a:$1. RACE5a:$1. RACE6a:$1. RACE7a:$1. RACE8a:$1. RACE9a:$1.
RACE10a:$1. RACE11a:$1. RACE12a:$1. RACE13a:$1. RACE14a:$1. RACE15a:$1. SINGLE_TOM_RACE:$17.;
datalines;
434|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
600|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
682|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
786|Y|N|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1036|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1632|Y|Y|N|N|N|N|Y|N|N|N|N|N|N|N|Y|Two or more races
1911|N|Y|N|N|N|N|N|N|N|Y|N|N|N|N|N|Two or more races
2215|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
2391|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
3087|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
3314|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
4026|Y|Y|Y|N|N|N|N|N|N|Y|N|N|N|Y|N|Two or more races
;
run;
proc print data=work.race;
run;
quit;
Thanks again!
You want all 105 pairwise combinations?
PROC IML;
n = 15;
k = 2;
c = allcomb(n, k);
call sort(c, 1:k);
print c;
r = nrow(c);
print r;
QUIT;
Koen
Thanks for answering. I'm just looking for the 'Y'es response is in the race1a-race15a race columns in order to have the needed output below. The SAS table has been uploaded into the original post.
id | race1 | race2 | race3 | race4 | race5 |
434 | R5 | UNK | |||
600 | R5 | R2.08 | |||
682 | R3 | R1 | |||
786 | R5 | R1 | |||
1036 | R5 | R3 | |||
1632 | R5 | R3 | R2.11 | ||
1911 | R3 | R2 | |||
2215 | R5 | R2.08 | |||
2391 | R5 | R3 | |||
3087 | R3 | R1 | UNK | ||
3314 | R5 | R3 | |||
4026 | R5 | R3 | R1 | R2 | R4 |
What if more than 5 or the race values are Y?
I'm not sure. I've never run into that issue before. The majority of the individuals in my dataset are single race. The multiracial individuals have the Y or yes option in two or more of the columns labeled race1a-race15a. I've never seen more than 5 races being determined for an individual. The database I'm preparing the data for only has options to import 5 race classifications. For example, my import uses R4 for those of Hawaiian/ Pacific Islander decent. I can further differentiate based on columns Race11a-Race14a. If the person is Native Hawaiian, a Yes response in Column race11a, R4.01.001 can be used. As I'm typing this, it might be easier not to get into specifics for race where R2 (Asian decent) and R4 are concerned in my data.
Sorry for rambling, it's the end of my work day. I hope that answered the question.
Have a great weekend!
Still not a very clear description.
Here is a small example of what I think you are asking for. I am assuming that you are attempting to assign a value to variables Race1 through Race 5 based on which of the 15 variables have a Y in their column.
This places the text of the desired result for each of the 15 (my example uses 3 because I'm lazy) variables in a temporary array. Temporary means the variables created do not get written to the data set. This is one way to "look up" values based on position. X are the input variables with the Y N or whatever values.
This counts how many Y are encountered and as long as there are 3 (too lazy for 5) or fewer it updates the Race1 to 3.
Note that if you don't want to deal with multiple subcodes such as the Race4A through Race9A you could enter the R2 instead of the R2.01 R2.06 etc. in the temporary array.
data example; input a $ b $ C $; array x (*) a b c; array t (3) $ 10 _temporary_('R1','R2','R2.3'); array r (3) $ 10 Race1-Race3; Ycount=0; do i=1 to dim(x); if x[i]='Y' then do; ycount=ycount+1; IF ycount le dim(r) then r[ycount] = t[i]; end; end; drop i ycount; datalines; Y N Y N Y Y Y Y Y Y Y N Y N N N Y N N N Y ;
Of note: you shouldn't need separate coding for the two or more as a single value gets set as well.
Thank you! That is exactly what I am looking to do. I'm glad you were able to figure out what i was trying to describe. I'll try the code when I next at work. Have a great weekend!
I think this might be what was asked for. Sorry for all the updates.
data work.race (Label= 'Multitrace Data');
infile datalines dsd dlm='|' truncover;
input id:$12. RACE1a:$1. RACE2a:$1. RACE3a:$1. RACE4a:$1. RACE5a:$1. RACE6a:$1. RACE7a:$1. RACE8a:$1. RACE9a:$1. RACE10a:$1. RACE11a:$1. RACE12a:$1. RACE13a:$1. RACE14a:$1. RACE15a:$1. SINGLE_TOM_RACE:$17.;
datalines;
434|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
600|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
682|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
786|Y|N|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1036|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1632|Y|Y|N|N|N|N|Y|N|N|N|N|N|N|N|Y|Two or more races
1911|N|Y|N|N|N|N|N|N|N|Y|N|N|N|N|N|Two or more races
2215|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
2391|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
3087|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
3314|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
4026|Y|Y|Y|N|N|N|N|N|N|Y|N|N|N|Y|N|Two or more races
;
run;
proc print data=work.race;
run;
quit;
You can use COUNTC() to find how many Y's there are. Then use SELECT to decide what action to take.
First let's simplify your example. But also provide more different types of cases. To make my life easier I remove the spurious letter A from the end of the variable names so that I an reference them with a variable list.
data RACE (Label= 'Multitrace Data');
infile datalines dsd dlm='|' truncover;
input id :$12. (RACE1-RACE4) (:$1.) Expect :$17.;
datalines;
434|Y|N|N|N|RACE1
600|Y|N|N|Y|Two or more races
682|N|N|N|N|None specified
786|N|Y|N|N|RACE2
;
Now let's try processing it.
data want;
set race ;
array races race1-race4;
length race_single $17;
select (countc(cats(of races[*]),'Y'));
when(0) race_single='None specified';
when(1) race_single=vname(races[whichc('Y',of races[*])]);
otherwise race_single='Two or more races';
end;
run;
Result:
If your variables have Labels attached to them you could use VLABEL() instead of VNAME() function and get a result more like:
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.