BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Epi
Calcite | Level 5 Epi
Calcite | Level 5

Edited: added SAS table file as well as the output needed.  Thank you to those who have already responded. I have not had a chance to try the code in responses at this time.   I'm new to this posting questions.  I'm usually able to recycle code from other programs or code I find online.

 

Hi everyone,

I have a variable where is tells me if the race is just alone or "Two or more races".  For the ones where race is 'race, alone', simple coding:

if SINGLE_TOM_RACE NE 'Two or more races' THEN DO;
 if SINGLE_TOM_RACE = 'American Indian/Alaskan Native, Alone' then race1 = 'R1';
	else if SINGLE_TOM_RACE = 'Asian, Alone' then race1 = 'R2';
	else if SINGLE_TOM_RACE = = 'Black/African American, Alone' then race1 = 'R3';
	else if SINGLE_TOM_RACE = 'Native Hawaiian or other Pacific Islander, Alone' then race1 = 'R4';
	else if SINGLE_TOM_RACE = 'White, Alone' then race1 = 'R5';
	else if SINGLE_TOM_RACE = 'Other/Unknown' then race1= 'UNK';
	else race1='';
end;

I'm having brain freeze on when SINGLE_TOM_RACE is 'Two or more races'.  I know I need an array.  Each observation is unique.  Race categories run from Race1a -- Race15a.  Data dictionary is in the table below:

columncodeDecedent's race literalIMPORT RESULT NEEDED 
RACE1aY, N, UWhiteR5 
RACE2aY, N, UBlack or African AmericanR3 
RACE3aY, N, UAmerican Indian or Alaska NativeR1 
RACE4aY, N, UAsian IndianR2.01 
RACE5aY, N, UChineseR2.06 
RACE6aY, N, UFilipinoR2.08 
RACE7aY, N, UJapaneseR2.11 
RACE8aY, N, UKoreanR2.12 
RACE9aY, N, UVietnameseR2.19 
RACE10aY, N, UOther AsianR2 
RACE11aY, N, UNative HawaiianR4.01.001 
RACE12aY, N, UGuamanian or ChamorroR4.02.001 
RACE13aY, N, USamoanR4.01.002 
RACE14aY, N, UOther Pacific IslanderR4 
RACE15aY, N, UOtherUNK 
     
IMPORT FILE HAS RACE VARIABLES RACE1-RACE5 AVAILABLE FOR MULTIRACIAL CASES

It's also in the attached excel file.

During my searches for similar answers, I've only found responses for duplicate observations and not unique/ distinct observations.  I know that I need the first - n 'Y' in each row, but I am completely blanking on the way to do that array.  From the searches I have found, this code below seems to be on the right track, but I'm not sure.

if SINGLE_TOM_RACE = 'Two or more races' then do;
 array race{15} race1a-race15a;
  do i ????????(first) then first=race(i);
    else if race(i) ^= first then second = race(i);
  end;
  drop i;
run;

The question marks are there to replace the 'missing' from the original code.  There are no missing values in the observations.

 

Any help is greatly appreciated.  I'm usually recycling code rather than writing it myself because I know someone else has already written the code or it's similar enough that I can edit it to fit my needs.

Needed Output:

 

idrace1race2race3race4race5
434R5UNK   
600R5R2.08   
682R3R1   
786R5R1   
1036R5R3   
1632R5R3R2.11  
1911R3R2   
2215R5R2.08   
2391R5R3   
3087R3R1UNK  
3314R5R3   
4026R5R3R1R2R4

 

 

Thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Still not a very clear description.

 

Here is a small example of what I think you are asking for. I am assuming that you are attempting to assign a value to variables Race1 through Race 5 based on which of the 15 variables have a Y in their column.

This places the text of the desired result for each of the 15 (my example uses 3 because I'm lazy) variables in a temporary array. Temporary means the variables created do not get written to the data set. This is one way to "look up" values based on position. X are the input variables with the Y N or whatever values.

This counts how many Y are encountered and as long as there are 3 (too lazy for 5) or fewer it updates the Race1 to 3.

 

Note that if you don't want to deal with multiple subcodes such as the Race4A through Race9A you could enter the R2 instead of the R2.01 R2.06 etc. in the temporary array.

 

data example;
   input a $ b $ C $;
   array x (*) a b c;
   array t (3) $ 10 _temporary_('R1','R2','R2.3');
   array r (3) $ 10 Race1-Race3;
   Ycount=0;
   do i=1 to dim(x);
      if x[i]='Y' then do;
         ycount=ycount+1;
         IF ycount le dim(r) then r[ycount] = t[i];
      end;
   end;
   drop i ycount;
datalines;
Y N Y
N Y Y
Y Y Y
Y Y N
Y N N
N Y N
N N Y
;

Of note: you shouldn't need separate coding for the two or more as a single value gets set as well.

 

 

View solution in original post

11 REPLIES 11
ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software or organization policies. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

Only include the variable(s) needed for exercise and just enough rows to illustrate the different cases.

 

I am not sure I can tell what you expect as a result. You don't show any actual input values and the expected output for those.

Epi
Calcite | Level 5 Epi
Calcite | Level 5

Thanks for letting me know about those issues.  This is my first post.  I've added the SAS table as well as the needed output.

 

Best!

Epi
Calcite | Level 5 Epi
Calcite | Level 5

Sorry, I think this is what you had in mind than the sas table:

 

data work.race (Label= 'Multitace Data');
infile datalines dsd dlm='|' truncover;
input id:$12. RACE1a:$1. RACE2a:$1. RACE3a:$1. RACE4a:$1. RACE5a:$1. RACE6a:$1. RACE7a:$1. RACE8a:$1. RACE9a:$1.
RACE10a:$1. RACE11a:$1. RACE12a:$1. RACE13a:$1. RACE14a:$1. RACE15a:$1. SINGLE_TOM_RACE:$17.;

datalines;
434|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
600|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
682|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
786|Y|N|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1036|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1632|Y|Y|N|N|N|N|Y|N|N|N|N|N|N|N|Y|Two or more races
1911|N|Y|N|N|N|N|N|N|N|Y|N|N|N|N|N|Two or more races
2215|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
2391|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
3087|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
3314|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
4026|Y|Y|Y|N|N|N|N|N|N|Y|N|N|N|Y|N|Two or more races
;
run;

proc print data=work.race;
run;
quit;

Thanks again!

sbxkoenk
SAS Super FREQ

You want all 105 pairwise combinations?

PROC IML;
n = 15; 
k = 2; 
c = allcomb(n, k);
call sort(c, 1:k);
print c;
r = nrow(c);
print r;
QUIT;

Koen

Epi
Calcite | Level 5 Epi
Calcite | Level 5

Thanks for answering.  I'm just looking for the 'Y'es response is in the race1a-race15a race columns in order to have the needed output below.  The SAS table has been uploaded into the original post.

idrace1race2race3race4race5
434R5UNK   
600R5R2.08   
682R3R1   
786R5R1   
1036R5R3   
1632R5R3R2.11  
1911R3R2   
2215R5R2.08   
2391R5R3   
3087R3R1UNK  
3314R5R3   
4026R5R3R1R2R4
Epi
Calcite | Level 5 Epi
Calcite | Level 5

I'm not sure.  I've never run into that issue before.  The majority of the individuals in my dataset are single race.  The multiracial individuals have the Y or yes option in two or more of the columns labeled race1a-race15a.  I've never seen more than 5 races being determined for an individual.  The database I'm preparing the data for only has options to import 5 race classifications.  For example, my import uses R4 for those of Hawaiian/ Pacific Islander decent.  I can further differentiate based on columns Race11a-Race14a.  If the person is Native Hawaiian, a Yes response in Column race11a, R4.01.001 can be used.  As I'm typing this, it might be easier not to get into specifics for race where R2 (Asian decent) and R4 are concerned in my data.

 

Sorry for rambling, it's the end of my work day.  I hope that answered the question.

Have a great weekend!

ballardw
Super User

Still not a very clear description.

 

Here is a small example of what I think you are asking for. I am assuming that you are attempting to assign a value to variables Race1 through Race 5 based on which of the 15 variables have a Y in their column.

This places the text of the desired result for each of the 15 (my example uses 3 because I'm lazy) variables in a temporary array. Temporary means the variables created do not get written to the data set. This is one way to "look up" values based on position. X are the input variables with the Y N or whatever values.

This counts how many Y are encountered and as long as there are 3 (too lazy for 5) or fewer it updates the Race1 to 3.

 

Note that if you don't want to deal with multiple subcodes such as the Race4A through Race9A you could enter the R2 instead of the R2.01 R2.06 etc. in the temporary array.

 

data example;
   input a $ b $ C $;
   array x (*) a b c;
   array t (3) $ 10 _temporary_('R1','R2','R2.3');
   array r (3) $ 10 Race1-Race3;
   Ycount=0;
   do i=1 to dim(x);
      if x[i]='Y' then do;
         ycount=ycount+1;
         IF ycount le dim(r) then r[ycount] = t[i];
      end;
   end;
   drop i ycount;
datalines;
Y N Y
N Y Y
Y Y Y
Y Y N
Y N N
N Y N
N N Y
;

Of note: you shouldn't need separate coding for the two or more as a single value gets set as well.

 

 

Epi
Calcite | Level 5 Epi
Calcite | Level 5

Thank you!  That is exactly what I am looking to do.  I'm glad you were able to figure out what i was trying to describe.   I'll try the code when I next at work. Have a great weekend!

 

Epi
Calcite | Level 5 Epi
Calcite | Level 5

I think this might be what was asked for.  Sorry for all the updates.

 

data work.race (Label= 'Multitrace Data');
infile datalines dsd dlm='|' truncover;
input id:$12. RACE1a:$1. RACE2a:$1. RACE3a:$1. RACE4a:$1. RACE5a:$1. RACE6a:$1. RACE7a:$1. RACE8a:$1. RACE9a:$1. RACE10a:$1. RACE11a:$1. RACE12a:$1. RACE13a:$1. RACE14a:$1. RACE15a:$1. SINGLE_TOM_RACE:$17.;

datalines;
434|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
600|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
682|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
786|Y|N|Y|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1036|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
1632|Y|Y|N|N|N|N|Y|N|N|N|N|N|N|N|Y|Two or more races
1911|N|Y|N|N|N|N|N|N|N|Y|N|N|N|N|N|Two or more races
2215|Y|N|N|N|N|Y|N|N|N|N|N|N|N|N|N|Two or more races
2391|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
3087|N|Y|Y|N|N|N|N|N|N|N|N|N|N|N|Y|Two or more races
3314|Y|Y|N|N|N|N|N|N|N|N|N|N|N|N|N|Two or more races
4026|Y|Y|Y|N|N|N|N|N|N|Y|N|N|N|Y|N|Two or more races
;
run;

proc print data=work.race;
run;
quit;

 

 

Tom
Super User Tom
Super User

You can use COUNTC() to find how many Y's there are.  Then use SELECT to decide what action to take.

 

First let's simplify your example. But also provide more different types of cases.  To make my life easier I remove the spurious letter A from the end of the variable names so that I an reference them with a variable list.

data RACE (Label= 'Multitrace Data');
  infile datalines dsd dlm='|' truncover;
  input id :$12. (RACE1-RACE4) (:$1.) Expect :$17.;
datalines;
434|Y|N|N|N|RACE1
600|Y|N|N|Y|Two or more races
682|N|N|N|N|None specified
786|N|Y|N|N|RACE2
;

Now let's try processing it.

data want;
  set race ;
  array races race1-race4;
  length race_single $17;
  select (countc(cats(of races[*]),'Y'));
    when(0) race_single='None specified';
    when(1) race_single=vname(races[whichc('Y',of races[*])]);
    otherwise race_single='Two or more races';
  end;
run;

Result:

Tom_0-1718392599269.png

If your variables have Labels attached to them you could use VLABEL() instead of VNAME() function and get a result more like:

Tom_1-1718392662896.png

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1520 views
  • 2 likes
  • 4 in conversation