Solved: Calculating Race/Ethnicity from a matrix

mothert · Posted 12-02-2022 02:58 PM

I am relatively new to SAS and I am trying to calculate Race and ethnicity from a matric table as Follows:

Hispanic	White	Black	Asian	American Indian/Native American	Hawaiian/Pacific Islander	Unknown
N	Y	N	N	N	N
N	Y	N	N	N	N
Z	Y	N	N	N	N
N	Y	N	N	Y	N
N	Y	N	N	N	N
Z	Z	Z	Z	Z	Z
Y	N	N	N	N	N

Y = Yes, N=No, Z=Unknown

I just need to get a count of each race and add a 'Multiracial' variable, not sure how to do that since each race is its own variable and not sure how to code for multiracial. Multiracial would be Y to any two or more of the races. And if Hispanic = Y then only Hispanic regardless if any other race = Y.

Thanks,

Thomas

PaigeMiller · Posted 12-02-2022 03:31 PM

Using letters Y N Z is not the easiest way to do things here, in my opinion.

Using numeric value 1 0 and missing makes the problem a lot simpler. So, if you start where the values are 1 0 and missing, then this gets you the desired result.

data have2;
    set have;
    if sum(of hispanic--hawaiian)>1 and hispanic^=1 then multiracial=1;
    else multiracial=0;
    if n(of hispanic--hawaiian)=0 then unknown=1; 
    else unknown=0;
run;
proc summary data=have2;
    var hispanic--hawaiian multiracial unknown;
    output out=want sum=;
run;

Now, since you are new, I point out that we can only work with data in SAS data set, not the screen capture of your data that you show which doesn't even have valid variable names. So you can make your SAS coding simple if your data is in meaningful form, and you must have valid SAS variable names.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 12-02-2022 03:31 PM

Using letters Y N Z is not the easiest way to do things here, in my opinion.

Using numeric value 1 0 and missing makes the problem a lot simpler. So, if you start where the values are 1 0 and missing, then this gets you the desired result.

data have2;
    set have;
    if sum(of hispanic--hawaiian)>1 and hispanic^=1 then multiracial=1;
    else multiracial=0;
    if n(of hispanic--hawaiian)=0 then unknown=1; 
    else unknown=0;
run;
proc summary data=have2;
    var hispanic--hawaiian multiracial unknown;
    output out=want sum=;
run;

Now, since you are new, I point out that we can only work with data in SAS data set, not the screen capture of your data that you show which doesn't even have valid variable names. So you can make your SAS coding simple if your data is in meaningful form, and you must have valid SAS variable names.

--
Paige Miller

mothert · Posted 12-02-2022 04:03 PM

I agree that 1 and 0 would make this easier but this is the data that I have and this is the format required by whom we report the data to. I am trying to verify their counts with what we submitted.

Kurt_Bremser · Posted 12-02-2022 04:22 PM

For all kinds of statistics, you are WAY better off with the numeric 0/1/missing. The fact that people who have no clue about computing deliver data in bad formats, layouts and contents does NOT force you to also be an idiot and make your life unnecessarily hard.

Convert the data to numeric as soon as you get it (use a proper informat), and use formats to display them in the manner the pumpkins expect.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 12-02-2022 06:21 PM

@mothert wrote:

I agree that 1 and 0 would make this easier but this is the data that I have and this is the format required by whom we report the data to. I am trying to verify their counts with what we submitted.

Then convert the Y Z N to 1 0 missing yourself, and then the code I provided will work*.

Otherwise, please provide an example SAS data set of your Y N Z data, with actual valid SAS data set names, following these instructions or by typing in working SAS data step code yourself. Emphasis on working. Screen captures and Excel files are not acceptable. Once we have that, we will show you how to convert the Y N Z to 1 0 missing, and then the code I provided will work*.

* — assuming you adjust the code for the new variable names of the 1 0 missing variables.

--
Paige Miller

mothert · Posted 11-01-2023 03:06 PM

I forgot to post that I was able to transform the alpha characters to 1,0. I then used your code to aggregate that data. Thank you!

ballardw · Posted 12-02-2022 03:54 PM

What is the purpose of that Unknown column in the matrix? It has no values and you do not mention it at all. That matrix doesn't look like much of guideline.

Provide some example of your data. Best is as working data step code. At least provide the names of your SAS variables and some example values. Best would be include an identification variable so we can discuss things like "why when ID=23 does xxx occur?" instead of trying to count lines. Try to cover as many cases as you think you may have in the data.

For that example provide what you expect as a result. It is best to separate the "have" starting data and the "want" result so there is no confusion.

mothert · Posted 12-02-2022 04:15 PM

I need a count of the Unknown which is the value of 'Z' in the race fields. I need to add variables, Multi-Racial and Unknown. And the matrix I provided is an example of the data. The variable names were RWHT = White, RBLK = Black, RASN = Asian, RAIN = American Indian, RHIP = Hawaiian/Pacific Islander, HISP = Hispanic. Need to add Mutliracial and Unknow. Mutliracial would be if any two or more of RWHT, RBLK, RASN, RAIN, RHIP = Y. Unknown would be if any race/ethinicty = Z and all others = N.

The result I am looking for would be as follows:

Unduplicated Headcount	2022
Asian/Pacific Islanders	128
Black (non-Hispanic)	595
Hispanic	478
American Indian/Alaskan Native	34
White (non-Hispanic)	4,552
Multi Racial	132
Unknown	100

The matrix is a sample of the actual date from the file I am reading into SAS.

ballardw · Posted 12-02-2022 05:58 PM

"The matrix is a sample of the actual date from the file I am reading into SAS."

So, how are you reading that??? The column headings aren't valid variable names and awkward to work with. And still not description or values for the column Uknown.

If that is a text file I will show you how I would read that data.

proc format;
invalue ynz (upcase)
'Y'=1
'N'=0
'Z'=.
other=_error_;
run;

data read;
  informat Hispanic White Black Asian AIAN HoPI ynz.;
  input Hispanic White Black Asian AIAN HoPI;
  label AIAN='American Indian/Alaska Native'
        Hopi='Hawaiian/Pacific Islander'
  ;
  /* NOTE providing actual examples of other 
  values and MULTIRACIAL
  */
datalines;
N Y N N N N  
N Y N N N N  
Z Y N N N N  
N Y N N Y N  
N Y N N N N  
Z Z Z Z Z Z  
Y N N N N N 
N  Y  Y  N  N  N
N  N  Y  n  n  n
n  n  n  y  n  n
n  n  n  n  y  y
n  n  n  n  n  y
;

The above uses a slightly advanced technique of a custom informat. The INVALUE makes and names numeric informat. That means that it will read TEXT and result in a numeric value. The UPCASE option tells SAS to make the text upper case before comparing to the list of values, which helps if someone isn't real consistent when entering data into a form. The

'Y'=1 means "read a Y and assign 1 as the value". The other=_error_ means if a value other than Y, N or Z (after UPCASE!) is encountered then treat the value as an error. That means the log will show a note about invalid data and assign a missing value to the variable. In real work that means investigate which value is encountered and you decide what to do with it.

The data step uses an Informat statement to assign the informat to the variables and then uses Input to read them.

Note: You did not provide any values in your Unknown column so I did not include them. I did include some rows with other values as you did not provide any Asian, AIAN or HoPI (often Hawaiian or Other Pacific Islander) values so I did.

I get lots of garbage data. Read it as makes sense and create the report to standard. The boss doesn't care as long as the results match the requirements, on time, accurate and repeatable (when the NEXT stupid file comes in).

Ksharp · Posted 12-02-2022 09:47 PM


data read;
  input (Hispanic White Black Asian AIAN HoPI) ($);
  label AIAN='American Indian/Alaska Native'
        Hopi='Hawaiian/Pacific Islander'
  ;
datalines;
N Y N N N N  
N Y N N N N  
Z Y N N N N  
N Y N N Y N  
N Y N N N N  
Z Z Z Z Z Z  
Y N N N N N 
;

proc transpose data=read(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select cat('sum(',_name_,'="Y") as ',_name_) into : list separated by ','
 from temp;

create table temp2 as
select &list. from read;
quit;
proc transpose data=temp2 out=want;
run;

Calculating Race/Ethnicity from a matrix

Re: Calculating Race/Ethnicity from a matrix

Re: Calculating Race/Ethnicity from a matrix

Re: Calculating Race/Ethnicity from a matrix

Re: Calculating Race/Ethnicity from a matrix

Re: Calculating Race/Ethnicity from a matrix

Re: Calculating Race/Ethnicity from a matrix

Re: Calculating Race/Ethnicity from a matrix

Re: Calculating Race/Ethnicity from a matrix

Re: Calculating Race/Ethnicity from a matrix

Re: Calculating Race/Ethnicity from a matrix

Catch up on SAS Innovate 2026