BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mothert
Calcite | Level 5

I am relatively new to SAS and I am trying to calculate Race and ethnicity from a matric table as Follows:

 

HispanicWhite BlackAsianAmerican Indian/Native AmericanHawaiian/Pacific IslanderUnknown
NYNNNN 
NYNNNN 
ZYNNNN 
NYNNYN 
NYNNNN 
ZZZZZZ 
YNNNNN 

 

Y = Yes, N=No, Z=Unknown

I just need to get a count of each race and add a 'Multiracial' variable, not sure how to do that since each race is its own variable and not sure how to code for multiracial. Multiracial would be Y to any two or more of the races. And if Hispanic = Y then only Hispanic regardless if any other race = Y.

Thanks,

Thomas

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Using letters Y N Z is not the easiest way to do things here, in my opinion.

 

Using numeric value 1 0 and missing makes the problem a lot simpler. So, if you start where the values are 1 0 and missing, then this gets you the desired result.

 

data have2;
    set have;
    if sum(of hispanic--hawaiian)>1 and hispanic^=1 then multiracial=1;
    else multiracial=0;
    if n(of hispanic--hawaiian)=0 then unknown=1; 
    else unknown=0;
run;
proc summary data=have2;
    var hispanic--hawaiian multiracial unknown;
    output out=want sum=;
run;

 

Now, since you are new, I point out that we can only work with data in SAS data set, not the screen capture of your data that you show which doesn't even have valid variable names. So you can make your SAS coding simple if your data is in meaningful form, and you must have valid SAS variable names.

--
Paige Miller

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

Using letters Y N Z is not the easiest way to do things here, in my opinion.

 

Using numeric value 1 0 and missing makes the problem a lot simpler. So, if you start where the values are 1 0 and missing, then this gets you the desired result.

 

data have2;
    set have;
    if sum(of hispanic--hawaiian)>1 and hispanic^=1 then multiracial=1;
    else multiracial=0;
    if n(of hispanic--hawaiian)=0 then unknown=1; 
    else unknown=0;
run;
proc summary data=have2;
    var hispanic--hawaiian multiracial unknown;
    output out=want sum=;
run;

 

Now, since you are new, I point out that we can only work with data in SAS data set, not the screen capture of your data that you show which doesn't even have valid variable names. So you can make your SAS coding simple if your data is in meaningful form, and you must have valid SAS variable names.

--
Paige Miller
mothert
Calcite | Level 5

I agree that 1 and 0 would make this easier but this is the data that I have and this is the format required by whom we report the data to. I am trying to verify their counts with what we submitted.

 

Kurt_Bremser
Super User

For all kinds of statistics, you are WAY better off with the numeric 0/1/missing. The fact that people who have no clue about computing deliver data in bad formats, layouts and contents does NOT force you to also be an idiot and make your life unnecessarily hard.

Convert the data to numeric as soon as you get it (use a proper informat), and use formats to display them in the manner the pumpkins expect.

PaigeMiller
Diamond | Level 26

@mothert wrote:

I agree that 1 and 0 would make this easier but this is the data that I have and this is the format required by whom we report the data to. I am trying to verify their counts with what we submitted.

 


Then convert the Y Z N to 1 0 missing yourself, and then the code I provided will work*.

 

Otherwise, please provide an example SAS data set of your Y N Z data, with actual valid SAS data set names, following these instructions or by typing in working SAS data step code yourself. Emphasis on working. Screen captures and Excel files are not acceptable. Once we have that, we will show you how to convert the Y N Z to 1 0 missing, and then the code I provided will work*.

 

* — assuming you adjust the code for the new variable names of the 1 0 missing variables.

--
Paige Miller
mothert
Calcite | Level 5
I forgot to post that I was able to transform the alpha characters to 1,0. I then used your code to aggregate that data. Thank you!
ballardw
Super User

What is the purpose of that Unknown column in the matrix? It has no values and you do not mention it at all. That matrix doesn't look like much of guideline.

 

Provide some example of your data. Best is as working data step code. At least provide the names of your SAS variables and some example values. Best would be include an identification variable so we can discuss things like "why when ID=23 does xxx occur?" instead of trying to count lines. Try to cover as many cases as you think you may have in the data.

 

For that example provide what you expect as a result. It is best to separate the "have" starting data and the "want" result so there is no confusion.

mothert
Calcite | Level 5

I need a count of the Unknown which is the value of 'Z' in the race fields.  I need to add variables, Multi-Racial and Unknown. And the matrix I provided is an example of the data. The variable names were RWHT = White, RBLK = Black, RASN = Asian, RAIN = American Indian, RHIP = Hawaiian/Pacific Islander, HISP = Hispanic. Need to add Mutliracial and Unknow. Mutliracial would be if any two or more of RWHT, RBLK, RASN, RAIN, RHIP = Y. Unknown would be if any race/ethinicty = Z and all others = N. 

 

The result I am looking for would be as follows:

 

Unduplicated Headcount2022
Asian/Pacific Islanders128
Black (non-Hispanic)595
Hispanic478
American Indian/Alaskan Native34
White (non-Hispanic)4,552
Multi Racial 132
Unknown100

 

The matrix is a sample of the actual date from the file I am reading into SAS.

 

 
ballardw
Super User

"The matrix is a sample of the actual date from the file I am reading into SAS."

 

So, how are you reading that??? The column headings aren't valid variable names and awkward to work with. And still not description or values for the column Uknown.

If that is a text file I will show you how I would read that data.

proc format;
invalue ynz (upcase)
'Y'=1
'N'=0
'Z'=.
other=_error_;
run;

data read;
  informat Hispanic White Black Asian AIAN HoPI ynz.;
  input Hispanic White Black Asian AIAN HoPI;
  label AIAN='American Indian/Alaska Native'
        Hopi='Hawaiian/Pacific Islander'
  ;
  /* NOTE providing actual examples of other 
  values and MULTIRACIAL
  */
datalines;
N Y N N N N  
N Y N N N N  
Z Y N N N N  
N Y N N Y N  
N Y N N N N  
Z Z Z Z Z Z  
Y N N N N N 
N  Y  Y  N  N  N
N  N  Y  n  n  n
n  n  n  y  n  n
n  n  n  n  y  y
n  n  n  n  n  y
;

The above uses a slightly advanced technique of a custom informat. The INVALUE makes and names numeric informat. That means that it will read TEXT and result in a numeric value. The UPCASE option tells SAS to make the text upper case before comparing to the list of values, which helps if someone isn't real consistent when entering data into a form. The

'Y'=1 means "read a Y and assign 1 as the value". The other=_error_ means if a value other than Y, N or Z (after UPCASE!) is encountered then treat the value as an error. That means the log will show a note about invalid data and assign a missing value to the variable. In real work that means investigate which value is encountered and you decide what to do with it.

The data step uses an Informat statement to assign the informat to the variables and then uses Input to read them.

 

Note: You did not provide any values in  your Unknown column so I did not include them. I did include some rows with other values as you did not provide any Asian, AIAN or HoPI (often Hawaiian or Other Pacific Islander) values so I did.

 

I get lots of garbage data. Read it as makes sense and create the report to standard. The boss doesn't care as long as the results match the requirements, on time, accurate and repeatable (when the NEXT stupid file comes in).

 

 

Ksharp
Super User

data read;
  input (Hispanic White Black Asian AIAN HoPI) ($);
  label AIAN='American Indian/Alaska Native'
        Hopi='Hawaiian/Pacific Islander'
  ;
datalines;
N Y N N N N  
N Y N N N N  
Z Y N N N N  
N Y N N Y N  
N Y N N N N  
Z Z Z Z Z Z  
Y N N N N N 
;

proc transpose data=read(obs=0) out=temp;
var _all_;
run;
proc sql noprint;
select cat('sum(',_name_,'="Y") as ',_name_) into : list separated by ','
 from temp;

create table temp2 as
select &list. from read;
quit;
proc transpose data=temp2 out=want;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2003 views
  • 1 like
  • 5 in conversation