whyamihere
Calcite | Level 5

I have been working on this code for three hours for a school assignment. Can you please help? I have a large dataset in which I need to combine ethnicity and race to create a variable called group with the following values: 1 (Black non-Hispanic), 2 (White non-Hispanic), and 3 (Hispanic).

 

The following is the "key":

"1"= "American Indian or Alaska Native"
"2"= "Asian"
"3"= "Black or African American"
"4"= "Native Hawaiian or Other Pacific Islander"
"5"= "White"
"6"= "Other"
"7"= "Unknown";
"E1"= "Hispanic or Latino"
"E2"= "Non-Hispanic or Latino"
"E7"= "Unknown";

I think I might be onto something with this code, but I think I need to make a variable named "group":

/*Create a new variable called group*/
DATA=APPENDICITIS.PROJECT3;
if ETHNICITY = "E2" AND RACE = "3" THEN GROUP = "1";
if ETHNICITY = "E2" AND RACE = "5" THEN GROUP = "2";
IF ETHNICITY = "E1" THEN GROUP = "3";
RUN;

Thanks so much for your expertise and time. 

SASKiwi
PROC Star

There's no equals sign in a DATA statement, and LIBREFs are limited to 8 characters, so I've shortened APPENDICITIS to APP. This logic is a bit better:

DATA APP.PROJECT3;
  set ???; * Need to read your input dataset;
  if ETHNICITY = "E2" AND RACE = "3" THEN GROUP = "1";
  else if ETHNICITY = "E2" AND RACE = "5" THEN GROUP = "2";
  else if ETHNICITY = "E1" THEN GROUP = "3";
RUN;
Ksharp
Super User
You could create a new variable by combining these two variables, then create a format to derive the GROUP variable, like this:

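/* Combine the two codes into one key, e.g. 'E2 3' */
data have;
  set have;
  new = catx(' ', ETHNICITY, RACE);
run;

proc format;
  value $fmt
    'E2 3' = '1'   /* Black, non-Hispanic */
    'E2 5' = '2'   /* White, non-Hispanic */
    /* Hispanic with any race code, or with race missing */
    'E1', 'E1 1', 'E1 2', 'E1 3', 'E1 4',
    'E1 5', 'E1 6', 'E1 7' = '3'
    other = ' '    /* every other combination stays missing */
  ;
run;

/* Map the combined key to GROUP through the format */
data want;
  set have;
  group = put(new, $fmt.);
run;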
Tom
Super User

You need to use a valid data step.  Can you fix that?

Your code only assigns values to the new variable for some of the 32 possible combinations of your two variables.

RACE has 7 valid values and I assume could also have missing values, so 8 possible values.

ETHNICITY has 3 valid values and again I assume could also have missing values, so 4 possible values.

8*4 = 32 possible combinations.

Your three IF statements cover E2 with RACE 3, E2 with RACE 5, and E1 with any of the 8 RACE values, which is 10 combinations.

Do you just want group to be missing for the other 22 possible combinations?
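
One way to see which of those combinations actually occur in the data is a quick cross-tabulation. This is only a sketch: HAVE is a placeholder for whatever your input dataset is really called, and it assumes the variables are named ETHNICITY and RACE as in your code.

```sas
/* HAVE is a placeholder name for the input dataset (assumption) */
proc freq data=have;
  /* MISSING counts missing values as their own level;
     LIST prints one row per combination instead of a grid */
  tables ethnicity*race / missing list;
run;
```

Any row in that listing other than the ones your IF statements cover will end up with a missing GROUP.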

whyamihere
Calcite | Level 5

Thanks for responding. What do you mean by using a valid data step? I have bitten off more than I can chew in this class but really want to finish, so please dumb it down for me. My instructions only want me to analyze these 3 values in relationship to mean age.

 

Tom
Super User

A data step starts with a DATA statement that names the dataset(s) it is creating.

It needs to have a source of data.  If that is an existing dataset then you need a SET statement.

Example:

data want;
  set have;
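
Putting that together with the grouping logic from this thread, a complete step might look roughly like the sketch below. The names WORK.PATIENTS and AGE are assumptions made for illustration (the thread never gives the real input dataset name or the age variable name), so substitute your own.

```sas
/* Sketch only: WORK.PATIENTS is a hypothetical input dataset name */
data want;
  set work.patients;
  length group $1;
  if ethnicity = "E1" then group = "3";                      /* Hispanic, any race */
  else if ethnicity = "E2" and race = "3" then group = "1";  /* Black non-Hispanic */
  else if ethnicity = "E2" and race = "5" then group = "2";  /* White non-Hispanic */
  /* every other combination leaves GROUP missing */
run;

/* Mean age per group; AGE is assumed to be a numeric variable */
proc means data=want n mean;
  class group;
  var age;
run;
```

By default PROC MEANS drops observations with a missing CLASS value, so only the three groups should appear in the output.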
