I am working on creating groups and giving them new labels.
I am trying to create two groups, NHW (non-Hispanic White) and OTHER.
Right now the data allows for 7 different answers for race/ethnicity. I am trying to collapse these into two groups and label them accordingly: everyone who answered 1 should be NHW, and everyone who answered 2, 3, 4, 5, 6 or 7 should be OTHER. NHW should equal 1 and Other should equal 0, each with the correct label. My attempt is below. POP is how race/ethnicity is coded in my data dictionary.
Lastly, I am trying to see whether race/ethnicity is associated with hesitancy, which is what the last piece of code does.
I am not sure if I should be using an IF/ELSE statement or PROC FORMAT.
Thank you!
DATA WORK.vaccine_hesitancy_data;
SET WORK.vaccine_hesitancy_data;
PROC FORMAT;
VALUE POP 1= 'NHW'
2,3,4,5,6,7,= 'Other';
PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
I am working on a project where I am trying to find levels of hesitancy in different demographics.
Right now I am trying to collapse my race/ethnicity variable into two groups instead of seven, since some groups have too few respondents to work with.
My race/ethnicity variable is coded as 'pop' in the data dictionary.
I am trying to make everyone who answered 1 = NHW (non-Hispanic White).
I am trying to make everyone who answered anything other than that (2,3,4,5,6,7) = 0.
I am trying to make the label for 1 be NHW.
I am trying to make the label for 0 be Other.
Lastly, I am trying to run a table to see their hesitancy, which I have already coded for. Below is my current code attempting this. Thank you for any insight! I am not sure if I should be using an IF/ELSE statement or possibly PROC FORMAT.
*******************************************
DATA WORK.vaccine_hesitancy_data;
SET WORK.vaccine_hesitancy_data;
PROC FORMAT;
VALUE POP 1= 'NHW'
2,3,4,5,6,7,= 'Other';
PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
Two essentially identical questions have been combined into this thread.
You need to associate the FORMAT with a variable.
format pop pop.;
This assumes the variable is also named POP, though I suspect a name like ETHNIC or RACE is more likely.
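PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
2,3,4,5,6,7= 'Other'; /* no comma after the last value in the list */
RUN;
PROC FREQ data=vaccine_hesitancy_data;
TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;
format POP POP_FMT.; /* apply the format so POP is grouped as NHW/Other */
RUN;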
You didn't apply the format.
And just a note that your first two lines (DATA/SET) do nothing.
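On the IF/ELSE versus PROC FORMAT question: a format only changes how POP is displayed and grouped, while an IF/ELSE creates a real 0/1 variable, which is what the original post describes (NHW = 1, Other = 0). The following is a minimal sketch of that route, not tested against the real data. The names demo, demo2, NHW and NHW_FMT are hypothetical, the sample rows are made up, and only POP and VACCINEHESITANT come from the post.

```sas
/* Hypothetical sample data standing in for the real survey file */
data demo;
    input POP VACCINEHESITANT;
    datalines;
1 0
1 1
2 1
3 0
7 1
. 0
;
run;

/* Labels for the new 0/1 variable */
proc format;
    value NHW_FMT
        1 = 'NHW'
        0 = 'Other';
run;

/* Write to a new data set so the input is left untouched */
data demo2;
    set demo;
    if POP = 1 then NHW = 1;                    /* non-Hispanic White */
    else if POP in (2,3,4,5,6,7) then NHW = 0;  /* every other answered category */
    /* a missing POP matches neither branch, so NHW stays missing */
    format NHW NHW_FMT.;
run;

proc freq data=demo2;
    tables NHW * VACCINEHESITANT / missing chisq;
run;
```

Either approach should give the same two-group table; the new variable is mainly useful if the 0/1 coding is needed later, for example as a model predictor.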
In reply to @Guerraje:
Your PROC FORMAT has a value list ending in a comma, so the log says something is missing.
Instead of
PROC FORMAT; VALUE POP_FMT 1= 'NHW' 2,3,4,5,6,7,= 'Other'; RUN;
the format value list should be:
PROC FORMAT; VALUE POP_FMT 1= 'NHW' 2,3,4,5,6,7= 'Other'; RUN;
If your intent was to include missing values, you have to include the missing value indicator . in the list.
Or better, if you did want missing values in the Other category, use:
PROC FORMAT; VALUE POP_FMT 1= 'NHW' other = 'Other'; RUN;
The "other" keyword in a value list says to apply this label to all values not explicitly included in another range.
Since your syntax was incorrect, the format was not created, so there was no format available to apply to your PROC FREQ (or other procedure) output.
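If missing answers should instead stay out of the Other group, one option is to give the missing value its own label ahead of the "other" keyword. The sketch below assumes that is the intent; the 'Missing' label and the variable name shown are illustrative choices. It also lists the stored format definition, which is a quick way to confirm the format was actually created.

```sas
proc format;
    value POP_FMT
        .     = 'Missing'   /* keep missing answers out of the Other group */
        1     = 'NHW'
        other = 'Other';    /* everything not listed above */
run;

/* Print the stored definition to confirm the format exists in WORK */
proc format library=work fmtlib;
    select POP_FMT;
run;

/* Write the label assigned to a few test values to the log */
data _null_;
    do POP = ., 1, 2, 7;
        shown = put(POP, POP_FMT.);
        put POP= shown=;
    end;
run;
```

If the FMTLIB listing is empty or the log shows an ERROR under PROC FORMAT, the format was not created and a later FORMAT statement will fail.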
As asked, this is my entire code; it contains everything I am working on.
I am stuck at a few places in this code. In this thread I am focusing on race/ethnicity and trying to combine it into the two groups NHW and Other, which is at the bottom.
Thank you
/* Generated Code (IMPORT) */
/* Source File: vaccine_hesitancy_data.csv */
/* Source Path: /home/u57317860/Capstone data */
/* Code generated on: 9/24/21, 4:45 AM */
%web_drop_table(WORK.vaccine_hesitancy_data);
FILENAME REFFILE '/home/u57317860/Capstone data/vaccine_hesitancy_data.csv';
PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=WORK.vaccine_hesitancy_data;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.vaccine_hesitancy_data; RUN;
%web_open_table(WORK.vaccine_hesitancy_data);
DATA WORK.vaccine_hesitancy_data;
SET WORK.vaccine_hesitancy_data;
IF VACCINEUPTAKE= 0 AND VACCINEINTEND IN (2,3,4,5,9) THEN VACCINEHESITANT=1;
ELSE VACCINEHESITANT= 0;
IF SEX IN ("Unknow", "") THEN DELETE;
RUN;
PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (VACCINEUPTAKE VACCINEINTEND SEX) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
PROC TTEST data=WORK.vaccine_hesitancy_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;
PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (SEX AGE POP EDUCATION IMPACTPHYSICAL IMPACTMENTAL IMPACTFAMILY IMPACTEMPLOYMENT) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
*********ATTEMPTING TO RUN EDUCATION VARIABLE*******************;
PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (EDUCATION) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
*********ATTEMPTING TO CREATE AGE VARIABLES AND RUN T TEST *****;
DATA WORK.vaccine_hesitancy_data;
SET WORK.vaccine_hesitancy_data;
if AGE < 30 then AGE_NEW = "18-30";
else if AGE = >30 then AGE_NEW = "30-65";
DATA WORK.vaccine_hesitancy_data;
SET WORK.vaccine_hesitancy_data;
IF AGE > 65 THEN AGE NEW= "65+";
ELSE IF AGE = < 65+ THEN AGE NEW = "UNDER65+";
PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (AGE) * VACCINEHESITANT/ MISSING CHISQ;
PROC TTEST data=WORK.vaccine_hesitancy_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;
************* Trying to turn race/ethnicity into two variables nhw and other****;
PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
2,3,4,5,6,7,= 'Other';
RUN;
PROC FREQ data=vaccine_hesitancy_data;
TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;
format POP POP_FMT.;
RUN;
****************** HAS ANYONE THAT LIVES WITH YOU TESTED POSITIVE this code deletes the unknown delete the if then if we decide not to do this *****;
DATA WORK.vaccine_hesitancy_data;
SET WORK.vaccine_hesitancy_data;
IF livetestpos = '.' then delete;
RUN;
PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (livetestpos) * VACCINEHESITANT/ MISSING CHISQ;
run;
DO NOT code like this:
DATA WORK.vaccine_hesitancy_data;
SET WORK.vaccine_hesitancy_data;
This means each time you're replacing your original data set. If you make a mistake you've destroyed your original data.
Also, organize your code so that you have it as follows:
FYI - given your code, I suspect your data wasn't imported correctly and got truncated somewhere. I would recommend either not using PROC IMPORT or verifying the imported data against the source. In the example below I've used GUESSINGROWS=MAX so that SAS scans every row before choosing variable types and lengths, but it will slow down the read.
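One way to check the import, sketched here under the assumption that the data set is still WORK.vaccine_hesitancy_data and that SEX and POP are the variables of interest: list the type and length PROC IMPORT assigned to each variable, then tabulate the raw values so that cut-off text such as "Unknow" stands out. This depends on the imported data set from the thread, so it is untested.

```sas
/* Type and length of every variable as PROC IMPORT guessed them */
proc sql;
    select name, type, length
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'VACCINE_HESITANCY_DATA';  /* member names are stored in upper case */
quit;

/* Truncated or unexpected values appear as their own levels */
proc freq data=WORK.vaccine_hesitancy_data;
    tables SEX POP / missing;
run;
```

A character variable whose length is shorter than its longest value in the CSV is a sign that the value was cut off during import.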
Your ultimate issue is a bug in your format code - you have an extra comma in there - see the ERROR in the log?
Here's fully revised code that should be cleaner for you. It is untested, since I don't have your data.
/*****************
Import data
******************/
FILENAME REFFILE '/home/u57317860/Capstone data/vaccine_hesitancy_data.csv';
PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=WORK.vaccine_hesitancy_data;
GETNAMES=YES;
GUESSINGROWS=MAX;
RUN;
PROC CONTENTS DATA=WORK.vaccine_hesitancy_data;
RUN;
DATA vax_data;
SET vaccine_hesitancy_data;
*set lengths first so the longer labels are not truncated;
LENGTH AGE_CAT30 $5 AGE_CAT65 $8;
IF VACCINEUPTAKE= 0 AND VACCINEINTEND IN (2,3,4,5,9) THEN VACCINEHESITANT=1;
ELSE VACCINEHESITANT= 0;
*check if the value is UNKNOWN;
IF SEX IN ("Unknow", "") THEN DELETE;
*note that a missing AGE compares as less than any number;
if AGE < 30 then AGE_CAT30 = "18-30";
else if AGE >= 30 then AGE_CAT30 = "30-65";
IF AGE > 65 THEN AGE_CAT65= "65+";
ELSE IF AGE <= 65 THEN AGE_CAT65 = "UNDER65+";
RUN;
PROC FREQ data=vax_data;
TABLES (VACCINEUPTAKE VACCINEINTEND SEX) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
PROC TTEST data=vax_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;
PROC FREQ data=vax_data;
TABLES (SEX AGE POP EDUCATION IMPACTPHYSICAL IMPACTMENTAL IMPACTFAMILY IMPACTEMPLOYMENT) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
*********ATTEMPTING TO RUN EDUCATION VARIABLE*******************;
PROC FREQ data=WORK.vax_data;
TABLES (EDUCATION) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
PROC FREQ data=WORK.vax_data;
TABLES (AGE) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
PROC TTEST data=WORK.vax_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;
PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
2,3,4,5,6,7 = 'Other';
RUN;
PROC FREQ data=vax_data;
TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;
format POP POP_FMT.;
RUN;
PROC FREQ data=WORK.vax_data;
where not missing(livetestpos);
TABLES (livetestpos) * VACCINEHESITANT/ MISSING CHISQ;
run;