Guerraje
Quartz | Level 8

I am working on creating groups and giving them new labels.

 

I am trying to create two groups NHW (NonHispanicWhite) and OTHER.

Right now the data allows 7 different answers for race/ethnicity. I am trying to collapse these into two groups and label them accordingly: everyone who answered 1 should be NHW, and everyone who answered 2, 3, 4, 5, 6, or 7 should be OTHER. NHW should equal 1 and OTHER should equal 0, each with the correct label. My attempt is below. POP is how race/ethnicity is coded in my data dictionary.

Lastly, I am trying to see whether race/ethnicity is associated with hesitancy, which is what my last piece of code does.

I am not sure if I should be using an IF/ELSE statement or PROC FORMAT.

Thank you!  

 

DATA WORK.vaccine_hesitancy_data;

SET WORK.vaccine_hesitancy_data;

PROC FORMAT;
VALUE POP 1= 'NHW'
2,3,4,5,6,7,= 'Other';


PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;
RUN;

Guerraje
Quartz | Level 8

I am working on a project where I am trying to find levels of hesitancy in different demographics. 

 

Right now I am trying to collapse my race/ethnicity variable into two groups instead of seven, since some groups have too few respondents to work with.

 

My race/ethnicity is coded as 'pop' in the data dictionary. 

I am trying to make everyone who answered 1 = NHW (non-Hispanic White).

I am trying to make everyone who answered anything other than that (2,3,4,5,6,7) = 0.

I am trying to make the label for 1 be NHW.

I am trying to make the label for 0 be Other.

 

Lastly, I am trying to run a table to see their hesitancy, which I have already coded. Below is my current code attempting this. Thank you for any insight! I am not sure if I should be using an IF/ELSE statement or possibly PROC FORMAT.

*******************************************

 

DATA WORK.vaccine_hesitancy_data;

SET WORK.vaccine_hesitancy_data;

PROC FORMAT;
VALUE POP 1= 'NHW'
2,3,4,5,6,7,= 'Other';


PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;
RUN;

ballardw
Super User

Essentially identical question combined.

data_null__
Jade | Level 19

You need to associate the FORMAT with a variable.  

 

format pop pop.;

Assuming the variable is also POP but I suspect that ETHNIC or RACE is more likely.

Reeza
Super User


PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
2,3,4,5,6,7,= 'Other';
RUN;


PROC FREQ data=vaccine_hesitancy_data;

TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;

format POP POP_FMT.;

RUN;

You didn't apply the format. 

And just a note that your first two lines (DATA/SET) do nothing. 
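
On the IF/ELSE versus PROC FORMAT question: a format only changes how POP is displayed and grouped, while an IF/ELSE in a DATA step stores a new 0/1 variable. Here is a minimal sketch of the IF/ELSE route, assuming POP is numeric with codes 1 to 7. The names pop_demo, pop_recode, NHW and NHW_FMT are made up for illustration, and the sample rows are invented.

```sas
/* Invented sample rows; in practice SET your own data set instead. */
data pop_demo;
    input POP VACCINEHESITANT;
    datalines;
1 0
2 1
3 0
7 1
. 0
1 1
;
run;

/* Write to a NEW data set so the input is left untouched. */
data pop_recode;
    set pop_demo;
    if POP = 1 then NHW = 1;                      /* non-Hispanic White */
    else if POP in (2,3,4,5,6,7) then NHW = 0;    /* everyone else */
    /* a missing POP matches neither branch, so NHW stays missing */
run;

/* Labels for the new 0/1 variable. */
proc format;
    value NHW_FMT 1 = 'NHW'
                  0 = 'Other';
run;

proc freq data=pop_recode;
    tables NHW * VACCINEHESITANT / missing chisq;
    format NHW NHW_FMT.;
run;
```

Either route should give the same two-row table. The stored variable is mainly useful if you need the 0/1 coding later, for example in a model.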

 




 

Guerraje
Quartz | Level 8
When I run that code, my table is still split into all seven possible race/ethnicity responses instead of two groups.
Reeza
Super User
Please show your full code and log.
Guerraje
Quartz | Level 8
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69 PROC FORMAT;
70 VALUE POP_FMT
71 1= 'NHW'
72 2,3,4,5,6,7,= 'Other';
_
22
76
ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,
a missing value, LOW, OTHER.

ERROR 76-322: Syntax error, statement will be ignored.

NOTE: The previous statement has been deleted.
73 RUN;

NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 269.43k
OS Memory 34992.00k
Timestamp 10/05/2021 07:28:55 PM
Step Count 128 Switch Count 0
Page Faults 0
Page Reclaims 14
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 16

NOTE: The SAS System stopped processing this step because of errors.
74
75


76 PROC FREQ data=vaccine_hesitancy_data;
77
78 TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;
79
80 format POP POP_FMT.;
81
82 RUN;

NOTE: There were 5993 observations read from the data set WORK.VACCINE_HESITANCY_DATA.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.06 seconds
user cpu time 0.06 seconds
system cpu time 0.01 seconds
memory 3488.06k
OS Memory 36540.00k
Timestamp 10/05/2021 07:28:55 PM
Step Count 129 Switch Count 4
Page Faults 0
Page Reclaims 376
Page Swaps 0
Voluntary Context Switches 21
Involuntary Context Switches 0
Block Input Operations 816
Block Output Operations 536


83
84 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
94
User: u57317860
ballardw
Super User

Your PROC Format has a value list ending in a comma, hence it says something is missing.

Instead of

PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
2,3,4,5,6,7,= 'Other';
RUN;

The format value list should be:

PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
2,3,4,5,6,7= 'Other';
RUN;


 

If your intent was to include missing values, you have to include the missing value indicator (.) in the list.

Or, better, if you do want missing values in the Other category, use:

PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
other = 'Other';
RUN;

The "other" in a value list says to apply this label to all values not explicitly included in another range.

 

 

Since your syntax was incorrect, the format was not created, so there was no format available to apply to your PROC FREQ (or any other procedure) output.
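
To see how the "other" keyword treats missing values, here is a small sketch on invented data. The data set name fmt_demo and the format names POP_OTHER and POP_MISS are placeholders, not anything from your program. It uses the PUT function to show the label each format assigns to each value.

```sas
proc format;
    /* OTHER catches every value not listed, including missing */
    value POP_OTHER 1     = 'NHW'
                    other = 'Other';
    /* listing . separately keeps missing out of 'Other' */
    value POP_MISS  1     = 'NHW'
                    .     = 'Missing'
                    other = 'Other';
run;

/* Invented values: codes 1, 2, 3, 7, one missing, and another 1. */
data fmt_demo;
    input POP @@;
    label_other = put(POP, POP_OTHER.);   /* label under the first format */
    label_miss  = put(POP, POP_MISS.);    /* label under the second format */
    datalines;
1 2 3 7 . 1
;
run;

proc print data=fmt_demo;
run;
```

If missing responses should not be counted as Other, the second form is probably the safer choice.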

Guerraje
Quartz | Level 8

As asked, this is my entire code; it contains everything I am working on.

I am stuck at a few places in this code. In this thread I am focusing on race/ethnicity and trying to combine it into the two groups NHW and Other, which is near the bottom.

Thank you 

 

/* Generated Code (IMPORT) */
/* Source File: vaccine_hesitancy_data.csv */
/* Source Path: /home/u57317860/Capstone data */
/* Code generated on: 9/24/21, 4:45 AM */

%web_drop_table(WORK.vaccine_hesitancy_data);


FILENAME REFFILE '/home/u57317860/Capstone data/vaccine_hesitancy_data.csv';

PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=WORK.vaccine_hesitancy_data;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=WORK.vaccine_hesitancy_data; RUN;


%web_open_table(WORK.vaccine_hesitancy_data);

DATA WORK.vaccine_hesitancy_data;

SET WORK.vaccine_hesitancy_data;

IF VACCINEUPTAKE= 0 AND VACCINEINTEND IN (2,3,4,5,9) THEN VACCINEHESITANT=1;
ELSE VACCINEHESITANT= 0;
IF SEX IN ("Unknow", "") THEN DELETE;

RUN;
PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (VACCINEUPTAKE VACCINEINTEND SEX) * VACCINEHESITANT/ MISSING CHISQ;
RUN;

PROC TTEST data=WORK.vaccine_hesitancy_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;


PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (SEX AGE POP EDUCATION IMPACTPHYSICAL IMPACTMENTAL IMPACTFAMILY IMPACTEMPLOYMENT) * VACCINEHESITANT/ MISSING CHISQ;
RUN;

*********ATTEMPTING TO RUN EDUCATION VARIABLE*******************;
PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (EDUCATION) * VACCINEHESITANT/ MISSING CHISQ;
RUN;


*********ATTEMPTING TO CREATE AGE VARIABLES AND RUN T TEST *****;
DATA WORK.vaccine_hesitancy_data;

SET WORK.vaccine_hesitancy_data;
if AGE < 30 then AGE_NEW = "18-30";
else if AGE = >30 then AGE_NEW = "30-65";

DATA WORK.vaccine_hesitancy_data;

SET WORK.vaccine_hesitancy_data;

IF AGE > 65 THEN AGE NEW= "65+";
ELSE IF AGE = < 65+ THEN AGE NEW = "UNDER65+";

PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (AGE) * VACCINEHESITANT/ MISSING CHISQ;

PROC TTEST data=WORK.vaccine_hesitancy_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;

************* Trying to turn race/ethnicity into two variables nhw and other****;
PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
2,3,4,5,6,7,= 'Other';
RUN;


PROC FREQ data=vaccine_hesitancy_data;

TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;

format POP POP_FMT.;

RUN;

****************** HAS ANYONE THAT LIVES WITH YOU TESTED POSITIVE this code deletes the unknown delete the if then if we decide not to do this *****;
DATA WORK.vaccine_hesitancy_data;

SET WORK.vaccine_hesitancy_data;
IF livetestpos = '.' then delete;
RUN;

PROC FREQ data=WORK.vaccine_hesitancy_data;

TABLES (livetestpos) * VACCINEHESITANT/ MISSING CHISQ;

run;

Reeza
Super User

DO NOT code like this:

DATA WORK.vaccine_hesitancy_data;

SET WORK.vaccine_hesitancy_data;

This means each time you're replacing your original data set. If you make a mistake you've destroyed your original data. 

Also, organize your code so that you have it as follows:

 

  1. Data import
  2. Data wrangling -> includes cleaning, recategorizing, labelling etc
  3. Supplemental processes - example formats or adding other data for lookups
  4. Data analysis/reporting  all together. If in the process of analysis, you realize you need a new variable go back to Step 2 and fix it there. 

FYI - given your code, I suspect your data wasn't imported correctly and got truncated somewhere. I would recommend either not using PROC IMPORT or verifying the imported data against the source. In the example below I've used GUESSINGROWS=MAX so that PROC IMPORT scans every row before choosing variable types and lengths, but it will slow down the read.
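
The "Unknow" value in the SEX filter is presumably the kind of symptom behind that suspicion: a character variable that is too short silently drops trailing characters. A self-contained sketch of the effect follows. The data set and variable names are invented, and this only illustrates the symptom, not what actually happened in your import.

```sas
data trunc_demo;
    length sex_short $ 6 sex_full $ 7;
    sex_short = "Unknown";   /* only 6 characters fit, so "Unknow" is stored */
    sex_full  = "Unknown";   /* long enough, stored intact */
run;

proc print data=trunc_demo;
run;
```

Comparing the lengths reported by PROC CONTENTS with the longest values in the CSV is one way to check whether this happened.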

 

Your ultimate issue is a bug in your format code - you have an extra comma in there - see the ERROR in the log?

 

Here's fully revised code that should be cleaner for you, but it is obviously untested since I don't have your data.

 


/*****************
Import data
******************/
FILENAME REFFILE '/home/u57317860/Capstone data/vaccine_hesitancy_data.csv';


PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=WORK.vaccine_hesitancy_data;
GETNAMES=YES;
GUESSINGROWS=MAX;
RUN;

PROC CONTENTS DATA=WORK.vaccine_hesitancy_data;
 RUN;


DATA vax_data;

SET vaccine_hesitancy_data;

IF VACCINEUPTAKE= 0 AND VACCINEINTEND IN (2,3,4,5,9) THEN VACCINEHESITANT=1;
ELSE VACCINEHESITANT= 0;

*check if the value is UNKNOWN;
IF SEX IN ("Unknow", "") THEN DELETE;

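* declare lengths up front so the longer labels are not truncated;
length AGE_CAT30 $ 5 AGE_CAT65 $ 8;

* note that a missing AGE compares as less than any number;
if AGE < 30 then AGE_CAT30 = "18-30";
else if AGE >= 30 then AGE_CAT30 = "30-65";

IF AGE > 65 THEN AGE_CAT65 = "65+";
ELSE IF AGE <= 65 THEN AGE_CAT65 = "UNDER65+";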



RUN;


PROC FREQ data=vax_data;
TABLES (VACCINEUPTAKE VACCINEINTEND SEX) * VACCINEHESITANT/ MISSING CHISQ;
RUN;

PROC TTEST data=vax_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;


PROC FREQ data=vax_data;
TABLES (SEX AGE POP EDUCATION IMPACTPHYSICAL IMPACTMENTAL IMPACTFAMILY IMPACTEMPLOYMENT) * VACCINEHESITANT/ MISSING CHISQ;
RUN;

*********ATTEMPTING TO RUN EDUCATION VARIABLE*******************;
PROC FREQ data=WORK.vax_data;
TABLES (EDUCATION) * VACCINEHESITANT/ MISSING CHISQ;
RUN;




PROC FREQ data=WORK.vax_data;
TABLES (AGE) * VACCINEHESITANT/ MISSING CHISQ;
RUN;

PROC TTEST data=WORK.vax_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;

PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
2,3,4,5,6,7 = 'Other';
RUN;


PROC FREQ data=vax_data;

TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;

format POP POP_FMT.;

RUN;


PROC FREQ data=WORK.vax_data;
where not missing(livetestpos);
TABLES (livetestpos) * VACCINEHESITANT/ MISSING CHISQ;
run;

 
