Hello there, I am trying to run code on a set of individuals 18-90. I want to group them 18-30, 30-65 and 65+.
When running the code no changes take place. I am wondering if I could get some help with why my code is not working.
Thank you !
if AGE < 30 then AGE_CAT30 = "18-30";
else if AGE = >30 then AGE_CAT30 = "30-65";
IF AGE > 65 THEN AGE_CAT65= "65+";
ELSE IF AGE = < 65+ THEN AGE_CAT65 = "UNDER65+";
Okay, how about this:
proc format;
value agef 0-29='0-29' 30-65='30-65' 65-high='>65';
run;
proc freq data = have;
tables age*vaccinehesitant;
format age agef.;
run;
Your method won't work because you have split the ages into TWO DIFFERENT age variables each with two levels, one named AGE_CAT30 and AGE_CAT65, when you really just need the age variable split three ways, not split into two variables with two levels.
if AGE < 30 then AGE_CAT30 = "18-30";
else if AGE = >30 then AGE_CAT30 = "30-65";
IF AGE > 65 THEN AGE_CAT65= "65+";
ELSE IF AGE = < 65+ THEN AGE_CAT65 = "UNDER65+";
RUN;
PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (AGE) * VACCINEHESITANT/ MISSING CHISQ;
PROC TTEST data=WORK.vaccine_hesitancy_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;
This isn't the full code, your IF statements must be in a DATA step, please show us the full DATA step.
Also, please show us the log for this code. (All of it, every single line, with nothing deleted)
Attached is my full log and full code.
Thank you for all the help
Reposting my initial solution - from your previous question.
Your issue here shows some of the issues I noted in the previous thread. You really shouldn't code in that fashion, it makes your work hard to debug and fix.
Here's the updated code with that section corrected - note that I cannot test the code as I don't have your data so there's still the possibility of errors in here. I would highly suggest at minimum you add the GUESSINGROWS=MAX to your proc import and having the same data set name in the DATA and SET statement is very bad programming style and will cause you many issues over time.
data work.vaccine_hesitancy_data;
set work.vaccine_hesitancy_data;
Fixed code:
/*****************
Import data
******************/
FILENAME REFFILE '/home/u57317860/Capstone data/vaccine_hesitancy_data.csv';
PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=WORK.vaccine_hesitancy_data;
GETNAMES=YES;
GUESSINGROWS=MAX;
RUN;
PROC CONTENTS DATA=WORK.vaccine_hesitancy_data;
RUN;
DATA vax_data;
SET vaccine_hesitancy_data;
IF VACCINEUPTAKE= 0 AND VACCINEINTEND IN (2,3,4,5,9) THEN VACCINEHESITANT=1;
ELSE VACCINEHESITANT= 0;
*check if the value is UNKNOWN;
IF SEX IN ("Unknow", "") THEN DELETE;
if AGE < 30 then AGE_CAT30 = "18-30";
else if AGE >=30 then AGE_CAT30 = "30-65";
IF AGE > 65 THEN AGE_CAT65= "65+";
ELSE IF AGE <= 65 THEN AGE_CAT65 = "UNDER65+";
RUN;
PROC FREQ data=vax_data;
TABLES (VACCINEUPTAKE VACCINEINTEND SEX) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
PROC TTEST data=vax_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;
PROC FREQ data=vax_data;
TABLES (SEX AGE POP EDUCATION IMPACTPHYSICAL IMPACTMENTAL IMPACTFAMILY IMPACTEMPLOYMENT) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
*********ATTEMPTING TO RUN EDUCATION VARIABLE*******************;
PROC FREQ data=WORK.vax_data;
TABLES (EDUCATION) * VACCINEHESITANT/ MISSING CHISQ;
RUN;
PROC FREQ data=WORK.vax_data;
TABLES (AGE) * VACCINEHESITANT/ MISSING CHISQ;
PROC TTEST data=WORK.vax_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;
PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
2,3,4,5,6,7 = 'Other';
RUN;
PROC FREQ data=vax_data;
TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;
format POP POP_FMT.;
RUN;
PROC FREQ data=WORK.vax_data;
where not missing(livetestpos);
TABLES (livetestpos) * VACCINEHESITANT/ MISSING CHISQ;
run;
Did you run the PROC FREQ on the new variables, not AGE?
So your new variables are AGE_CAT30 and AGE_CAT65.
Code shown only references the original variable AGE, not the newly recoded values so you'll see only the original values.
proc freq data=vax_data;
table age*age_cat30;
table age*age_cat65;
table age_cat30;
table age_cat65;
run;
@Guerraje wrote:
I ran the code exactly as you have it above and my FREQ Procedure table for age still list every individual row for ages 18-90 instead of grouping them into
18-30
30-65
65 and over
Thank you,
ELSE IF AGE = < 65+ THEN AGE_CAT65 = "UNDER65+";
This is invalid syntax in SAS. The + sign is invalid in age<=65+. Remove it and your code should work.
@Guerraje wrote:
I removed the + and all of my ages are still being shown as an individual row for ages 18 all the way through 90. I am still unable to categorize them 18-30, 30-65, and 65 and over
Thank you,
Yes, the code I fixed leaves everything as individual rows. What type of categorization do you need? Show us the desired output.
I used word to make a mock table of what I am trying to do.
My project is looking at vaccine hesitancy. Vaccine hesitancy is coded as 1, and nonhesitant is coded as 0.
so my columns are labeled as:
AGE HESTIANT (1) NONHESITANT (0)
18-30
30-65
65+
Then my age as the rows.
I am trying to make all the numeric age rows into categorical rows instead.
thank you!
Do you define a length for the Age_cat65 before use? If not the code you use will assign a length of 3 characters because the first time you use the variable you assign "65+" and that is only 3 characters. So "Under65+" will only show 3 characters.
Strongly suggest that you consider learning to use formats. Then you likely only need one variable Age, and can create groups that are used by reporting, analysis and graphing procedures. A short example using a data set you should have available:
Proc format; value age13_ low - 13='13 and under' 14 - high='14 and older' ; value age3grps 11,12,13 = '11,12 and 13' 14,15 = '14 and 15' 16,17 = '16 and 17' ; title "Age13_ format"; proc print data=sashelp.class; var name age; format age age13_.; run; proc freq data=sashelp.class; format age age13_.; tables sex*age; run; title "Age3grps format"; proc print data=sashelp.class; var name age; format age age3grps.; run; proc freq data=sashelp.class; format age age3grps.; tables sex*age; run; title;
Important: Format names cannot end in a numeric value as the number would be interpreted as the number of characters to display.
I have had projects where we reported on up to 8 "standard" age groups. Then often we would get a question about "what happens if we change the age boundary from 25 to 26" (or 27, or multiple differences). Instead of creating a bunch of hard to keep track of variables we just make a new format and use that for the analysis or report.
SAS also has a format type called Multilabel that would make overlapping groups as separate categories for a small number of procedures that honor them.
Okay, how about this:
proc format;
value agef 0-29='0-29' 30-65='30-65' 65-high='>65';
run;
proc freq data = have;
tables age*vaccinehesitant;
format age agef.;
run;
Your method won't work because you have split the ages into TWO DIFFERENT age variables each with two levels, one named AGE_CAT30 and AGE_CAT65, when you really just need the age variable split three ways, not split into two variables with two levels.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.