BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Guerraje
Quartz | Level 8

Hello there, I am trying to run code on a set of individuals 18-90. I want to group them 18-30, 30-65 and 65+. 

When running the code no changes take place. I am wondering if I could get some help with why my code is not working. 

Thank you !

 

if AGE < 30 then AGE_CAT30 = "18-30";
else if AGE = >30 then AGE_CAT30 = "30-65";

IF AGE > 65 THEN AGE_CAT65= "65+";
ELSE IF AGE = < 65+ THEN AGE_CAT65 = "UNDER65+";

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Okay, how about this:

 

proc format;
    value agef 0-29='0-29' 30-65='30-65' 65-high='>65';
run;

proc freq data = have;
    tables age*vaccinehesitant;
    format age agef.;
run;

Your method won't work because you have split the ages into TWO DIFFERENT age variables each with two levels, one named AGE_CAT30 and AGE_CAT65, when you really just need the age variable split three ways, not split into two variables with two levels.

--
Paige Miller

View solution in original post

15 REPLIES 15
Reeza
Super User
Please show the log from your code and the full code.
Guerraje
Quartz | Level 8


if AGE < 30 then AGE_CAT30 = "18-30";
else if AGE = >30 then AGE_CAT30 = "30-65";

IF AGE > 65 THEN AGE_CAT65= "65+";
ELSE IF AGE = < 65+ THEN AGE_CAT65 = "UNDER65+";

 

RUN;

PROC FREQ data=WORK.vaccine_hesitancy_data;
TABLES (AGE) * VACCINEHESITANT/ MISSING CHISQ;

PROC TTEST data=WORK.vaccine_hesitancy_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;

PaigeMiller
Diamond | Level 26

This isn't the full code, your IF statements must be in a DATA step, please show us the full DATA step.

 

Also, please show us the log for this code. (All of it, every single line, with nothing deleted)

--
Paige Miller
Guerraje
Quartz | Level 8

Attached is my full log and full code.

 

Thank you for all the help 

Reeza
Super User

Reposting my initial solution - from your previous question. 

Your issue here shows some of the issues I noted in the previous thread. You really shouldn't code in that fashion, it makes your work hard to debug and fix. 

Here's the updated code with that section corrected - note that I cannot test the code as I don't have your data so there's still the possibility of errors in here. I would highly suggest at minimum you add the GUESSINGROWS=MAX to your proc import and having the same data set name in the DATA and SET statement is very bad programming style and will cause you many issues over time. 

 

data work.vaccine_hesitancy_data;
set work.vaccine_hesitancy_data;

Fixed code:

 

/*****************
Import data
******************/
FILENAME REFFILE '/home/u57317860/Capstone data/vaccine_hesitancy_data.csv';


PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=WORK.vaccine_hesitancy_data;
GETNAMES=YES;
GUESSINGROWS=MAX;
RUN;

PROC CONTENTS DATA=WORK.vaccine_hesitancy_data;
 RUN;


DATA vax_data;

SET vaccine_hesitancy_data;

IF VACCINEUPTAKE= 0 AND VACCINEINTEND IN (2,3,4,5,9) THEN VACCINEHESITANT=1;
ELSE VACCINEHESITANT= 0;

*check if the value is UNKNOWN;
IF SEX IN ("Unknow", "") THEN DELETE;

if AGE < 30 then AGE_CAT30 = "18-30";
else if AGE >=30 then AGE_CAT30 = "30-65";

IF AGE > 65 THEN AGE_CAT65= "65+";
ELSE IF AGE <= 65 THEN AGE_CAT65 = "UNDER65+";



RUN;


PROC FREQ data=vax_data;
TABLES (VACCINEUPTAKE VACCINEINTEND SEX) * VACCINEHESITANT/ MISSING CHISQ;
RUN;

PROC TTEST data=vax_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;


PROC FREQ data=vax_data;
TABLES (SEX AGE POP EDUCATION IMPACTPHYSICAL IMPACTMENTAL IMPACTFAMILY IMPACTEMPLOYMENT) * VACCINEHESITANT/ MISSING CHISQ;
RUN;

*********ATTEMPTING TO RUN EDUCATION VARIABLE*******************;
PROC FREQ data=WORK.vax_data;
TABLES (EDUCATION) * VACCINEHESITANT/ MISSING CHISQ;
RUN;




PROC FREQ data=WORK.vax_data;
TABLES (AGE) * VACCINEHESITANT/ MISSING CHISQ;

PROC TTEST data=WORK.vax_data;
VAR AGE;
CLASS VACCINEHESITANT;
RUN;

PROC FORMAT;
VALUE POP_FMT
1= 'NHW'
2,3,4,5,6,7 = 'Other';
RUN;


PROC FREQ data=vax_data;

TABLES (POP) * VACCINEHESITANT/ MISSING CHISQ;

format POP POP_FMT.;

RUN;


PROC FREQ data=WORK.vax_data;
where not missing(livetestpos);
TABLES (livetestpos) * VACCINEHESITANT/ MISSING CHISQ;
run;
Guerraje
Quartz | Level 8
I ran the code exactly as you have it above and my FREQ Procedure table for age still list every individual row for ages 18-90 instead of grouping them into
18-30
30-65
65 and over

Thank you,
Reeza
Super User

Did you run the PROC FREQ on the new variables, not AGE?

So your new variables are AGE_CAT30 and AGE_CAT65. 

 

Code shown only references the original variable AGE, not the newly recoded values so you'll see only the original values. 

 

proc freq data=vax_data;
table age*age_cat30;
table age*age_cat65;
table age_cat30;
table age_cat65;
run;

@Guerraje wrote:
I ran the code exactly as you have it above and my FREQ Procedure table for age still list every individual row for ages 18-90 instead of grouping them into
18-30
30-65
65 and over

Thank you,

 

 

 

PaigeMiller
Diamond | Level 26
ELSE IF AGE = < 65+ THEN AGE_CAT65 = "UNDER65+";

 

This is invalid syntax in SAS. The + sign is invalid in age<=65+. Remove it and your code should work.

--
Paige Miller
Guerraje
Quartz | Level 8
I removed the + and all of my ages are still being shown as an individual row for ages 18 all the way through 90. I am still unable to categorize them 18-30, 30-65, and 65 and over

Thank you,
PaigeMiller
Diamond | Level 26

@Guerraje wrote:
I removed the + and all of my ages are still being shown as an individual row for ages 18 all the way through 90. I am still unable to categorize them 18-30, 30-65, and 65 and over

Thank you,

Yes, the code I fixed leaves everything as individual rows. What type of categorization do you need? Show us the desired output.

--
Paige Miller
Guerraje
Quartz | Level 8

I used word to make a mock table of what I am trying to do. 

 

My project is looking at vaccine hesitancy. Vaccine hesitancy is coded as 1, and nonhesitant is coded as 0. 

so my columns are labeled as:

AGE   HESTIANT (1)   NONHESITANT (0)

18-30

30-65

65+

Then my age as the rows. 

 

I am trying to make all the numeric age rows into categorical rows instead. 

 

thank you! 

 

ballardw
Super User

Do you define a length for the Age_cat65 before use? If not the code you use will assign a length of 3 characters because the first time you use the variable you assign "65+" and that is only 3 characters. So "Under65+" will only show 3 characters.

 

Strongly suggest that you consider learning to use formats. Then you likely only need one variable Age, and can create groups that are used by reporting, analysis and graphing procedures. A short example using a data set you should have available:

Proc format;
value age13_
low - 13='13 and under'
14 - high='14 and older'
;
value age3grps
11,12,13 = '11,12 and 13'
14,15 = '14 and 15'
16,17 = '16 and 17'
;

title "Age13_ format";
proc print data=sashelp.class;
  var name age;
  format age age13_.;
run;
proc freq data=sashelp.class;
   format age age13_.;
   tables sex*age;
run;
title "Age3grps format";
proc print data=sashelp.class;
  var name age;
  format age age3grps.;
run;
proc freq data=sashelp.class;
   format age age3grps.;
   tables sex*age;
run;
title;

Important: Format names cannot end in a numeric value as the number would be interpreted as the number of characters to display.

 

I have had projects where we reported on up to 8 "standard" age groups. Then often we would get a question about "what happens if we change the age boundary from 25 to 26" (or 27, or multiple differences). Instead of creating a bunch of hard to keep track of variables we just make a new format and use that for the analysis or report.

SAS also has a format type called Multilabel that would make overlapping groups as separate categories for a small number of procedures that honor them.

Guerraje
Quartz | Level 8
My age range is from 18-90
would I theoretically in the step below list all of my ages for example 18,19,20....all the way up to 90 for this to work?:
value age3grps
11,12,13 = '11,12 and 13'
14,15 = '14 and 15'
16,17 = '16 and 17'
;
PaigeMiller
Diamond | Level 26

Okay, how about this:

 

proc format;
    value agef 0-29='0-29' 30-65='30-65' 65-high='>65';
run;

proc freq data = have;
    tables age*vaccinehesitant;
    format age agef.;
run;

Your method won't work because you have split the ages into TWO DIFFERENT age variables each with two levels, one named AGE_CAT30 and AGE_CAT65, when you really just need the age variable split three ways, not split into two variables with two levels.

--
Paige Miller

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 3741 views
  • 8 likes
  • 4 in conversation