Hello,
I'm working with birth certificate data and my raw data has the variable "MonthCareBegan" which is a character variable listed as a number 1-99 (missing data is either a blank, question mark or #99). I want to make three new variables: tri1, tri2, tri3 to determine which trimester the mother started prenatal care.
data project.project1;
set project.project1;
length tri1 $3.
tri2 $3.
tri3 $3.;
If MonthCareBegan=' ' or '99' or '?' then tri1=' ' and tri2=' ' and tri3=' ';
Else if MonthCareBegan = "1" OR "2" OR "3" then tri1="Yes";
else if MonthCareBegan = "4" OR "5" OR "6" then tri2="Yes";
else tri3="Yes";
run;
But this is what I'm getting...
Any suggestions?
Did you read the log for the data step?
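If you want to assign values to multiple variables after an IF, use a DO block. As written, tri1=' ' and tri2=' ' and tri3=' ' is a single assignment to tri1 of a logical expression, not three assignments: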
If MonthCareBegan=' ' or MonthCareBegan= '99' or MonthCareBegan= '?' then do; tri1=' ' ; tri2=' ' ; tri3=' '; end;
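Second, if you want to compare a single variable to a list of values, you either repeat the variable name with each comparison (otherwise each constant after OR is evaluated as a condition on its own, not compared with MonthCareBegan):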
Else if MonthCareBegan = "1" OR MonthCareBegan = "2" OR MonthCareBegan = "3" then tri1="Yes";
Or, specifically for checking whether a variable equals any of several values, use the IN operator:
if MonthCareBegan IN ( "1" "2" "3") then tri1="Yes";
(The same applies to both of your comparisons, for tri1 and tri2.)
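Putting the DO block and the IN operator together, a corrected version of the original step might look like the sketch below. The output data set name work.project1_tri is a hypothetical choice made here so the source data is not overwritten; everything else follows the variable names in the question. It assumes MonthCareBegan holds values exactly like "1" or "99", with no leading blanks or zeros.

```sas
/* Sketch only: work.project1_tri is a hypothetical output name. */
data work.project1_tri;
  set project.project1;
  length tri1 tri2 tri3 $3;

  /* Missing codes: blank, 99, or a question mark */
  if MonthCareBegan in (' ', '99', '?') then do;
    tri1 = ' ';
    tri2 = ' ';
    tri3 = ' ';
  end;
  else if MonthCareBegan in ('1', '2', '3') then tri1 = 'Yes';
  else if MonthCareBegan in ('4', '5', '6') then tri2 = 'Yes';
  /* As in the original logic, every remaining value lands here */
  else tri3 = 'Yes';
run;
```

Note that the final ELSE keeps the original behavior of flagging any other value as third trimester, so unexpected codes would also set tri3.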
And a final hint: you are likely much better off creating a single variable that holds the trimester in which care started:
if MonthCareBegan IN ( "1" "2" "3") then tri = 1;
else if MonthCareBegan IN ( "4" "5" "6") then tri=2;
else if MonthCareBegan IN ( "7" "8" "9") then tri=3;
label tri='Trimester prenatal care starts';
Then when someone asks a question like "are there differences in when prenatal care starts between values of some other demographic category (race, age, ethnicity, geographic location)?" you use something like
Proc freq data=have;
tables tri * othervar; /* othervar stands for the other demographic variable */
run;
Otherwise with 3 variables you get to manually do lots of extra work to get meaningful percentages.
For almost any purpose involving categories it is more flexible to have a single variable holding the categories instead of creating multiple variables.
And you will find that creating custom formats may even be easier.
data have;
input MonthCareBegan $;
datalines;
' '
.
99
1
2
3
4
5
6
7
8
9
;
proc format;
value $tri
"1","2","3" = "1"
"4","5","6" = "2"
"7","8","9" = "3"
other = "."
;
run;
proc freq data=have;
tables monthcarebegan;
format monthcarebegan $tri.;
run;
Groups created by custom formats will be honored by all the SAS reporting and analysis procedures and most graphing procedures.
An advantage of formats is that you do not have to change the data. I could change the labels in the FORMAT definition to read "1st Trimester" and similar, and the summary would read nicely without any change to the data.
Plus, if someone wants to know about the 1st trimester vs. the 2nd and 3rd combined, a different custom format would work without changing the data.
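As a rough illustration of that last point, the sketch below defines a second format that groups the 1st trimester against the 2nd and 3rd combined and applies it to the same HAVE data set used above. The format name $trigrp and the label text are hypothetical choices made for this example, not anything defined earlier in the thread.

```sas
/* Sketch only: $trigrp and its labels are hypothetical. */
proc format;
  value $trigrp
    "1","2","3"             = "1st trimester"
    "4","5","6","7","8","9" = "2nd or 3rd trimester"
    other                   = "Unknown";
run;

/* Same data, different grouping: only the FORMAT statement changes */
proc freq data=have;
  tables MonthCareBegan;
  format MonthCareBegan $trigrp.;
run;
```

The raw values of MonthCareBegan stay untouched; switching back to the three-level grouping is just a matter of applying the $tri. format again.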