Hi,
I have a dataset in which there is a variable (diagnosis_dt) for which there are values in the DDMONYYYY format. I want to create a categorical variable out of this data, and I want to make 3 categories: <= 1 year, 2-7 years, and >= 8 years.
My original set is EA1_1.
Here is what I have tried. I admittedly have no idea what I am doing. When I input this, all of my outputs end up being 2-7 years.
data EA1_2;
set EA1_1;
length TimeSinceDiagnosis $3.;
IF diagnosis_dt > = '31DEC2012'd THEN TimeSinceDiagnosis ='<1yr';
ELSE IF diagnosis_dt < = '31DEC2012'd or diagnosis_dt > = '31DEC2005'd THEN TimeSinceDiagnosis = '2-7yrs';
ELSE IF diagnosis_dt < = '01JAN2005'd THEN TimeSinceDiagnosis = '> 8yrs';
run;
1) Give TimeSinceDiagnosis a greater length. With $3, a value such as '2-7yrs' is truncated to its first three characters.
2) All the diagnosis dates I can see are before 31DEC2012. Because you join the two comparisons with OR rather than AND, the first ELSE IF condition is true for every one of those dates, so the step never reaches the '> 8yrs' branch. Try the code below.
data EA1_2;
set EA1_1;
length TimeSinceDiagnosis $20;
IF diagnosis_dt > = '31DEC2012'd THEN
TimeSinceDiagnosis ='<1yr';
ELSE IF diagnosis_dt < = '31DEC2012'd and diagnosis_dt > = '31DEC2005'd THEN
TimeSinceDiagnosis = '2-7yrs';
ELSE IF diagnosis_dt < '31DEC2005'd THEN /* was < = '01JAN2005'd, which left 02JAN2005 through 30DEC2005 unassigned */
TimeSinceDiagnosis = '> 8yrs';
run;
Can you show us what your data looks like?
Here is a screenshot of what I am looking at!
Anytime 🙂
Your mistake is here:
ELSE IF diagnosis_dt < = '31DEC2012'd or diagnosis_dt > = '31DEC2005'd
It should be:
ELSE IF diagnosis_dt < = '31DEC2012'd and diagnosis_dt > = '31DEC2005'd
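If it helps to see the difference, here is a minimal sketch that evaluates both conditions side by side. The dataset name demo and the three dates are made up for illustration and are not taken from EA1_1. Every date is either on or before 31DEC2012 or on or after 31DEC2005, so or_test comes out as 1 on all three rows, while and_test is 1 only for the 2009 date.

```sas
/* Hypothetical sample dates, not taken from EA1_1 */
data demo;
   input diagnosis_dt :date9.;
   format diagnosis_dt date9.;
   /* A comparison in SAS evaluates to 1 (true) or 0 (false) */
   or_test  = (diagnosis_dt <= '31DEC2012'd or  diagnosis_dt >= '31DEC2005'd);
   and_test = (diagnosis_dt <= '31DEC2012'd and diagnosis_dt >= '31DEC2005'd);
   datalines;
15MAR2003
20JUN2009
10FEB2013
;
run;

/* List the dates next to the two test results */
proc print data=demo;
run;
```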
You can streamline your code to make it easier to use in the future:
%let now='01jul2013'd; /* just one date in 2013 */
/* this is all you have to set in the future */
/* the following calculates the cutoff dates from the initial value */
%let cut1 = %sysfunc(intnx(year,&now.,-1,e));
%let cut7 = %sysfunc(intnx(year,&now.,-8,e));
/* the following is for informational purposes only, you can remove it later */
data _null_;
cut1 = &cut1;
cut7 = &cut7;
put cut1= yymmdd10.;
put cut7= yymmdd10.;
run;
/* this is your streamlined code */
data EA1_2;
set EA1_1;
length TimeSinceDiagnosis $7.;
if diagnosis_dt > &cut1. then TimeSinceDiagnosis = '<1yr';
else if diagnosis_dt > &cut7. then TimeSinceDiagnosis = '2-7yrs';
else TimeSinceDiagnosis = '>=8yrs';
run;
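/* note that each ELSE IF is evaluated only when the conditions before it were false, */
/* so the upper-bound checks from the original code are redundant and can be dropped */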
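To confirm that the categories came out as intended, one option is to summarize EA1_2 by category. This is only a sketch: it assumes EA1_2 was created by the step above, and the column names n, earliest and latest are made up for the report. Keep in mind that SAS treats a missing numeric value as smaller than any number, so observations with a missing diagnosis_dt fall into the final ELSE branch ('>=8yrs') and would show a missing earliest and latest date if the category held nothing else.

```sas
/* Count, earliest and latest diagnosis date per category */
proc sql;
   select TimeSinceDiagnosis,
          count(*)          as n,
          min(diagnosis_dt) as earliest format=date9.,
          max(diagnosis_dt) as latest   format=date9.
   from EA1_2
   group by TimeSinceDiagnosis;
quit;
```

If the earliest and latest dates of each category sit on the expected side of the two cutoffs, the boundaries are working.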