BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
araren
Calcite | Level 5

Hi,

 

I have a dataset in which there is a variable (diagnosis_dt) for which there are values in the DDMONYYYY format. I want to create a categorical variable out of this data, and I want to make 3 categories: <= 1 year, 2-7 years, and >= 8 years. 

 

My original set is EA1_1.

 

Here is what I have tried. I admittedly have no idea what I am doing. When I input this, all of my outputs end up being 2-7 years. 

 

data EA1_2;

        set EA1_1;

  length TimeSinceDiagnosis $3.;

        IF diagnosis_dt > = '31DEC2012'd  THEN TimeSinceDiagnosis ='<1yr';

        ELSE IF diagnosis_dt < =  '31DEC2012'd or diagnosis_dt >  = '31DEC2005'd THEN TimeSinceDiagnosis = '2-7yrs';

ELSE IF diagnosis_dt < =  '01JAN2005'd THEN TimeSinceDiagnosis = '> 8yrs';

    run;

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

1) Specify a greater length to TimeSinceDiagnosis.

 

2) All the diagnosis dates I can see are before 31DEC2012. So the first else-if statement is always fulfilled, since you specify an OR condition and not an AND condition. That way, you never get to the >8Years part. Try the code below.

 

data EA1_2;
    set EA1_1;
    length TimeSinceDiagnosis $20;

    IF diagnosis_dt > = '31DEC2012'd THEN
        TimeSinceDiagnosis ='<1yr';
    ELSE IF diagnosis_dt < =  '31DEC2012'd and diagnosis_dt >  = '31DEC2005'd THEN
        TimeSinceDiagnosis = '2-7yrs';
    ELSE IF diagnosis_dt < =  '01JAN2005'd THEN
        TimeSinceDiagnosis = '> 8yrs';
run;

 

View solution in original post

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

Can you show us what your data looks like?

araren
Calcite | Level 5

Here is a screenshot of what I am looking at!Screen Shot 2020-11-16 at 2.48.24 AM.png

PeterClemmensen
Tourmaline | Level 20

1) Specify a greater length to TimeSinceDiagnosis.

 

2) All the diagnosis dates I can see are before 31DEC2012. So the first else-if statement is always fulfilled, since you specify an OR condition and not an AND condition. That way, you never get to the >8Years part. Try the code below.

 

data EA1_2;
    set EA1_1;
    length TimeSinceDiagnosis $20;

    IF diagnosis_dt > = '31DEC2012'd THEN
        TimeSinceDiagnosis ='<1yr';
    ELSE IF diagnosis_dt < =  '31DEC2012'd and diagnosis_dt >  = '31DEC2005'd THEN
        TimeSinceDiagnosis = '2-7yrs';
    ELSE IF diagnosis_dt < =  '01JAN2005'd THEN
        TimeSinceDiagnosis = '> 8yrs';
run;

 

araren
Calcite | Level 5
It worked! You're right, I didn't understand what the first part of my code meant (the $ and TimeSinceDiagnosis). This helped so much, thank you!
Kurt_Bremser
Super User

Your mistake is here

ELSE IF diagnosis_dt < =  '31DEC2012'd or diagnosis_dt >  = '31DEC2005'd

it should be

ELSE IF diagnosis_dt < =  '31DEC2012'd and diagnosis_dt >  = '31DEC2005'd
Kurt_Bremser
Super User

You can streamline your code to make it easier to use in the future:

%let now='01jul2013'd; /* just one date in 2013 */
/* this is all you have to set in the future */

/* the following calculates the cutoff dates from the initial value */
%let cut1 = %sysfunc(intnx(year,&now.,-1,e));
%let cut7 = %sysfunc(intnx(year,&now.,-8,e));

/* the following is for informational purposes only, you can remove it later */
data _null_;
cut1 = &cut1;
cut7 = &cut7;
put cut1= yymmdd10.;
put cut7= yymmdd10.;
run;

/* this is your streamlined code */
data EA1_2;
set EA1_1;
length TimeSinceDiagnosis $7.;
if diagnosis_dt > &cut1. then TimeSinceDiagnosis = '<1yr';
else if diagnosis_dt  > &cut7. then TimeSinceDiagnosis = '2-7yrs';
else TimeSinceDiagnosis = '>=8yrs';
run;
/* note that the else-if makes some conditions redundant */