Solved: How to create a new variable for numeric variable and get survival time?

nwang5 · Posted 09-10-2023 01:53 PM

Hi SAS community,

I hope you are doing well. I'm studying recurrent depression risk factors using a dataset with 'ID' and 'Depression9-Depression15' variables. The study period between depression 9 and depression 10 was two years. I want to categorize outcomes as 'recurrent,' 'depression,' or 'no depression.' 'Recurrent' means experiencing depression twice. Notably, no participants had depression at 'Depression9'. How can I create this outcome and calculate survival time from these columns? Thanks for your help!

data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent 6
2 0 0 1 0 . 1 . Recurrent 10
3 0 0 0 0 0 0 0 No Depression 12
4 0 . 1 0 1 . . Recurrent 8
5 0 . . 1 1 1 1 Depression 6
6 0 1 . . 0 1 . Recurrent 10
7 0 . 1 . . 0 1 Recurrent 12
8 0 0 . . . 0 . No Depression 10
9 0 . 1 1 0 . . Depression 4
10 0 1 0 1 1 0 1 Recurrent 6
run;

mkeintz · Posted 09-10-2023 08:40 PM

This code reproduces the results described. However, in the absence of specific instructions, it classifies a single depression episode as "Depression":

data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent       6
2 0 0 1 0 . 1 . Recurrent      10
3 0 0 0 0 0 0 0 No Depression  12
4 0 . 1 0 1 . . Recurrent       8
5 0 . . 1 1 1 1 Depression      6
6 0 1 . . 0 1 . Recurrent      10
7 0 . 1 . . 0 1 Recurrent      12
8 0 0 . . . 0 . No Depression  10
9 0 . 1 1 0 . . Depression      4
10 0 1 0 1 1 0 1 Recurrent      6
run;

data want (drop=i _:);
  set have;

  array dep {*} depression: ;
  /*With 0 and . as word separators, count N of "words" in the concatenated depression sequence*/
  _n_depression_cycles=countw(cat(of dep{*}),'0.');

  if _n_depression_cycles=0 then do;
    outcome='No Depression';
    do i=dim(dep) to 1 by -1 until(dep{i}=0); *Find last zero;
    end;
  end;
  else if _n_depression_cycles=1 then do;
    outcome='Depression';
    do i=1 to dim(dep) until (dep{i}=1);      *Find first 1 ;
    end;
  end;
  else do;
    outcome='Recurrent' ;                     *Find start of recurrence;
    do i=whichn(1,of dep{*})+2 to dim(dep) until (dep{i}=1 and dep{i-1}^=1);
    end;
  end;

  survtime=2*(i-1);   *Waves are 2 years apart, counted from depression9;
run;
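
As a rough illustration of the COUNTW step above, the sketch below prints the concatenated sequence and the resulting episode count for three of the sample patterns. The names d1-d7, seq, n_episodes, and the dataset name countw_demo are made up for this example and are not part of the solution.

```sas
/* Sketch: CAT turns each numeric value into text (a missing value
   becomes '.'), and COUNTW then counts the runs of 1s because both
   '0' and '.' are treated as delimiters. */
data countw_demo;
  input (d1-d7) (:1.);
  length seq $7;
  seq = cat(of d1-d7);               /* e.g. 0101... */
  n_episodes = countw(seq, '0.');    /* number of separate runs of 1s */
  put seq= n_episodes=;
datalines;
0 1 0 1 . . .
0 . . 1 1 1 1
0 0 0 0 0 0 0
;
```

With the default MISSING= option, the three rows should give n_episodes of 2, 1, and 0, which correspond to the 'Recurrent', 'Depression', and 'No Depression' branches.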

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------


PaigeMiller · Posted 09-10-2023 02:04 PM

Why is ID 5 shown as "Depression" and not "Recurrent"?

How is exp_survtime calculated from this data? Please explain in words.

What does this sentence have to do with the solution to the problem? "The study period between depression 9 and depression 10 was two years."

--
Paige Miller

nwang5 · Posted 09-10-2023 02:16 PM

Thank you for the quick response and your questions.

ID 5 did not recover after depression 6, so it's categorized as having depression.

The survival time was defined as the duration from depression 9 to the first occurrence of depression, resulting in a survival time of six years for ID 5.

For recurrent depression, let's look at ID 1 as an example. ID 1 experienced depression in depression 10, recovered in depression 11, and then had another episode of depression in depression 12. Therefore, ID 1 is classified as recurrent. The time gap between consecutive depression assessments is two years. Depression 12 is 3 assessments after depression 9, so the survival time is 3 times 2, which equals 6 years.
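
The arithmetic described above can be sketched as follows. The variable names baseline_wave, years_per_wave, wave_of_event, and the dataset name survtime_demo are illustrative only and do not exist in the posted data.

```sas
/* Sketch: years from depression9 to the wave where the event is counted. */
data survtime_demo;
  baseline_wave = 9;     /* depression9 is the starting point  */
  years_per_wave = 2;    /* assessments are two years apart    */
  input ID wave_of_event;
  survtime = years_per_wave * (wave_of_event - baseline_wave);
  put ID= wave_of_event= survtime=;
datalines;
1 12
5 12
;
```

For ID 1 the event wave is the recurrence at depression 12, and for ID 5 it is the first occurrence at depression 12, so both rows give a survival time of 6 years.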

PaigeMiller · Posted 09-10-2023 02:39 PM

Thank you. None of this was stated in the original problem statement, making solutions based upon the original problem statement impossible. From now on, it would be a very good idea to include all relevant information in your original problem statement. You will get faster and better answers that way.

proc format;
    value outf 0='No Depression' 1='Depression' 2='Recurrent';
run;
data want;
    set have(drop=exp:);
    array depr depression:;
    /* Position of the first 1 (WHICHN returns 0 if there is none) */
    first_occurence_of_1=whichn(1,of depression:);
    /* Two years per wave, counted from depression9 */
    exp_survtime=(first_occurence_of_1-1)*2;
    zero_flag=0;
    one_flag=0;
    if sum(of depression:)=0 then expected_outcome=0;
    /* Otherwise scan the waves after the first 1 */
    else do i=(first_occurence_of_1+1) to dim(depr);
        if depr(i)=0 then zero_flag=i;
        if depr(i)=1 then one_flag=i;
        /* A 1 after a 0: recovered, then depressed again */
        if zero_flag>0 and one_flag>zero_flag then do;
            expected_outcome=2;
            leave;
        end;
        /* A 1 with no 0 since the first 1: still depressed */
        if one_flag>0 and zero_flag=0 then do;
            expected_outcome=1;
            leave;
        end;
    end;
    drop i one_flag zero_flag;
    format expected_outcome outf.;
run;

--
Paige Miller

mkeintz · Posted 09-10-2023 04:19 PM

Thank you for providing a ~~working~~ data step for code testing.

Edit: (On second look, it's not working as submitted - please see note at bottom.)

Question:

What OUTCOME is assigned if a person experiences depression only once:

- as the last depression value?
- as a middle value?

---------------------------------------

The program submitted for sample data was apparently not tested. It generates only 8 obs from 10 records, as submitted, and generates this log message:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

More space is needed between expected_outcome and exp_survtime, because with the & modifier a character value ends only at two or more consecutive blanks. See below:

data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent       6
2 0 0 1 0 . 1 . Recurrent      10
3 0 0 0 0 0 0 0 No Depression  12
4 0 . 1 0 1 . . Recurrent       8
5 0 . . 1 1 1 1 Depression      6
6 0 1 . . 0 1 . Recurrent      10
7 0 . 1 . . 0 1 Recurrent      12
8 0 0 . . . 0 . No Depression  10
9 0 . 1 1 0 . . Depression      4
10 0 1 0 1 1 0 1 Recurrent      6
run;


