nwang5
Obsidian | Level 7

Hi SAS community,

 

I hope you are doing well. I'm studying risk factors for recurrent depression using a dataset with the variables 'ID' and 'Depression9' through 'Depression15'. The study period between depression 9 and depression 10 was two years. I want to categorize outcomes as 'recurrent,' 'depression,' or 'no depression.' 'Recurrent' means experiencing depression twice. Notably, no participant had depression at 'Depression9'. How can I create this outcome variable and calculate survival time from these columns? Thanks for your help!

 

 

data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent 6
2 0 0 1 0 . 1 . Recurrent 10
3 0 0 0 0 0 0 0 No Depression 12
4 0 . 1 0 1 . . Recurrent 8
5 0 . . 1 1 1 1 Depression 6
6 0 1 . . 0 1 . Recurrent 10
7 0 . 1 . . 0 1 Recurrent 12
8 0 0 . . . 0 . No Depression 10
9 0 . 1 1 0 . . Depression 4
10 0 1 0 1 1 0 1 Recurrent 6
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This code reproduces the results described. However, in the absence of specific instructions, it classifies a single depression episode as "Depression":

 

data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent       6
2 0 0 1 0 . 1 . Recurrent      10
3 0 0 0 0 0 0 0 No Depression  12
4 0 . 1 0 1 . . Recurrent       8
5 0 . . 1 1 1 1 Depression      6
6 0 1 . . 0 1 . Recurrent      10
7 0 . 1 . . 0 1 Recurrent      12
8 0 0 . . . 0 . No Depression  10
9 0 . 1 1 0 . . Depression      4
10 0 1 0 1 1 0 1 Recurrent      6
run;

data want (drop=i _:);
  set have;

  array dep {*} depression: ;
  /*With 0 and . as word separators, count N of "words" in the concatenated depression sequence*/
  _n_depression_cycles=countw(cat(of dep{*}),'0.');

  if _n_depression_cycles=0 then do;
    outcome='No Depression';
    do i=dim(dep) to 1 by -1 until(dep{i}=0); *Find last zero **;
    end;
  end;
  else if _n_depression_cycles=1 then do;
    outcome='Depression';
    do i=1 to dim(dep) until (dep{i}=1);      *Find first 1 ;
    end;
  end;
  else do;
    outcome='Recurrent' ;                     *Find start of recurrence;
    do i=whichn(1,of dep{*})+2 to dim(dep) until (dep{i}=1 and dep{i-1}^=1);
    end;
  end;

  survtime=2*(i-1);
run;
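
To make the COUNTW step above easier to follow, here is a minimal sketch (an illustration, not part of the tested solution) that prints the concatenated sequence and the number of depression "words" for three of the sample rows (IDs 1, 5, and 3). The names seq and n_cycles are hypothetical and chosen only for this demo.

```sas
data _null_;
  input id dep9-dep15;
  array dep {*} dep9-dep15;

  length seq $7;
  /* CAT turns each numeric value into text; a missing value becomes "." */
  seq = cat(of dep{*});

  /* Treat 0 and . as delimiters, so each unbroken run of 1s is one "word" */
  n_cycles = countw(cat(of dep{*}), '0.');

  put id= seq= n_cycles=;
datalines;
1 0 1 0 1 . . .
5 0 . . 1 1 1 1
3 0 0 0 0 0 0 0
;
```

Note that under this rule a run of 1s interrupted only by missing values is also counted as two episodes, since "." is a delimiter just like 0. Whether that is the intended definition of recurrence is for the original poster to decide.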
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------


5 REPLIES 5
PaigeMiller
Diamond | Level 26

Why is ID 5 shown as "Depression" and not "Recurrent"?

 

How is exp_survtime calculated from this data? Please explain in words.

 

What does this sentence have to do with the solution to the problem? "The study period between depression 9 and depression 10 was two years."

--
Paige Miller
nwang5
Obsidian | Level 7
Thank you for the quick response and your questions.

ID 5 did not recover after depression 6, so it's categorized as having depression.

The survival time was defined as the duration from depression 9 to the first occurrence of depression, resulting in a survival time of six years for ID 5.

For recurrent depression, let's look at ID 1 as an example. ID 1 experienced depression in depression 10, recovered in depression 11, and then had another episode of depression in depression 12. Therefore, ID 1 is classified as recurrent. The time gap between each depression episode is two years. The time difference between depression 12 and depression 9 is 3 times 2, which equals 6 years.
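
As a rough sketch of the arithmetic above, assuming the measurements are numbered 9 through 15 and spaced two years apart (the variable names here are hypothetical, chosen only for illustration):

```sas
data _null_;
  baseline_wave = 9;   /* depression 9 is the starting point         */
  gap_years     = 2;   /* assumed spacing between consecutive waves  */
  event_wave    = 12;  /* ID 1: second episode observed at wave 12   */

  /* Number of waves elapsed since baseline, times the years per wave */
  survtime = gap_years * (event_wave - baseline_wave);
  put survtime=;
run;
```

The same calculation applies to ID 5, whose first depression is also recorded at depression 12.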
PaigeMiller
Diamond | Level 26

Thank you. None of this was stated in the original problem statement, making solutions based upon the original problem statement impossible. From now on, it would be a very good idea to include all relevant information in your original problem statement; you will get faster and better answers that way.

 

proc format;
    value outf 0='No Depression' 1='Depression' 2='Recurrent';
run;
data want;
    set have(drop=exp:);
    array depr depression:;
    /* Position of the first 1 in the sequence; waves are two years apart */
    first_occurence_of_1=whichn(1,of depression:);
    exp_survtime=(first_occurence_of_1-1)*2;
    zero_flag=0;
    one_flag=0;
    /* No 1 anywhere: no depression. Otherwise scan the waves after the first 1 */
    if sum(of depression:)=0 then expected_outcome=0;
    else do i=(first_occurence_of_1+1) to dim(depr);
        if depr(i)=0 then zero_flag=i;
        if depr(i)=1 then one_flag=i;
        /* A 1 that follows a 0: recovery and then a second episode */
        if zero_flag>0 and one_flag>zero_flag then do;
            expected_outcome=2;
            leave;
        end;
        /* A 1 before any 0: still depressed, no recovery seen yet */
        if one_flag>0 and zero_flag=0 then do;
            expected_outcome=1;
            leave;
        end;
    end;
    drop i one_flag zero_flag;
    format expected_outcome outf.;
run;
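
For readers unfamiliar with WHICHN, here is a small sketch of the first-occurrence step on three of the sample rows (IDs 5, 9, and 3). The variable names first and survtime are illustrative. WHICHN returns 0 when no argument matches, so a row with no 1 needs its own rule; this sketch simply leaves survtime missing in that case, which is an assumption and not something the thread specifies.

```sas
data _null_;
  input id dep9-dep15;
  array dep {*} dep9-dep15;

  /* Index (1-based) of the first value equal to 1, or 0 if there is none */
  first = whichn(1, of dep{*});

  /* Two years per wave after baseline; skipped when no 1 was found */
  if first > 0 then survtime = (first - 1) * 2;

  put id= first= survtime=;
datalines;
5 0 . . 1 1 1 1
9 0 . 1 1 0 . .
3 0 0 0 0 0 0 0
;
```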

 

 

--
Paige Miller
mkeintz
PROC Star

Thank you for providing a working data step for code testing.

Edit: (On second look, it's not working as submitted - please see note at bottom.)

 

Question:

 

What OUTCOME is assigned if a person experiences depression only once,

  1. as the last depression value?
  2. as a middle value?

 

---------------------------------------

The program submitted for sample data was apparently not tested.  It generates only 8 obs from 10 records, as submitted, and generates this log message:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

More space is needed between expected_outcome and exp_survtime, because the & modifier ends a character value only at two or more consecutive blanks, per below:

 

data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent       6
2 0 0 1 0 . 1 . Recurrent      10
3 0 0 0 0 0 0 0 No Depression  12
4 0 . 1 0 1 . . Recurrent       8
5 0 . . 1 1 1 1 Depression      6
6 0 1 . . 0 1 . Recurrent      10
7 0 . 1 . . 0 1 Recurrent      12
8 0 0 . . . 0 . No Depression  10
9 0 . 1 1 0 . . Depression      4
10 0 1 0 1 1 0 1 Recurrent      6
run;
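
As a stripped-down illustration of the same point, here is a minimal sketch using a hypothetical dataset named demo. With the & modifier, list input allows single blanks inside a character value, so at least two blanks are needed before the next field.

```sas
data demo;
  /* & lets outcome contain single blanks; two or more blanks end the value */
  input id outcome :$&13. survtime;
datalines;
1 No Depression  12
2 Recurrent       6
;

proc print data=demo noobs;
run;
```

If only one blank separated the outcome from the survival time, the number would be read as part of the outcome text, and INPUT would go to the next line looking for survtime.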

 

