nwang5
Obsidian | Level 7

Hi SAS community,

 

I hope you are doing well. I'm studying risk factors for recurrent depression using a dataset with an 'ID' variable and the variables 'Depression9' through 'Depression15'. The study period between depression 9 and depression 10 was two years. I want to categorize outcomes as 'recurrent,' 'depression,' or 'no depression.' 'Recurrent' means experiencing depression twice. Notably, no participant had depression at 'Depression9'. How can I create this outcome and calculate survival time from these columns? Thanks for your help!

 

 

data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent 6
2 0 0 1 0 . 1 . Recurrent 10
3 0 0 0 0 0 0 0 No Depression 12
4 0 . 1 0 1 . . Recurrent 8
5 0 . . 1 1 1 1 Depression 6
6 0 1 . . 0 1 . Recurrent 10
7 0 . 1 . . 0 1 Recurrent 12
8 0 0 . . . 0 . No Depression 10
9 0 . 1 1 0 . . Depression 4
10 0 1 0 1 1 0 1 Recurrent 6
run;

 

Accepted Solutions
mkeintz
PROC Star

This code reproduces the results described. However, in the absence of specific instructions, it classifies a single depression episode as "Depression":

 

data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent       6
2 0 0 1 0 . 1 . Recurrent      10
3 0 0 0 0 0 0 0 No Depression  12
4 0 . 1 0 1 . . Recurrent       8
5 0 . . 1 1 1 1 Depression      6
6 0 1 . . 0 1 . Recurrent      10
7 0 . 1 . . 0 1 Recurrent      12
8 0 0 . . . 0 . No Depression  10
9 0 . 1 1 0 . . Depression      4
10 0 1 0 1 1 0 1 Recurrent      6
run;

data want (drop=i _:);
  set have;

  array dep {*} depression: ;
  /*With 0 and . as word separators, count N of "words" in the concatenated depression sequence*/
  _n_depression_cycles=countw(cat(of dep{*}),'0.');

  if _n_depression_cycles=0 then do;
    outcome='No Depression';
    do i=dim(dep) to 1 by -1 until(dep{i}=0); *Find last zero **;
    end;
  end;
  else if _n_depression_cycles=1 then do;
    outcome='Depression';
    do i=1 to dim(dep) until (dep{i}=1);      *Find first 1 ;
    end;
  end;
  else do;
    outcome='Recurrent' ;                     *Find start of recurrence;
    do i=whichn(1,of dep{*})+2 to dim(dep) until (dep{i}=1 and dep{i-1}^=1);
    end;
  end;

  survtime=2*(i-1);
run;
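
For anyone unfamiliar with the COUNTW trick used in the step above, here is a minimal sketch. The names seq and n_episodes are made up for illustration, and the four strings are the concatenated depression9-depression15 values for IDs 1, 5, 3 and 10. CAT writes a missing numeric value as a period, so with '0' and '.' as delimiters each unbroken run of 1s counts as one word:

```sas
data _null_;
  length seq $7;
  /* concatenated sequences for IDs 1, 5, 3 and 10 from the sample data */
  do seq='0101...', '0..1111', '0000000', '0101101';
    /* delimiters are 0 and . so each run of 1s is one "word" */
    n_episodes=countw(seq,'0.');
    put seq= n_episodes=;   /* counts should be 2, 1, 0 and 3 */
  end;
run;
```

Note that under this rule a missing value between two 1s splits them into separate episodes, just as a 0 does.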
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------


PaigeMiller
Diamond | Level 26

Why is ID 5 shown as "Depression" and not "Recurrent"?

 

How is exp_survtime calculated from this data? Please explain in words.

 

What does this sentence have to do with the solution to the problem? "The study period between depression 9 and depression 10 was two years."

--
Paige Miller
nwang5
Obsidian | Level 7
Thank you for the quick response and your questions.

ID 5 did not recover after becoming depressed at year 6 (depression12), so it's categorized as having depression.

The survival time was defined as the duration from depression 9 to the first occurrence of depression, resulting in a survival time of six years for ID 5.

For recurrent depression, let's look at ID 1 as an example. ID 1 experienced depression in depression 10, recovered in depression 11, and then had another episode of depression in depression 12. Therefore, ID 1 is classified as recurrent. The time gap between each depression episode is two years. The time difference between depression 12 and depression 9 is 3 times 2, which equals 6 years.
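
The arithmetic for ID 1 can be sketched as below. This is only an illustration of the rule described above; years_per_wave and wave_position are made-up names, not variables in the dataset:

```sas
data _null_;
  years_per_wave = 2;   /* gap between consecutive depression variables, per the description */
  wave_position  = 4;   /* depression12 is the 4th of depression9-depression15 */
  survtime = years_per_wave*(wave_position-1);
  put survtime=;        /* writes survtime=6 to the log */
run;
```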
PaigeMiller
Diamond | Level 26

Thank you. None of this was stated in the original problem statement, making solutions based upon the original problem statement impossible. It would be a very good idea to include all relevant information in your original problem statement from now on; you will get faster and better answers that way.

 

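proc format;
    value outf 0='No Depression' 1='Depression' 2='Recurrent';
run;
data want;
    set have(drop=exp:);   /* drop the supplied answers so they can be recomputed */
    array depr depression:;
    /* position of the first 1 among depression9-depression15 (0 if there is none) */
    first_occurence_of_1=whichn(1,of depression:);
    /* two years per wave, measured from depression9 to the first 1 */
    exp_survtime=(first_occurence_of_1-1)*2;
    zero_flag=0;
    one_flag=0;
    if sum(of depression:)=0 then expected_outcome=0;
    /* scan the waves after the first 1 */
    else do i=(first_occurence_of_1+1) to dim(depr);
        if depr(i)=0 then zero_flag=i;
        if depr(i)=1 then one_flag=i;
        /* a 1 that follows a 0: recovered, then depressed again */
        if zero_flag>0 and one_flag>zero_flag then do;
            expected_outcome=2;
            leave;
        end;
        /* another 1 before any 0: still depressed */
        if one_flag>0 and zero_flag=0 then do;
            expected_outcome=1;
            leave;
        end;
    end;
    drop i one_flag zero_flag;
    format expected_outcome outf.;
run;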

 

 

--
Paige Miller
mkeintz
PROC Star

Thank you for providing a working data step for code testing.

Edit: (On second look, it's not working as submitted - please see note at bottom.)

 

Question:

 

What OUTCOME is assigned if a person experiences depression only once,

  1. as the last depression value?
  2. as a middle value?
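
To make the two cases concrete, here is a small sketch with two hypothetical participants (IDs 11 and 12 are not in the original sample data). ID 11 has a single 1 as the last value, and ID 12 has a single 1 in the middle:

```sas
/* Hypothetical edge cases, not part of the original post */
data edge_cases;
  input ID (depression9-depression15) (:1.);
  datalines;
11 0 0 0 0 0 0 1
12 0 0 0 1 0 0 0
;
run;

/* Append them to the sample data before running a solution */
data have_plus;
  set have edge_cases;
run;
```

As far as I can trace it, the PROC FORMAT-based step posted earlier leaves expected_outcome missing for both rows, because neither condition inside its loop is ever met.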

 

---------------------------------------

The program submitted for sample data was apparently not tested.  It generates only 8 obs from 10 records, as submitted, and generates this log message:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

More space is needed between expected_outcome and exp_survtime, per below:

 

data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent       6
2 0 0 1 0 . 1 . Recurrent      10
3 0 0 0 0 0 0 0 No Depression  12
4 0 . 1 0 1 . . Recurrent       8
5 0 . . 1 1 1 1 Depression      6
6 0 1 . . 0 1 . Recurrent      10
7 0 . 1 . . 0 1 Recurrent      12
8 0 0 . . . 0 . No Depression  10
9 0 . 1 1 0 . . Depression      4
10 0 1 0 1 1 0 1 Recurrent      6
run;
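
The extra blanks matter because of the & modifier in :$&13. It lets a character value contain single embedded blanks (as in "No Depression"), so the value only ends at two or more consecutive blanks. A minimal sketch, where the dataset demo and its three rows are made up for illustration:

```sas
data demo;
  /* TRUNCOVER keeps INPUT from flowing onto the next record */
  infile datalines truncover;
  input id outcome :$&13. survtime;
  datalines;
1 No Depression  12
2 Recurrent  6
3 Recurrent 6
;
run;

proc print data=demo;
run;
```

Rows 1 and 2 have two blanks before the number and read cleanly. Row 3 has only one, so the 6 is absorbed into outcome and survtime is left missing. Without TRUNCOVER, INPUT would instead move to the next record to look for survtime, which is what produces the "SAS went to a new line" note.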

 

