Hi SAS community,
I hope you are doing well. I'm studying risk factors for recurrent depression using a dataset with an 'ID' variable and the variables 'Depression9' through 'Depression15'. The study period between Depression9 and Depression10 was two years. I want to categorize outcomes as 'Recurrent', 'Depression', or 'No Depression'. 'Recurrent' means experiencing depression twice. Notably, no participant had depression at Depression9. How can I create this outcome variable and calculate survival time from these columns? Thanks for your help!
data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent 6
2 0 0 1 0 . 1 . Recurrent 10
3 0 0 0 0 0 0 0 No Depression 12
4 0 . 1 0 1 . . Recurrent 8
5 0 . . 1 1 1 1 Depression 6
6 0 1 . . 0 1 . Recurrent 10
7 0 . 1 . . 0 1 Recurrent 12
8 0 0 . . . 0 . No Depression 10
9 0 . 1 1 0 . . Depression 4
10 0 1 0 1 1 0 1 Recurrent 6
run;
Why is ID 5 shown as "Depression" and not "Recurrent"?
How is exp_survtime calculated from this data? Please explain in words.
What does this sentence have to do with the solution to the problem? "The study period between depression 9 and depression 10 was two years."
Thank you. None of this was stated in the original problem statement, which made it impossible to build a solution from that statement alone. From now on, please include all relevant information in your original problem statement; you will get faster and better answers that way.
proc format;
  value outf 0='No Depression' 1='Depression' 2='Recurrent';
run;
data want;
  set have(drop=exp:);
  array depr depression:;
  /* Position of the first wave with depression=1 (0 if there is none) */
  first_occurence_of_1=whichn(1,of depression:);
  /* Time to the first 1, at 2 units per wave */
  exp_survtime=(first_occurence_of_1-1)*2;
  zero_flag=0;
  one_flag=0;
  if sum(of depression:)=0 then expected_outcome=0;
  /* Otherwise scan the waves after the first 1 */
  else do i=(first_occurence_of_1+1) to dim(depr);
    if depr(i)=0 then zero_flag=i;
    if depr(i)=1 then one_flag=i;
    /* A 1 that follows a 0: second episode */
    if zero_flag>0 and one_flag>zero_flag then do;
      expected_outcome=2;
      leave;
    end;
    /* A 1 before any 0: still the first episode */
    if one_flag>0 and zero_flag=0 then do;
      expected_outcome=1;
      leave;
    end;
  end;
  drop i one_flag zero_flag;
  format expected_outcome outf.;
run;
Thank you for providing a working data step for code testing.
Edit: (On second look, it's not working as submitted - please see note at bottom.)
Question:
What OUTCOME is assigned if a person experiences depression only once,
---------------------------------------
The program submitted for sample data was apparently not tested. It generates only 8 obs from 10 records, as submitted, and generates this log message:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
More space is needed between expected_outcome and exp_survtime, per below:
data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent  6
2 0 0 1 0 . 1 . Recurrent  10
3 0 0 0 0 0 0 0 No Depression  12
4 0 . 1 0 1 . . Recurrent  8
5 0 . . 1 1 1 1 Depression  6
6 0 1 . . 0 1 . Recurrent  10
7 0 . 1 . . 0 1 Recurrent  12
8 0 0 . . . 0 . No Depression  10
9 0 . 1 1 0 . . Depression  4
10 0 1 0 1 1 0 1 Recurrent  6
run;
This code reproduces the results described. However, in the absence of specific instructions, it labels a single depression episode as "Depression":
data want (drop=i _:);
  set have;
  array dep {*} depression: ;
  /* With 0 and . as word separators, count the "words" (runs of 1s) in the concatenated depression sequence */
  _n_depression_cycles=countw(cat(of dep{*}),'0.');
  if _n_depression_cycles=0 then do;
    outcome='No Depression';
    /* Find last zero */
    do i=dim(dep) to 1 by -1 until(dep{i}=0);
    end;
  end;
  else if _n_depression_cycles=1 then do;
    outcome='Depression';
    /* Find first 1 */
    do i=1 to dim(dep) until (dep{i}=1);
    end;
  end;
  else do;
    outcome='Recurrent';
    /* Find start of recurrence: the first 1 after the first episode that does not directly follow a 1 */
    do i=whichn(1,of dep{*})+2 to dim(dep) until (dep{i}=1 and dep{i-1}^=1);
    end;
  end;
  /* Convert the wave index to time, at 2 units per wave counted from depression9 */
  survtime=2*(i-1);
run;
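The `countw(cat(...),'0.')` step in the data step above can be tried in isolation. Here is a minimal sketch, assuming the concatenated sequences are typed in by hand as character literals rather than built from the dataset (the names `seq` and `n_cycles` are illustrative and not part of the thread):

```sas
data _null_;
  length seq $7;
  /* Hand-typed stand-ins for cat(of dep{*}) for IDs 1, 5, 3 and 9 */
  do seq='0101...', '0..1111', '0000000', '0.110..';
    /* 0 and . are delimiters, so each run of 1s counts as one word */
    n_cycles=countw(seq,'0.');
    put seq= n_cycles=;
  end;
run;
```

The log should report 2, 1, 0 and 1 words for these four sequences, which is how the data step separates "Recurrent", "Depression" and "No Depression".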