Hi SAS community,
I hope you are doing well. I'm studying risk factors for recurrent depression using a dataset with the variables 'ID' and 'Depression9' through 'Depression15'. The study period between Depression9 and Depression10 was two years. I want to categorize each participant's outcome as 'Recurrent', 'Depression', or 'No Depression'. 'Recurrent' means experiencing depression twice. Note that no participant had depression at 'Depression9'. How can I create this outcome and calculate survival time from these columns? Thanks for your help!
data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent 6
2 0 0 1 0 . 1 . Recurrent 10
3 0 0 0 0 0 0 0 No Depression 12
4 0 . 1 0 1 . . Recurrent 8
5 0 . . 1 1 1 1 Depression 6
6 0 1 . . 0 1 . Recurrent 10
7 0 . 1 . . 0 1 Recurrent 12
8 0 0 . . . 0 . No Depression 10
9 0 . 1 1 0 . . Depression 4
10 0 1 0 1 1 0 1 Recurrent 6
run;
Why is ID 5 classified as "Depression" and not "Recurrent"?
How is exp_Survtime calculated from this data? Please explain in words.
What does this sentence have to do with the solution to the problem? "The study period between depression 9 and depression 10 was two years."
Thank you. None of this was stated in the original problem statement, which made it impossible to solve the problem as originally posed. From now on, please include all relevant information in your original problem statement; you will get faster and better answers that way.
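proc format;
    value outf 0='No Depression' 1='Depression' 2='Recurrent';
run;
data want;
    set have(drop=exp:);
    array depr depression:;
    /* Position of the first wave coded 1 (WHICHN returns 0 if there is none) */
    first_occurence_of_1=whichn(1,of depression:);
    /* Time to the first 1, counting two years per wave */
    exp_survtime=(first_occurence_of_1-1)*2;
    zero_flag=0;
    one_flag=0;
    /* No 1 anywhere: no depression. Otherwise scan the waves after the first 1. */
    if sum(of depression:)=0 then expected_outcome=0;
    else do i=(first_occurence_of_1+1) to dim(depr);
        if depr(i)=0 then zero_flag=i;
        if depr(i)=1 then one_flag=i;
        /* A 1 that comes after a 0: recurrent */
        if zero_flag>0 and one_flag>zero_flag then do;
            expected_outcome=2;
            leave;
        end;
        /* A 1 with no 0 since the first 1: still the same episode */
        if one_flag>0 and zero_flag=0 then do;
            expected_outcome=1;
            leave;
        end;
    end;
    drop i one_flag zero_flag;
    format expected_outcome outf.;
run;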
Thank you for providing a working data step for code testing.
Edit: (On second look, it's not working as submitted - please see note at bottom.)
Question:
What OUTCOME is assigned if a person experiences depression only once?
---------------------------------------
The program submitted for the sample data was apparently not tested. As submitted, it generates only 8 observations from 10 records and writes this log message:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
More space is needed between expected_outcome and exp_Survtime. Because of the & modifier, the character value ends only at two or more consecutive blanks, as below:
data have;
input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
datalines;
1 0 1 0 1 . . . Recurrent  6
2 0 0 1 0 . 1 . Recurrent  10
3 0 0 0 0 0 0 0 No Depression  12
4 0 . 1 0 1 . . Recurrent  8
5 0 . . 1 1 1 1 Depression  6
6 0 1 . . 0 1 . Recurrent  10
7 0 . 1 . . 0 1 Recurrent  12
8 0 0 . . . 0 . No Depression  10
9 0 . 1 1 0 . . Depression  4
10 0 1 0 1 1 0 1 Recurrent  6
run;
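As a side note, here is a minimal sketch of one way to make a spacing problem like this easier to spot. It is not part of the program above: the dataset name have_check and the three-row sample are made up for illustration. Adding INFILE DATALINES TRUNCOVER stops SAS from moving to the next line when INPUT runs out of data, so a record with only one blank before the survival time keeps its own row and simply gets a missing exp_Survtime.

```sas
/* Hypothetical check step: have_check is an illustrative name only */
data have_check;
    infile datalines truncover;   /* do not jump to the next line when a value is missing */
    input ID (depression9-depression15) (:1.) expected_outcome :$&13. exp_Survtime ;
    datalines;
1 0 1 0 1 . . . Recurrent  6
3 0 0 0 0 0 0 0 No Depression  12
5 0 . . 1 1 1 1 Depression 6
;

proc print data=have_check;
run;
```

In this sketch the third record deliberately has a single blank before the 6, so the 6 should be absorbed into expected_outcome and exp_Survtime should be missing, while all three records are still read as three observations.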
The following code reproduces the expected results. However, in the absence of specific instructions, it labels a single depression episode as "Depression":
data want (drop=i _:);
    set have;
    array dep {*} depression: ;
    /* With 0 and . as word separators, count the "words" (runs of 1s) in the concatenated depression sequence */
    _n_depression_cycles=countw(cat(of dep{*}),'0.');
    if _n_depression_cycles=0 then do;
        outcome='No Depression';
        do i=dim(dep) to 1 by -1 until(dep{i}=0);  /* Find last zero */
        end;
    end;
    else if _n_depression_cycles=1 then do;
        outcome='Depression';
        do i=1 to dim(dep) until (dep{i}=1);  /* Find first 1 */
        end;
    end;
    else do;
        outcome='Recurrent';  /* Find start of recurrence */
        do i=whichn(1,of dep{*})+2 to dim(dep) until (dep{i}=1 and dep{i-1}^=1);
        end;
    end;
    survtime=2*(i-1);  /* Two years per wave */
run;
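To make the COUNTW/CAT step easier to follow, here is a small one-row sketch using the values of ID 6. The variable names d1-d7, seq, and n_cycles are placeholders for illustration and are not part of the solution above.

```sas
/* Hypothetical demo: d1-d7 stand in for depression9-depression15 (ID 6) */
data _null_;
    input d1-d7;
    array dep {*} d1-d7;
    length seq $7;
    seq = cat(of dep{*});           /* numeric values become text; a missing value becomes "." */
    n_cycles = countw(seq, '0.');   /* 0 and . are delimiters, so each run of 1s is one word */
    put seq= n_cycles=;
    datalines;
0 1 . . 0 1 .
;
```

For this row the concatenated string should be 01..01. and the count should be 2, which is why ID 6 falls into the "Recurrent" branch even though the two 1s are separated partly by missing values.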