Hi SAS community,
I hope you are doing well. I am investigating risk factors for recurrent depression using a longitudinal dataset. The original variables are 'ID' and 'Depression9' through 'Depression15'. My objective is to create a three-category outcome: 'recurrent', 'depression', and 'no depression'. 'Recurrent' means experiencing depression for the second time.
Note that no participant had depression at 'Depression9'. I would greatly appreciate your guidance on creating this outcome and calculating survival time, shown in the last two columns (outcome and survival time) of the table below. Thank you for your assistance!
ID | Depression9 | Depression10 | Depression11 | Depression12 | Depression13 | Depression14 | Depression15 | outcome | Survival time |
1 | x | √ | x | √ | . | . | . | Recurrent | 6 |
2 | x | x | √ | x | . | √ | . | Recurrent | 10 |
3 | x | x | x | x | x | x | x | No Depression | 12 |
4 | x | . | √ | x | √ | . | . | Recurrent | 8 |
5 | x | . | . | √ | √ | √ | √ | Depression | 6 |
6 | x | √ | . | . | x | √ | . | Recurrent | 10 |
7 | x | . | √ | . | . | x | √ | Recurrent | 12 |
8 | x | x | . | . | . | x | . | No Depression | 10 |
9 | x | . | √ | √ | x | . | . | Depression | 4 |
10 | x | √ | x | √ | √ | x | √ | Recurrent | 6 |
The options in my SAS session convert the "√" character to a "v" in the program below:
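data have;
  /* The & modifier lets expected_outcome hold single embedded blanks;  */
  /* the value ends at two consecutive blanks, hence the double spacing */
  input ID (depression9-depression15) (:$1.) expected_outcome & $13. exp_Survtime ;
  datalines;
1 x √ x √ . . .  Recurrent  6
2 x x √ x . √ .  Recurrent  10
3 x x x x x x x  No Depression  12
4 x . √ x √ . .  Recurrent  8
5 x . . √ √ √ √  Depression  6
6 x √ . . x √ .  Recurrent  10
7 x . √ . . x √  Recurrent  12
8 x x . . . x .  No Depression  10
9 x . √ √ x . .  Depression  4
10 x √ x √ √ x √  Recurrent  6
run;

data want (drop=_: d);
  set have;
  array dep {*} depression: ;   /* dep{1}=depression9 ... dep{7}=depression15 */
  /* Waves 10-15 as one 6-character string; a '.' in the data is read as blank */
  _string=cat(of depression10-depression15);
  /* Number of waves with depression ('v') */
  _npositive=lengthn(compress(_string,'x '));
  if _npositive=0 then do;
    outcome='No Depression';
    /* Censored at the last observed wave (position of the last non-blank) */
    survtime=2*(length(_string));
  end;
  /* Exactly one unbroken run of 'v' means a single episode */
  else if countw(_string,'x ')=1 then do;
    outcome='Depression';
    survtime=2*findc(_string,'v');   /* onset of that episode */
  end;
  else do;
    outcome='Recurrent';
    /* First clause stops at the first 'v'; the second clause stops at */
    /* the next 'v' that starts a new run, i.e. the recurrence         */
    do d=1 by 1 until (dep{d}='v'), d+1 to dim(dep) until (dep{d}='v' and dep{d-1}^='v');
    end;
    survtime=2*(d-1);   /* waves are two years apart; dep{1} is the baseline */
  end;
run;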
How is survival time defined?
I still don't get it. Why is survival time = 6 in the first obs? And 10 in the second?
Hi,
I think I'm missing something here.
I have assumed that a diagnosis of depression is indicated with "√", is this correct? If yes, then please respond to the below, otherwise please explain what the symbols mean in the table in the question, thanks.
Looking at the table in the question, I see depression for obs 1 marked under Depression10 and Depression12; for obs 2 it is under Depression11 and Depression14.
But in a later post you refer to obs 1 having D9 & D12 (the table shows D10 & D12) and obs 2 having D9 & D14 (the table shows D11 & D14), and I cannot match that description with the original table that was posted:
Sorry for the confusion. The first obs has recurrent depression at Depression12. The time between Depression9 and Depression12 is six years, because consecutive waves are two years apart.
The second obs has recurrent depression at Depression14, and the time between Depression9 and Depression14 is ten years.
Please clarify any misunderstanding I might have.
Thanks & kind regards,
Amir.
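For anyone following along, the clarified definition can be checked with a small sketch. It assumes, as stated above, that Depression9 is the baseline wave and that consecutive waves are two years apart; the variable names baseline_wave, years_per_wave and event_wave are made up for this illustration and are not in the posted data.

```sas
/* Illustration only: survival time = years from the baseline wave to the event wave */
data _null_;
  baseline_wave  = 9;   /* assumption: everyone is depression-free at Depression9 */
  years_per_wave = 2;   /* assumption: two years between consecutive waves */
  do event_wave = 12, 14;   /* obs 1 recurs at wave 12, obs 2 at wave 14 */
    survtime = years_per_wave * (event_wave - baseline_wave);
    put event_wave= survtime=;   /* writes the result to the SAS log */
  end;
run;
```

This gives 6 for wave 12 and 10 for wave 14, matching the survival times in the table for obs 1 and obs 2.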
@nwang5 wrote:
Thank you so much. I changed "√" to 1 and "x" to 0. I was wondering if there is any easier way to do it. Thanks!
I've already spent time and effort to convert your data table into a SAS data step once, in order to provide a tested program. It's not a habit I want to frequently indulge - it takes too much time, and can set unrealistic expectations.
Please provide the revised data in the form of a working data step. That will make it far easier for me to make it easier for you.
I am sooo sorry. I am not very familiar with my dataset. I will accept your answer as a solution and start a new topic to show my real data. Sorry again.
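In case it helps with the new topic, here is a rough sketch of how the same rules might look with the 1/0 recoding mentioned above. It is not tested against the real data. It assumes 1 = depressed, 0 = not depressed and . = not assessed, it treats a missing wave as ending an episode (as the string-based program does), and the names have_num, want_num, n_episodes, first_onset, second_onset and last_seen are invented for this example.

```sas
/* Assumed recoding: 1 = depressed, 0 = not depressed, . = not assessed */
data have_num;
  input ID depression9-depression15;
  datalines;
1 0 1 0 1 . . .
2 0 0 1 0 . 1 .
3 0 0 0 0 0 0 0
5 0 . . 1 1 1 1
7 0 . 1 . . 0 1
8 0 0 . . . 0 .
;
run;

data want_num (drop=w prev);
  set have_num;
  array dep {9:15} depression9-depression15;   /* index = wave number */
  length outcome $13;
  n_episodes   = 0;   /* runs of consecutive 1s seen so far */
  first_onset  = .;   /* wave where the first episode starts */
  second_onset = .;   /* wave where the second episode starts */
  last_seen    = 9;   /* last wave with a non-missing value */
  prev = dep{9};
  do w = 10 to 15;
    /* a 1 that does not directly follow another 1 starts a new episode */
    if dep{w} = 1 and prev ne 1 then do;
      n_episodes = n_episodes + 1;
      if n_episodes = 1 then first_onset = w;
      else if n_episodes = 2 then second_onset = w;
    end;
    if not missing(dep{w}) then last_seen = w;
    prev = dep{w};
  end;
  /* waves are two years apart and wave 9 is the baseline */
  if n_episodes = 0 then do;
    outcome  = 'No Depression';
    survtime = 2 * (last_seen - 9);     /* censored at last observed wave */
  end;
  else if n_episodes = 1 then do;
    outcome  = 'Depression';
    survtime = 2 * (first_onset - 9);
  end;
  else do;
    outcome  = 'Recurrent';
    survtime = 2 * (second_onset - 9);
  end;
run;
```

With numeric values, the episode count comes from a plain loop over the waves instead of string functions, which may be easier to adapt if the real data have more waves or different codes.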