Hi SAS community,
I hope you are doing well. For each participant, I am calculating the number of prior depression episodes, the duration of the most recent episode, and the time since the onset of the most recent episode. The last three columns of the table below are the expected outcomes.
In the data, '1' signifies depression and '0' indicates the absence of depression. The variables 'depression2' through 'depression8' correspond to study waves 2 through 8 (for example, 'depression4' is wave 4). An 'episode' is a run of one or more consecutive waves with depression, regardless of its duration.
For example, take HHIDPN=10210020. Because 'depression2' equals 1, a depression episode starts at wave 2 and continues through wave 3 ('depression3' = 1). The participant then recovers, so 'depression4' is 0. Since 'depression5' is 1 again, it marks a new depression episode. The total number of episodes is therefore 2.
The episode starting at 'depression5' is the most recent one. Its duration is 4 waves, because 'depression5', 'depression6', 'depression7', and 'depression8' all equal 1. The 'time since the onset of the most recent episode' is wave 8 minus wave 4 (the last wave before onset), which gives 4 waves; equivalently, it counts the waves from the onset wave (wave 5) through wave 8, inclusive.
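A minimal DATA step sketch of these rules, under one stated assumption: a missing wave is skipped, so a '1' opens a new episode only when the previous non-missing wave is not '1'. The output names n_episodes, latest_dur and time_since_onset are hypothetical, and the input is five rows copied from the data step in this post (drop the first step to run it on the full `have` data set). By my reading, this rule reproduces the expected table for every row except 16517012 (1 0 1 . 1 0 0), where it treats waves 4 to 6 as one episode and returns a duration of 2 and a time since onset of 5 instead of 1 and 3, so how a missing wave inside an episode should be handled still needs to be defined.

```sas
/* Five rows copied from the question (hypothetical test subset). */
data have;
  input HHIDPN (depression2-depression8) (:$1.);
  datalines;
10210020 1 1 0 1 1 1 1
10756010 1 0 . . . 0 0
15618010 0 . 1 0 0 . 0
15658010 1 0 1 0 1 0 0
16517012 1 0 1 . 1 0 0
;
run;

/* Assumption: missing waves are skipped, so a '1' starts a new   */
/* episode only if the previous NON-missing wave is not '1'.      */
data want;
  set have;
  array dep{2:8} $ depression2-depression8; /* array index = wave number */
  length prev $1;
  n_episodes = 0;   /* number of episodes (onsets)                  */
  last_onset = .;   /* wave in which the most recent episode began  */
  latest_dur = 0;   /* waves with '1' in the most recent episode    */
  prev = ' ';       /* last non-missing value seen so far           */
  do wave = 2 to 8;
    /* skip missing waves (blank, or a literal '.') */
    if missing(dep{wave}) or dep{wave} = '.' then continue;
    if dep{wave} = '1' then do;
      if prev ne '1' then do;          /* onset: first value or after a '0' */
        n_episodes = n_episodes + 1;
        last_onset = wave;
        latest_dur = 1;
      end;
      else latest_dur = latest_dur + 1; /* the current episode continues */
    end;
    prev = dep{wave};
  end;
  /* Waves from onset through wave 8, inclusive (e.g. 8 - 5 + 1 = 4). */
  if not missing(last_onset) then time_since_onset = 8 - last_onset + 1;
  drop prev wave last_onset;
run;
```

Indexing the array as {2:8} keeps the array index equal to the wave number, so the onset wave can be stored directly without an offset.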
Thank you so much!
data have;
input HHIDPN (depression2-depression8) (:$1.);
datalines;
10210020 1 1 0 1 1 1 1
10395020 0 0 0 0 0 0 1
10475010 0 0 0 0 0 1 1
10533011 . . . . . 0 1
10577010 0 0 0 1 0 1 1
10756010 1 0 . . . 0 0
10962010 1 0 0 1 0 0 0
11368010 1 0 1 1 1 0 0
11368020 0 0 1 1 1 0 0
11423010 1 1 1 1 1 1 1
11862030 0 0 0 0 1 0 0
12033011 0 0 0 0 0 1 0
12104010 0 1 0 0 1 0 0
12218020 0 0 0 1 0 0 1
12285010 0 0 1 0 0 0 0
12344020 0 0 0 0 1 0 0
12549010 0 0 0 0 0 0 1
12573010 0 0 0 1 0 0 0
12762020 . . 1 0 0 0 0
15618010 0 . 1 0 0 . 0
15657010 0 0 1 1 1 0 0
15658010 1 0 1 0 1 0 0
16517010 0 . 0 . 0 . 1
16517012 1 0 1 . 1 0 0
run;
HHIDPN | depression2 | depression3 | depression4 | depression5 | depression6 | depression7 | depression8 | No. of prior episodes | Duration of most recent episode (waves) | Time since onset of most recent episode (waves) |
10210020 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 2 | 4 | 4 |
10395020 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
10475010 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 2 |
10533011 | . | . | . | . | . | 0 | 1 | 1 | 1 | 1 |
10577010 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 2 | 2 | 2 |
10756010 | 1 | 0 | . | . | . | 0 | 0 | 1 | 1 | 7 |
10962010 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 1 | 4 |
11368010 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 2 | 3 | 5 |
11368020 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 3 | 5 |
11423010 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 7 | 7 |
11862030 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 3 |
12033011 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 2 |
12104010 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 2 | 1 | 3 |
12218020 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 2 | 1 | 1 |
12285010 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 5 |
12344020 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 3 |
12549010 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
12573010 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 4 |
12762020 | . | . | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 5 |
15618010 | 0 | . | 1 | 0 | 0 | . | 0 | 1 | 1 | 5 |
15657010 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 3 | 5 |
15658010 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 3 | 1 | 3 |
16517010 | 0 | . | 0 | . | 0 | . | 1 | 1 | 1 | 1 |
16517012 | 1 | 0 | 1 | . | 1 | 0 | 0 | 2 | 1 | 3 |
I'm not sure how to interpret any of that as "an episode", and "duration" tends to imply something that involves time. You have not clearly defined anything relating to time. Your example of want actually tends to obfuscate "duration" and "episode", because it appears to be a COUNT, not an actual interval. "Since latest" is also going to need a careful definition of "latest".
If I had a data set with a person identifier and the date of "an episode", then duration and "time since" would make sense.
See if you can describe what your data means in terms of time and/or duration, and for some of the rows indicate which values you are treating as a "prior episode" and as the "latest episode", in clear terms of the content of the data.
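Along the lines of one row per person per time point, here is a sketch that first reshapes the wide data into a long layout (HHIDPN, wave, dep) and then applies the episode rules from the question with BY-group processing. The data set names long and want_long, the variables wave and dep, and the output variable names are hypothetical, and missing waves are assumed to be skipped. If each row also carried an interview date, the time since onset could become a real interval rather than a count of waves.

```sas
/* Three rows copied from the question (hypothetical test subset). */
data have;
  input HHIDPN (depression2-depression8) (:$1.);
  datalines;
10210020 1 1 0 1 1 1 1
10577010 0 0 0 1 0 1 1
11368010 1 0 1 1 1 0 0
;
run;

/* Wide -> long: one row per person per wave. */
data long;
  set have;
  array d{2:8} $ depression2-depression8;
  do wave = 2 to 8;
    dep = d{wave};
    output;
  end;
  keep HHIDPN wave dep;
run;

data want_long;
  set long;
  by HHIDPN notsorted;            /* rows of one person are contiguous */
  length prev $1;
  retain n_episodes last_onset latest_dur prev;
  if first.HHIDPN then do;        /* reset the counters for each person */
    n_episodes = 0;
    last_onset = .;
    latest_dur = 0;
    prev = ' ';
  end;
  /* Assumption: missing waves (blank, or a literal '.') are skipped. */
  if not missing(dep) and dep ne '.' then do;
    if dep = '1' then do;
      if prev ne '1' then do;     /* onset of a new episode */
        n_episodes = n_episodes + 1;
        last_onset = wave;
        latest_dur = 1;
      end;
      else latest_dur = latest_dur + 1;
    end;
    prev = dep;
  end;
  if last.HHIDPN then do;         /* one output row per person */
    /* WAVE holds the person's last wave (8) on this row. */
    if not missing(last_onset) then time_since_onset = wave - last_onset + 1;
    output;
  end;
  keep HHIDPN n_episodes latest_dur time_since_onset;
run;
```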