Hello!
I'm struggling with a data transformation where I have about a years worth of days, 600 or so participants, and dates at the beginning of each week (always starting on Monday). It looks like this:
id | weekstartdate | day | Abs_yes |
808 | 1/3/2022 | 1 | 0 |
808 | 1/3/2022 | 2 | 0 |
808 | 1/3/2022 | 3 | 0 |
808 | 1/3/2022 | 4 | 0 |
808 | 1/3/2022 | 5 | 1 |
808 | 1/3/2022 | 6 | 1 |
808 | 1/3/2022 | 7 | 1 |
808 | 1/10/2022 | 1 | 0 |
808 | 1/10/2022 | 2 | 0 |
808 | 1/10/2022 | 3 | 0 |
808 | 1/10/2022 | 4 | 1 |
I am trying to get my output dataset so it can be used for a recurrent event survival analysis and it would count each event (abs_yes = 1) and non (event (abs_yes = 1) within the individual where there are events (even as few as 1 in a row). I currently only have the start date of each week (which are consecutive and would also like to output the beginning date (weekstartdate) of each event. In the example below, the abs_yes variable serves as an indicator for the length of the episode.
id | weekstartdate | Abs_yes | start | ends |
808 | 1/3/2022 | 0 | 0 | 4 |
808 | 1/7/2022 | 1 | 4 | 7 |
808 | 1/10/2022 | 0 | 7 | 10 |
808 | 1/13/2022 | 0 | 10 | 11 |
I have had many false starts using nested do-loops. This data had been a wide file with days of abstinence by week across the row. So, I have gotten the data closer to the objective but am not sure how to get the program to repeat a count within an id (I can get it to count how many abs_yes within each id but not length in days of episode).
I would really appreciate any help/tips that I can get!
Thanks,
Joelle
Changing the meaning of a variable, especially without a LOT of details as to what is calculated, is generally not a good idea.
Where does this come from:
808 | 1/7/2022 | 1 | 4 | 7 |
If 1/3/2022 and 1/10/2022 are "startweek" mondays then 1/7/2022 can't be the start of a week. Same with your 1/13/2022?
Here's what I think you are asking for.
data have; input id weekstartdate :mmddyy10. day Abs_yes; format weekstartdate mmddyy10.; datalines; 808 1/3/2022 1 0 808 1/3/2022 2 0 808 1/3/2022 3 0 808 1/3/2022 4 0 808 1/3/2022 5 1 808 1/3/2022 6 1 808 1/3/2022 7 1 808 1/10/2022 1 0 808 1/10/2022 2 0 808 1/10/2022 3 0 808 1/10/2022 4 1 ; data want; set have; dateofvalue = weekstartdate + day -1; by id weekstartdate abs_yes notsorted; retain start ends eventstart 0; format dateofvalue eventstart mmddyy10.; if first.id then do; ends=0; eventstart=weekstartdate; start=0; end;; ends+1; if first.abs_yes then eventstart=dateofvalue; if last.abs_yes then do; output; start=ends; end; /* drop variables not wanted my guess*/ /* drop weekstart day dateofvalue;*/ run;
Key bits: RETAIN will keep values across iterations of the data step.
When you use BY group processing SAS creates automatic variables that indicate whether the current record is the first or last of a group of values. The Not sorted allows for a value that may change to larger/smaller values multiple times within a larger group (ID or weekstart for your abs_yes as an example) without violating the "must be sorted by the values" for BY groups. The First and last can be used to conditionally set things. Such as resetting the counters for the ID and conditionallly write to the output data set.
If you haven't used OUTPUT statements explicitly they only write to the data set when executed. So timing of when variables are assigned and written is possible to control the output at a desired point in the data.
Really, don't change meanings of variables without very good reason. Some that uses your data and is told Weekstartdate is the start of a collection period will have good reason to believe that in other data sets that is what the variable is. Which is why I substitute "eventstart". Then the name means what the data is intended to indicate.
If you can't think of good short names then stick information about variables into a LABEL so it is clear what the variable is.
First, I would like to say thank you for your response. I probably mis-communicated the problem so, the output that I am getting is not quite right. It was giving me weekly output with abstinence marked. What I would like to do is to add the counter (labelled abs_yes) by whether it's on or off (1 vs 0) but sequentially. So for example, if the person has 2 days of off (abs_yes = 0), it would look like this:
700 | 0 | 0 | 2 | 01/21/2013 |
700 | 1 | 2 | 15 | 01/23/2013 |
700 | 0 | 15 | 17 | 01/25/2013 |
In this example, I have recoded abs_yes to abs (in order to not change any variable names, per your suggestion). I'm assuming this requires some sort of lag function where I would track how many sequential days are on or off on the original (abs_yes)? If it helps, the eventstart variable is less important than the start and ends variable, which count the number of sequential days abstinence or smoking within an individual.
I'm going to propose you don't form START and ENDS values, but rather START_DATE and END_DATE, representing calendar dates for the first and last day of each ABS_YES "regime":
The code:
data have;
input id weekstartdate :mmddyy10. day Abs_yes;
format weekstartdate mmddyy10.;
datalines;
808 1/3/2022 1 0
808 1/3/2022 2 0
808 1/3/2022 3 0
808 1/3/2022 4 0
808 1/3/2022 5 1
808 1/3/2022 6 1
808 1/3/2022 7 1
808 1/10/2022 1 0
808 1/10/2022 2 0
808 1/10/2022 3 0
808 1/10/2022 4 1
run;
data want (drop=weekstartdate day);
do until (last.abs_yes);
set have;
by id weekstartdate abs_yes notsorted;
if first.abs_yes then do;
beg_date=weekstartdate+day-1;
end;
end;
end_date=weekstartdate+day-1;
format beg_date end_date mmddyy10.;
run;
proc print;run;
produces this output:
Obs id Abs_yes beg_date end_date
1 808 0 01/03/2022 01/06/2022
2 808 1 01/07/2022 01/09/2022
3 808 0 01/10/2022 01/12/2022
4 808 1 01/13/2022 01/13/2022
I could not get this code to count abstinence intervals. It just counts weekly or daily intervals (whichever I denote as my day variable). Here is my code:
I have renamed (weekstartdate to date and created a daily variable).
data want2 (drop=date day);
do until (last.abs_yes);
set counting_dates;
retain day 1;
by id date abs_yes notsorted;
if first.abs_yes then do;
beg_date= date + day-1;
end;
end;
end_date=date+day-1;
format beg_date end_date mmddyy10.;
run;
proc print data=want2 (obs =100);
RUN;
Can you help? I'm just not sure what I'm doing wrong. It runs, I'm just not getting the same output.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.