BookmarkSubscribeRSS Feed
JFerron
Fluorite | Level 6

Hello!

I'm struggling with a data transformation where I have about a years worth of days, 600 or so participants, and dates at the beginning of each week (always starting on Monday). It looks like this:

idweekstartdatedayAbs_yes
8081/3/202210
8081/3/202220
8081/3/202230
8081/3/202240
8081/3/202251
8081/3/202261
8081/3/202271
8081/10/202210
8081/10/202220
8081/10/202230
8081/10/202241

 

I am trying to get my output dataset so it can be used for a recurrent event survival analysis and it would count each event (abs_yes = 1) and non (event (abs_yes = 1) within the individual where there are events (even as few as 1 in a row).  I currently only have the start date of each week (which are consecutive and would also like to output the beginning date (weekstartdate) of each event. In the example below, the abs_yes variable serves as an indicator for the length of the episode.

idweekstartdateAbs_yesstartends
8081/3/2022004
8081/7/2022147
8081/10/20220710
8081/13/202201011

 

I have had many false starts using nested do-loops. This data had been a wide file with days of abstinence by week across the row. So, I have gotten the data closer to the objective but am not sure how to get the program to repeat a count within an id (I can get it to count how many abs_yes within each id but not length in days of episode).

I would really appreciate any help/tips that I can get!

Thanks,

Joelle

4 REPLIES 4
ballardw
Super User

Changing the meaning of a variable, especially without a LOT of details as to what is calculated, is generally not a good idea.

Where does this come from:

808 1/7/2022 1 4 7

If 1/3/2022 and 1/10/2022 are "startweek" mondays then 1/7/2022 can't be the start of a week. Same with your 1/13/2022?

 

Here's what I think you are asking for.

data have;
   input id	weekstartdate :mmddyy10.	day	Abs_yes;
   format  weekstartdate mmddyy10.;
datalines;
808	1/3/2022	1	0
808	1/3/2022	2	0
808	1/3/2022	3	0
808	1/3/2022	4	0
808	1/3/2022	5	1
808	1/3/2022	6	1
808	1/3/2022	7	1
808	1/10/2022	1	0
808	1/10/2022	2	0
808	1/10/2022	3	0
808	1/10/2022	4	1
;

data want;
   set have;
   dateofvalue = weekstartdate + day -1;
   by id weekstartdate abs_yes notsorted;
   retain start ends eventstart 0;
   format dateofvalue eventstart mmddyy10.;

   if first.id then do;
      ends=0;
      eventstart=weekstartdate;
      start=0;
   end;;
   ends+1;
   if first.abs_yes then eventstart=dateofvalue;
   if last.abs_yes then do;
      output;
      start=ends;
   end;
   /* drop variables not wanted my guess*/
   /* drop weekstart day dateofvalue;*/
run;

Key bits: RETAIN will keep values across iterations of the data step.

When you use BY group processing SAS creates automatic variables that indicate whether the current record is the first or last of a group of values. The Not sorted allows for a value that may change to larger/smaller values multiple times within a larger group (ID or weekstart for your abs_yes as an example) without violating the "must be sorted by the values" for BY groups. The First and last can be used to conditionally set things. Such as resetting the counters for the ID and conditionallly write to the output data set.

If you haven't used OUTPUT statements explicitly they only write to the data set when executed. So timing of when variables are assigned and written is possible to control the output at a desired point in the data.

 

Really, don't change meanings of variables without very good reason. Some that uses your data and is told Weekstartdate is the start of a collection period will have good reason to believe that in other data sets that is what the variable is. Which is why I substitute "eventstart". Then the name means what the data is intended to indicate.

If you can't think of good short names then stick information about variables into a LABEL so it is clear what the variable is.

JFerron
Fluorite | Level 6

First, I would like to say thank you for your response. I probably mis-communicated the problem so, the output that I am getting is not quite right. It was giving me weekly output with abstinence marked. What I would like to do is to add the counter (labelled abs_yes) by whether it's on or off (1 vs 0) but sequentially. So for example, if the person has 2 days of off (abs_yes = 0), it would look like this:

Obs id Abs start ends eventstart123
70000201/21/2013
700121501/23/2013
7000151701/25/2013

In this example, I have recoded abs_yes to abs (in order to not change any variable names, per your suggestion). I'm assuming this requires some sort of lag function where I would track how many sequential days are on or off on the original (abs_yes)? If it helps, the eventstart variable is less important than the start and ends variable, which count the number of sequential days abstinence or smoking within an individual.

mkeintz
PROC Star

I'm going to propose you don't form START and ENDS values, but rather START_DATE and END_DATE, representing calendar dates for the first and last day of each ABS_YES "regime":

The code:

 

data have;
  input id weekstartdate :mmddyy10. day Abs_yes;
  format weekstartdate mmddyy10.;
datalines;
808	1/3/2022	1	0
808	1/3/2022	2	0
808	1/3/2022	3	0
808	1/3/2022	4	0
808	1/3/2022	5	1
808	1/3/2022	6	1
808	1/3/2022	7	1
808	1/10/2022	1	0
808	1/10/2022	2	0
808	1/10/2022	3	0
808	1/10/2022	4	1
run;

data want (drop=weekstartdate day);
  do until (last.abs_yes);
    set have;
    by id weekstartdate abs_yes notsorted;
    if first.abs_yes then do;
      beg_date=weekstartdate+day-1;
    end;
  end;
  end_date=weekstartdate+day-1;
  format beg_date end_date mmddyy10.;
run;
proc print;run;

produces this output:

 Obs     id    Abs_yes      beg_date      end_date

  1     808       0       01/03/2022    01/06/2022
  2     808       1       01/07/2022    01/09/2022
  3     808       0       01/10/2022    01/12/2022
  4     808       1       01/13/2022    01/13/2022
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
JFerron
Fluorite | Level 6

I could not get this code to count abstinence intervals. It just counts weekly or daily intervals (whichever I denote as my day variable). Here is my code:

I have renamed (weekstartdate to date and created a daily variable).
data want2 (drop=date day);
do until (last.abs_yes);
set counting_dates;
retain day 1;
by id date abs_yes notsorted;
if first.abs_yes then do;
beg_date= date + day-1;
end;
end;
end_date=date+day-1;
format beg_date end_date mmddyy10.;
run;


proc print data=want2 (obs =100);
RUN;

 

Can you help? I'm just not sure what I'm doing wrong. It runs, I'm just not getting the same output.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 592 views
  • 2 likes
  • 3 in conversation