BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mvhoya
Obsidian | Level 7

Hi SAS Communities, 

 

I'm working on a dataset in wide structure and this is a simplified example of how the dataset looks like:

 

ID      visit_1        event_1      visit_2        event_2      visit_3       event_3         visit_4     event_4       

1      2/28/2020        0           3/05/2020         0         08/31/2020        1                 .                    .

2      4/10/2020        0           07/01/2020       1                 .                  .                 .                    .

3      03/12/2020      0           05/29/2020       1         08/05/2020        0            12/21/2020       1

4      07/07/2020      1                  .                  .                  .                  .                  .                   .

5     09/03/2020       0          10/15/2020        0          10/31/2020       0             11/15/2020       1   

 

My goal is to calculate the duration measuring the time (in months) from visit_1 to the visit date of the first instance of an event = 1. I believe I would need to create an array to loop through each row and determine the date of the first occurrence of an event=1, subtract the visit_1 date from it, and output it into a new variable called 'duration'.  So I would have a duration for each ID. 

 

Important things to note:

1. Each ID has a varying number of visits. ID 15 may have 10 visits and 10 events, while ID 55 only has 3 visits and 3 events. There isn't a consistent pattern in the number of visits.

2. My visits are all already in SAS date format

3. Visit numbers and event numbers are grouped together by the number following the underscore. For example, event_2 takes place on the date listed for visit_2.

 

I'm unsure of how to write my code in a way that would determine the first event=1, pull the date from that event and then calculate duration. I would appreciate any guidance on this problem. I can provide additional information/details if necessary. Thank you in advance! 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Your description of what you've got and your problem statement was pretty clear. Only thing missing to make this "perfect" would have been an actual SAS data step creating the sample data.

Is below giving you what you're after? Assuming that the before/after time period you want to use should always be the same and not depend on calendar month duration I've interpreted your 6 months as 182 days.

data have;
  attrib 
    id      informat=$10.
    visit_1 informat=mmddyy10. format=date9.
    event_1 informat=$2.
    visit_2 informat=mmddyy10. format=date9.
    event_2 informat=$2.
    visit_3 informat=mmddyy10. format=date9.
    event_3 informat=$2.
    visit_4 informat=mmddyy10. format=date9.
    event_4 informat=$2.
  ;
  infile datalines truncover dlm=' ' dsd;
  input 
    ID $
    visit_1 event_1
    visit_2 event_2
    visit_3 event_3
    visit_4 event_4
  ;
  ;
  datalines;
1 2/28/2020 0 3/05/2020 0 08/31/2020 1 . .
2 4/10/2020 0 07/01/2020 1 . . . .
3 03/12/2020 0 05/29/2020 1 08/05/2020 0 12/21/2020 1
4 07/07/2020 1 . . . . . .
5 09/03/2020 0 10/15/2020 0 10/31/2020 0 11/15/2020 1 
;

data want;
  length duration_in_days 8;
  set have;
  array _visit {*} visit_:;
  array _event {*} event_:;

  n_event1=whichc('1',of _event[*]);
  if n_event1>=1 then
    do;
      duration_in_days=_visit[n_event1]-_visit[1];
    end;
run;

Patrick_0-1619958769634.png

 

View solution in original post

5 REPLIES 5
sbxkoenk
SAS Super FREQ

Hello,

I do not make use of the event_ value. I just take the last date available for an id.

data have;
infile cards delimiter='|';
input
ID   visit_1 event_1      
     visit_2 event_2
     visit_3 event_3 
     visit_4 event_4;
informat visit_: mmddyy10.;
format   visit_: mmddyy10.;
cards; 
1 |02/28/2020 |0 |03/05/2020  |0 |08/31/2020 |1  |.           |.
2 |04/10/2020 |0 |07/01/2020  |1 |.          |.  |.           |.
3 |03/12/2020 |0 |05/29/2020  |1 |08/05/2020 |0  |12/21/2020  |1
4 |07/07/2020 |1 |.           |. |.          |.  |.           |.
5 |09/03/2020 |0 |10/15/2020  |0 |10/31/2020 |0  |11/15/2020  |1
;
run;

PROC TRANSPOSE data=have out=want;
 by id;
run;

data want;
 set want(rename=(COL1=date));
 by id;
 where substr(_NAME_,1,6)='visit_';
 where also date is not missing;
 if first.id OR last.id then output;
 format date mmddyy10.;
run;

data want(drop=lagdate date);
 set want;
 by id;
 lagdate=lag(date);
 duration1=INTCK('MONTH',lagdate,date,'DISCRETE');
 duration2=INTCK('MONTH',lagdate,date,'CONTINUOUS');
 duration1=max(duration1,0);
 duration2=max(duration2,0);
 if last.id then output;
run;
/* end of program */

Cheers,

Koen

sbxkoenk
SAS Super FREQ

Hi,

I'm sorry.

The last data step won't always work well if you have ID's with only one visit (event_1=1), like your ID=4.

Replace last data step by this data step:

data want(drop=lagdate date);
 set want;
 by id;
 lagdate=lag(date);
 duration1=0;
 duration2=0;
 if last.id NE first.id then do;
  duration1=INTCK('MONTH',lagdate,date,'DISCRETE');
  duration2=INTCK('MONTH',lagdate,date,'CONTINUOUS');
 end;
 if last.id then output;
run;
/* end of program */

Koen

mkeintz
PROC Star

This is a good place to use an array (of visit dates).  Since your set of date variables always seem to end in event=1, you merely need to know how many valid dates are in the array (call it n_dates).  Then generate the time from visit_1 to the n_dateth visit:

 

data have;
infile cards delimiter='|';
input
ID   visit_1 event_1      
     visit_2 event_2
     visit_3 event_3 
     visit_4 event_4;
informat visit_: mmddyy10.;
format   visit_: mmddyy10.;
cards; 
1 |02/28/2020 |0 |03/05/2020  |0 |08/31/2020 |1  |.           |.
2 |04/10/2020 |0 |07/01/2020  |1 |.          |.  |.           |.
3 |03/12/2020 |0 |05/29/2020  |1 |08/05/2020 |0  |12/21/2020  |1
4 |07/07/2020 |1 |.           |. |.          |.  |.           |.
5 |09/03/2020 |0 |10/15/2020  |0 |10/31/2020 |0  |11/15/2020  |1
;

data want;
  set have;
  array vis {*} visit_: ;
  n_dates=n(of vis{*});
  date_event=vis{n_dates};
  n_months=intck('month',visit_1,date_event,'continuous');
  format date_event mmddyy10. ;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Your description of what you've got and your problem statement was pretty clear. Only thing missing to make this "perfect" would have been an actual SAS data step creating the sample data.

Is below giving you what you're after? Assuming that the before/after time period you want to use should always be the same and not depend on calendar month duration I've interpreted your 6 months as 182 days.

data have;
  attrib 
    id      informat=$10.
    visit_1 informat=mmddyy10. format=date9.
    event_1 informat=$2.
    visit_2 informat=mmddyy10. format=date9.
    event_2 informat=$2.
    visit_3 informat=mmddyy10. format=date9.
    event_3 informat=$2.
    visit_4 informat=mmddyy10. format=date9.
    event_4 informat=$2.
  ;
  infile datalines truncover dlm=' ' dsd;
  input 
    ID $
    visit_1 event_1
    visit_2 event_2
    visit_3 event_3
    visit_4 event_4
  ;
  ;
  datalines;
1 2/28/2020 0 3/05/2020 0 08/31/2020 1 . .
2 4/10/2020 0 07/01/2020 1 . . . .
3 03/12/2020 0 05/29/2020 1 08/05/2020 0 12/21/2020 1
4 07/07/2020 1 . . . . . .
5 09/03/2020 0 10/15/2020 0 10/31/2020 0 11/15/2020 1 
;

data want;
  length duration_in_days 8;
  set have;
  array _visit {*} visit_:;
  array _event {*} event_:;

  n_event1=whichc('1',of _event[*]);
  if n_event1>=1 then
    do;
      duration_in_days=_visit[n_event1]-_visit[1];
    end;
run;

Patrick_0-1619958769634.png

 

mvhoya
Obsidian | Level 7
This is exactly what I was looking for and it worked very smoothly with my dataset. Thank you very much for your help!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1897 views
  • 1 like
  • 4 in conversation