Hi SAS Communities,
I'm working on a dataset in wide structure and this is a simplified example of how the dataset looks like:
ID visit_1 event_1 visit_2 event_2 visit_3 event_3 visit_4 event_4
1 2/28/2020 0 3/05/2020 0 08/31/2020 1 . .
2 4/10/2020 0 07/01/2020 1 . . . .
3 03/12/2020 0 05/29/2020 1 08/05/2020 0 12/21/2020 1
4 07/07/2020 1 . . . . . .
5 09/03/2020 0 10/15/2020 0 10/31/2020 0 11/15/2020 1
My goal is to calculate the duration measuring the time (in months) from visit_1 to the visit date of the first instance of an event = 1. I believe I would need to create an array to loop through each row and determine the date of the first occurrence of an event=1, subtract the visit_1 date from it, and output it into a new variable called 'duration'. So I would have a duration for each ID.
Important things to note:
1. Each ID has a varying number of visits. ID 15 may have 10 visits and 10 events, while ID 55 only has 3 visits and 3 events. There isn't a consistent pattern in the number of visits.
2. My visits are all already in SAS date format
3. Visit numbers and event numbers are grouped together by the number following the underscore. For example, event_2 takes place on the date listed for visit_2.
I'm unsure of how to write my code in a way that would determine the first event=1, pull the date from that event and then calculate duration. I would appreciate any guidance on this problem. I can provide additional information/details if necessary. Thank you in advance!
Your description of what you've got and your problem statement was pretty clear. Only thing missing to make this "perfect" would have been an actual SAS data step creating the sample data.
Is below giving you what you're after? Assuming that the before/after time period you want to use should always be the same and not depend on calendar month duration I've interpreted your 6 months as 182 days.
data have;
attrib
id informat=$10.
visit_1 informat=mmddyy10. format=date9.
event_1 informat=$2.
visit_2 informat=mmddyy10. format=date9.
event_2 informat=$2.
visit_3 informat=mmddyy10. format=date9.
event_3 informat=$2.
visit_4 informat=mmddyy10. format=date9.
event_4 informat=$2.
;
infile datalines truncover dlm=' ' dsd;
input
ID $
visit_1 event_1
visit_2 event_2
visit_3 event_3
visit_4 event_4
;
;
datalines;
1 2/28/2020 0 3/05/2020 0 08/31/2020 1 . .
2 4/10/2020 0 07/01/2020 1 . . . .
3 03/12/2020 0 05/29/2020 1 08/05/2020 0 12/21/2020 1
4 07/07/2020 1 . . . . . .
5 09/03/2020 0 10/15/2020 0 10/31/2020 0 11/15/2020 1
;
data want;
length duration_in_days 8;
set have;
array _visit {*} visit_:;
array _event {*} event_:;
n_event1=whichc('1',of _event[*]);
if n_event1>=1 then
do;
duration_in_days=_visit[n_event1]-_visit[1];
end;
run;
Hello,
I do not make use of the event_ value. I just take the last date available for an id.
data have;
infile cards delimiter='|';
input
ID visit_1 event_1
visit_2 event_2
visit_3 event_3
visit_4 event_4;
informat visit_: mmddyy10.;
format visit_: mmddyy10.;
cards;
1 |02/28/2020 |0 |03/05/2020 |0 |08/31/2020 |1 |. |.
2 |04/10/2020 |0 |07/01/2020 |1 |. |. |. |.
3 |03/12/2020 |0 |05/29/2020 |1 |08/05/2020 |0 |12/21/2020 |1
4 |07/07/2020 |1 |. |. |. |. |. |.
5 |09/03/2020 |0 |10/15/2020 |0 |10/31/2020 |0 |11/15/2020 |1
;
run;
PROC TRANSPOSE data=have out=want;
by id;
run;
data want;
set want(rename=(COL1=date));
by id;
where substr(_NAME_,1,6)='visit_';
where also date is not missing;
if first.id OR last.id then output;
format date mmddyy10.;
run;
data want(drop=lagdate date);
set want;
by id;
lagdate=lag(date);
duration1=INTCK('MONTH',lagdate,date,'DISCRETE');
duration2=INTCK('MONTH',lagdate,date,'CONTINUOUS');
duration1=max(duration1,0);
duration2=max(duration2,0);
if last.id then output;
run;
/* end of program */
Cheers,
Koen
Hi,
I'm sorry.
The last data step won't always work well if you have ID's with only one visit (event_1=1), like your ID=4.
Replace last data step by this data step:
data want(drop=lagdate date);
set want;
by id;
lagdate=lag(date);
duration1=0;
duration2=0;
if last.id NE first.id then do;
duration1=INTCK('MONTH',lagdate,date,'DISCRETE');
duration2=INTCK('MONTH',lagdate,date,'CONTINUOUS');
end;
if last.id then output;
run;
/* end of program */
Koen
This is a good place to use an array (of visit dates). Since your set of date variables always seem to end in event=1, you merely need to know how many valid dates are in the array (call it n_dates). Then generate the time from visit_1 to the n_dateth visit:
data have;
infile cards delimiter='|';
input
ID visit_1 event_1
visit_2 event_2
visit_3 event_3
visit_4 event_4;
informat visit_: mmddyy10.;
format visit_: mmddyy10.;
cards;
1 |02/28/2020 |0 |03/05/2020 |0 |08/31/2020 |1 |. |.
2 |04/10/2020 |0 |07/01/2020 |1 |. |. |. |.
3 |03/12/2020 |0 |05/29/2020 |1 |08/05/2020 |0 |12/21/2020 |1
4 |07/07/2020 |1 |. |. |. |. |. |.
5 |09/03/2020 |0 |10/15/2020 |0 |10/31/2020 |0 |11/15/2020 |1
;
data want;
set have;
array vis {*} visit_: ;
n_dates=n(of vis{*});
date_event=vis{n_dates};
n_months=intck('month',visit_1,date_event,'continuous');
format date_event mmddyy10. ;
run;
Your description of what you've got and your problem statement was pretty clear. Only thing missing to make this "perfect" would have been an actual SAS data step creating the sample data.
Is below giving you what you're after? Assuming that the before/after time period you want to use should always be the same and not depend on calendar month duration I've interpreted your 6 months as 182 days.
data have;
attrib
id informat=$10.
visit_1 informat=mmddyy10. format=date9.
event_1 informat=$2.
visit_2 informat=mmddyy10. format=date9.
event_2 informat=$2.
visit_3 informat=mmddyy10. format=date9.
event_3 informat=$2.
visit_4 informat=mmddyy10. format=date9.
event_4 informat=$2.
;
infile datalines truncover dlm=' ' dsd;
input
ID $
visit_1 event_1
visit_2 event_2
visit_3 event_3
visit_4 event_4
;
;
datalines;
1 2/28/2020 0 3/05/2020 0 08/31/2020 1 . .
2 4/10/2020 0 07/01/2020 1 . . . .
3 03/12/2020 0 05/29/2020 1 08/05/2020 0 12/21/2020 1
4 07/07/2020 1 . . . . . .
5 09/03/2020 0 10/15/2020 0 10/31/2020 0 11/15/2020 1
;
data want;
length duration_in_days 8;
set have;
array _visit {*} visit_:;
array _event {*} event_:;
n_event1=whichc('1',of _event[*]);
if n_event1>=1 then
do;
duration_in_days=_visit[n_event1]-_visit[1];
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.