Solved: Re: create cycle windows from start and end date

jenim514 · Posted 02-20-2018 12:45 PM

Hi all,

I would like to create cycle windows for variables that contain several 'day' timepoints within one cycle.

So, I have values that look like this:

Visit	DATE
C1 Day 1	01AUG2017
C1 Day 5	04AUG2017
C2 Day 1	15AUG2017
C2 Day 5	19AUG2017
C3 Day 1	25AUG2017
C3 Day 15	05SEP2017
C4 Day 1	10SEP2017

And I would like to create new variables that display the start and end date within each cycle- where the end date is one day before the start of the next cycle. New variables: Cycle, Cycle Start, Cycle End. I'm not sure how to get the end date. Maybe with lag or something? Any help is greatly appreciated!!

This is the logic...

Visit	DATE	CYCLE	CYCLE START	CYCLE END
C1 Day 1	01AUG2017	Cycle 1	01AUG2015	14AUG2017
C1 Day 5	04AUG2017	Cycle 1	01AUG2015	14AUG2015
C2 Day 1	15AUG2017	Cycle 2	15AUG2017	24AUG2017
C2 Day 5	19AUG2017	Cycle 2	15AUG2017	24AUG2017
C3 Day 1	25AUG2017	Cycle 3	25AUG2017	09SEP2017
C3 Day 15	05SEP2017	Cycle 3	25AUG2017	09SEP2017
C4 Day 1	10SEP2017

but this is how it should look so I can join to other tables:

CYCLE	CYCLE_START	CYCLE_END
C1	01AUG2015	14AUG2017
C2	15AUG2017	24AUG2017
C3	25AUG2017	09SEP2017

Here is where I started for the start date...
data want
set have (keep=patient visit vsdat);
if scan(visit,3)='1' then do;
cycle_start=date;
Cycle=scan(visit,1);

PGStats · Posted 02-20-2018 05:09 PM

Keep it simple:

data have;
input patient $ (Visit1-Visit4) ($) DATE :date9.;
Visit = catx(" ", Visit1, Visit2, Visit3, Visit4);
format date date9.;
drop Visit1-Visit4;
datalines; 
A Cycle 1 Day 1 14NOV2017 
A Cycle 1 Day 8 21NOV2017 
A Cycle 1 Day 15 28NOV2017
A Cycle 1 Day 22 06DEC2017
A Cycle 2 Day 1 13DEC2017 
A Cycle 3 Day 1 22DEC2017 
B Cycle 1 Day 1 14NOV2017 
B Cycle 1 Day 8 21NOV2017 
B Cycle 1 Day 15 28NOV2017
B Cycle 1 Day 22 06DEC2017
;

proc sql;
create table want as
select
    patient,
    input(scan(Visit, 2), best.) as cycle,
    min(date) as cycle_start format=date9.,    
    max(date) as cycle_end format=date9.
from have
group by patient, calculated cycle;
quit;

PG

View solution in original post

Reeza · Posted 02-20-2018 01:38 PM

You show that you know you need lag but you don't have any usage of that in your code.

Can you show an attempt at using LAG or RETAIN?

@jenim514 wrote:

Hi all,

I would like to create cycle windows for variables that contain several 'day' timepoints within one cycle.

So, I have values that look like this:

Visit DATE

C1 Day 1 01AUG2017

C1 Day 5 04AUG2017

C2 Day 1 15AUG2017

C2 Day 5 19AUG2017

C3 Day 1 25AUG2017

C3 Day 15 05SEP2017

C4 Day 1 10SEP2017

And I would like to create new variables that display the start and end date within each cycle- where the end date is one day before the start of the next cycle. New variables: Cycle, Cycle Start, Cycle End. I'm not sure how to get the end date. Maybe with lag or something? Any help is greatly appreciated!!

This is the logic...

Visit DATE CYCLE CYCLE START CYCLE END

C1 Day 1 01AUG2017 Cycle 1 01AUG2015 14AUG2017

C1 Day 5 04AUG2017 Cycle 1 01AUG2015 14AUG2015

C2 Day 1 15AUG2017 Cycle 2 15AUG2017 24AUG2017

C2 Day 5 19AUG2017 Cycle 2 15AUG2017 24AUG2017

C3 Day 1 25AUG2017 Cycle 3 25AUG2017 09SEP2017

C3 Day 15 05SEP2017 Cycle 3 25AUG2017 09SEP2017

C4 Day 1 10SEP2017

but this is how it should look so I can join to other tables:

CYCLE CYCLE_START CYCLE_END

C1 01AUG2015 14AUG2017

C2 15AUG2017 24AUG2017

C3 25AUG2017 09SEP2017

Here is where I started for the start date...
data want
set have (keep=patient visit vsdat);
if scan(visit,3)='1' then do;
cycle_start=date;
Cycle=scan(visit,1);

ErikLund_Jensen · Posted 02-20-2018 03:38 PM

Hi

That is a bit tricky, because it requires the opposite of LAG, and there are no look-ahead functions in SAS, has probably something to do with the way observations are brought into the program vector. But years back a smart guy came up with a solution at a SAS conference.

So here wo go:

* Test data;
data have;
	length Visit $10 Date 8;
	format date date9.;
	Visit = 'C1 Day 1';	DATE='01AUG2017'd; output;
	Visit = 'C1 Day 5';	DATE='04AUG2017'd; output;
	Visit = 'C2 Day 1';	DATE='15AUG2017'd; output;
	Visit = 'C2 Day 5';	DATE='19AUG2017'd; output;
	Visit = 'C3 Day 1';	DATE='25AUG2017'd; output;
	Visit = 'C3 Day 15';DATE='05SEP2017'd; output;
	Visit = 'C4 Day 1';DATE='10SEP2017'd; output;
run;

* Create Cycle variable and make sure input is in right order;;
proc sql;
	create table inter1 as
		select Visit, Date,
		Tranwrd(scan(Visit,1,' '),'C','Cycle ') as Cycle
	from want
	order by Cycle, Date;
quit;

* Set start and end; 
* Trick simulating look-ahead by using 2 set statements, one with firstobs=2; 
data inter2 (keep=Cycle CycleStart CycleEnd);
	format CycleStart cycleEnd date9.;
	retain CycleStart;

	if eof=0 then set inter1(firstobs=2 keep=Cycle Date rename=(Cycle=nextCycle date=nextDate)) end=eof;

	set inter1; by Cycle;
	if first.Cycle then cycleStart = Date;
	if last.Cycle then do;
		cycleEnd = nextDate-1;
		if cycleStart < cycleEnd then output;
	end;
run;

* Join all cycles/dates with start/end;
proc sql;
	create table want as
		select 
		a.Visit, 
		a.Date,
		b.Cycle,
		b.cycleStart,
		b.cycleEnd
	from inter1 as a left join inter2 as b
	on a.Cycle = b.Cycle
	order by 
		a.Cycle, 
		a.Date;
quit;

jenim514 · Posted 02-20-2018 04:05 PM

I'm going to attempt this and let you know!

jenim514 · Posted 02-20-2018 04:03 PM

Maybe this is a little easier to work with.

I created a sequence variable based on the cycle and day within each subject. When a new cycle starts, sequence resets to 1.

I would like to ouput the visdat as the start or end date based on the sequence in each cycle.

So, i have formatted data to look like this:

Obs	PATIENT	VISIT	VISDAT	CYCX	DAYX	CSEQ
1	A	Cycle 1 Day 1	14NOV2017	1	1	1
2	A	Cycle 1 Day 8	21NOV2017	1	8	2
3	A	Cycle 1 Day 15	28NOV2017	1	15	3
4	A	Cycle 1 Day 22	06DEC2017	1	22	4
5	A	Cycle 2 Day 1	13DEC2017	2	1	1
6	A	Cycle 3 Day 1	22DEC2017	3	1	1
7	B	Cycle 1 Day 1	14NOV2017	1	1	1
8	B	Cycle 1 Day 8	21NOV2017	1	8	2
9	B	Cycle 1 Day 15	28NOV2017	1	15	3
10	B	Cycle 1 Day 22	06DEC2017	1	22	4

I would like this code (this is my attempt but all the dates are populating start and end...not just the first and last in each cycle):

data vs5;
set vs4;
by patient cycx cseq;
if first.cseq then cycle_start=visdat;
else cycle_start=.;
if last.cseq then cycle_end=visdat;
else cycle_end=.;
run;

to do this:

PATIENT	VISIT	VISDAT	CYCX	DAYX	CSEQ	CYCLE_START	CYCLE_END
A	Cycle 1 Day 1	14NOV2017	1	1	1	14-Nov-17
A	Cycle 1 Day 8	21NOV2017	1	8	2
A	Cycle 1 Day 15	28NOV2017	1	15	3
A	Cycle 1 Day 22	06DEC2017	1	22	4		6-Dec-17
A	Cycle 2 Day 1	13DEC2017	2	1	1	13-Dec-17	13-Dec-17
A	Cycle 3 Day 1	22DEC2017	3	1	1	22-Dec-17	22-Dec-18
B	Cycle 1 Day 1	14NOV2017	1	1	1	14-Nov-17
B	Cycle 1 Day 8	21NOV2017	1	8	2
B	Cycle 1 Day 15	28NOV2017	1	15	3
B	Cycle 1 Day 22	06DEC2017	1	22	4		6-Dec-17

PGStats · Posted 02-20-2018 05:09 PM

Keep it simple:

data have;
input patient $ (Visit1-Visit4) ($) DATE :date9.;
Visit = catx(" ", Visit1, Visit2, Visit3, Visit4);
format date date9.;
drop Visit1-Visit4;
datalines; 
A Cycle 1 Day 1 14NOV2017 
A Cycle 1 Day 8 21NOV2017 
A Cycle 1 Day 15 28NOV2017
A Cycle 1 Day 22 06DEC2017
A Cycle 2 Day 1 13DEC2017 
A Cycle 3 Day 1 22DEC2017 
B Cycle 1 Day 1 14NOV2017 
B Cycle 1 Day 8 21NOV2017 
B Cycle 1 Day 15 28NOV2017
B Cycle 1 Day 22 06DEC2017
;

proc sql;
create table want as
select
    patient,
    input(scan(Visit, 2), best.) as cycle,
    min(date) as cycle_start format=date9.,    
    max(date) as cycle_end format=date9.
from have
group by patient, calculated cycle;
quit;

PG

jenim514 · Posted 02-20-2018 07:04 PM

@PGStats this worked great! Thank you!

ErikLund_Jensen · Posted 02-21-2018 04:59 AM

Hi

I like simple solutions too, and it's fine as long as jenim514 find it useful. But is does not solve the problem raised in the original question. The problem - which made this a Little bit complicated - was to set a Cycle End-date as the day before the NEXT Cycle's Start-date, and that is not what happens.

jenim514 · Posted 02-21-2018 03:24 PM

I actually used a combination of both your suggestions (I would accept both if i could!)

This was my final code (variable names vary from sample data provided to match actual data set)

proc sql;
create table vs5 as
select
patient,
/* input(scan(Visit, 2), best.) as cycle,*/
input (scan(cyc,1),best.) as cycle,
min(visdat) as cycle_start format date9.,
max(visdat) as cycle_end format date9.
from vs4
group by patient, calculated cycle;
quit;

proc sort data=vs5;
by patient cycle;
run;

data cycle_end (drop=cycle_end next_start);
set vs5;
by patient;
set vs5 ( firstobs = 2 keep = cycle_start rename = (cycle_start = Next_start))
vs5 ( obs = 1 drop = _all_);
/*Prev_start = ifn( first.patient, (.), lag(cycle_start) );*/
Next_start = ifn( last.patient, (.), Next_start );
if next_start ^=. then new_end=next_start-1;
format new_end date9.;
if cycle=0 then new_end=cycle_start;
run;

PGStats · Posted 02-21-2018 03:34 PM

@ErikLund_Jensen, the accepted solution was based on the latest post by OP, which differed from the original in that respect. To get the day before the last cycle day, change

max(date)

to

max(min(date), intnx("DAY", max(date), -1))

PG

mkeintz · Posted 02-20-2018 06:01 PM

You want one record per complete cycle, right ? (i.e. no cycle 4 is output because there's no cycle 5 data to establish end-of-cycle-4) .

Then (1) only read in the "day 1" cases. The others are noise. (2) for each day 1 record use the lag function to get the date and id of the prior cycle:

data want (drop=visit date );
  set have;
  where scan(visit,3)='1';
  by pat_id;
  
  cycle=lag(scan(visit,1));
  start=lag(date);
  end=date-1;
  format start end date9.;
  if first.pat_id=0;
run;

Note the subsetting "if first.pat_id=0" allows you to use lagged values yet avoid outputting a cycle at the beginning of each pat_id that is contaminated by data from the prior pat_id.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Classroom Training Available!