Hi all,
I would like to create cycle windows for variables that contain several 'day' timepoints within one cycle.
So, I have values that look like this:
Visit | DATE |
C1 Day 1 | 01AUG2017 |
C1 Day 5 | 04AUG2017 |
C2 Day 1 | 15AUG2017 |
C2 Day 5 | 19AUG2017 |
C3 Day 1 | 25AUG2017 |
C3 Day 15 | 05SEP2017 |
C4 Day 1 | 10SEP2017 |
And I would like to create new variables that display the start and end date within each cycle- where the end date is one day before the start of the next cycle. New variables: Cycle, Cycle Start, Cycle End. I'm not sure how to get the end date. Maybe with lag or something? Any help is greatly appreciated!!
This is the logic...
Visit | DATE | CYCLE | CYCLE START | CYCLE END |
C1 Day 1 | 01AUG2017 | Cycle 1 | 01AUG2015 | 14AUG2017 |
C1 Day 5 | 04AUG2017 | Cycle 1 | 01AUG2015 | 14AUG2015 |
C2 Day 1 | 15AUG2017 | Cycle 2 | 15AUG2017 | 24AUG2017 |
C2 Day 5 | 19AUG2017 | Cycle 2 | 15AUG2017 | 24AUG2017 |
C3 Day 1 | 25AUG2017 | Cycle 3 | 25AUG2017 | 09SEP2017 |
C3 Day 15 | 05SEP2017 | Cycle 3 | 25AUG2017 | 09SEP2017 |
C4 Day 1 | 10SEP2017 |
but this is how it should look so I can join to other tables:
CYCLE | CYCLE_START | CYCLE_END |
C1 | 01AUG2015 | 14AUG2017 |
C2 | 15AUG2017 | 24AUG2017 |
C3 | 25AUG2017 | 09SEP2017 |
Here is where I started for the start date...
data want
set have (keep=patient visit vsdat);
if scan(visit,3)='1' then do;
cycle_start=date;
Cycle=scan(visit,1);
Keep it simple:
data have;
input patient $ (Visit1-Visit4) ($) DATE :date9.;
Visit = catx(" ", Visit1, Visit2, Visit3, Visit4);
format date date9.;
drop Visit1-Visit4;
datalines;
A Cycle 1 Day 1 14NOV2017
A Cycle 1 Day 8 21NOV2017
A Cycle 1 Day 15 28NOV2017
A Cycle 1 Day 22 06DEC2017
A Cycle 2 Day 1 13DEC2017
A Cycle 3 Day 1 22DEC2017
B Cycle 1 Day 1 14NOV2017
B Cycle 1 Day 8 21NOV2017
B Cycle 1 Day 15 28NOV2017
B Cycle 1 Day 22 06DEC2017
;
proc sql;
create table want as
select
patient,
input(scan(Visit, 2), best.) as cycle,
min(date) as cycle_start format=date9.,
max(date) as cycle_end format=date9.
from have
group by patient, calculated cycle;
quit;
You show that you know you need lag but you don't have any usage of that in your code.
Can you show an attempt at using LAG or RETAIN?
@jenim514 wrote:
Hi all,
I would like to create cycle windows for variables that contain several 'day' timepoints within one cycle.
So, I have values that look like this:
Visit DATE C1 Day 1 01AUG2017 C1 Day 5 04AUG2017 C2 Day 1 15AUG2017 C2 Day 5 19AUG2017 C3 Day 1 25AUG2017 C3 Day 15 05SEP2017 C4 Day 1 10SEP2017
And I would like to create new variables that display the start and end date within each cycle- where the end date is one day before the start of the next cycle. New variables: Cycle, Cycle Start, Cycle End. I'm not sure how to get the end date. Maybe with lag or something? Any help is greatly appreciated!!
This is the logic...
Visit DATE CYCLE CYCLE START CYCLE END C1 Day 1 01AUG2017 Cycle 1 01AUG2015 14AUG2017 C1 Day 5 04AUG2017 Cycle 1 01AUG2015 14AUG2015 C2 Day 1 15AUG2017 Cycle 2 15AUG2017 24AUG2017 C2 Day 5 19AUG2017 Cycle 2 15AUG2017 24AUG2017 C3 Day 1 25AUG2017 Cycle 3 25AUG2017 09SEP2017 C3 Day 15 05SEP2017 Cycle 3 25AUG2017 09SEP2017 C4 Day 1 10SEP2017
but this is how it should look so I can join to other tables:
CYCLE CYCLE_START CYCLE_END C1 01AUG2015 14AUG2017 C2 15AUG2017 24AUG2017 C3 25AUG2017 09SEP2017
Here is where I started for the start date...
data want
set have (keep=patient visit vsdat);
if scan(visit,3)='1' then do;
cycle_start=date;
Cycle=scan(visit,1);
Hi
That is a bit tricky, because it requires the opposite of LAG, and there are no look-ahead functions in SAS, has probably something to do with the way observations are brought into the program vector. But years back a smart guy came up with a solution at a SAS conference.
So here wo go:
* Test data; data have; length Visit $10 Date 8; format date date9.; Visit = 'C1 Day 1'; DATE='01AUG2017'd; output; Visit = 'C1 Day 5'; DATE='04AUG2017'd; output; Visit = 'C2 Day 1'; DATE='15AUG2017'd; output; Visit = 'C2 Day 5'; DATE='19AUG2017'd; output; Visit = 'C3 Day 1'; DATE='25AUG2017'd; output; Visit = 'C3 Day 15';DATE='05SEP2017'd; output; Visit = 'C4 Day 1';DATE='10SEP2017'd; output; run; * Create Cycle variable and make sure input is in right order;; proc sql; create table inter1 as select Visit, Date, Tranwrd(scan(Visit,1,' '),'C','Cycle ') as Cycle from want order by Cycle, Date; quit; * Set start and end; * Trick simulating look-ahead by using 2 set statements, one with firstobs=2; data inter2 (keep=Cycle CycleStart CycleEnd); format CycleStart cycleEnd date9.; retain CycleStart; if eof=0 then set inter1(firstobs=2 keep=Cycle Date rename=(Cycle=nextCycle date=nextDate)) end=eof; set inter1; by Cycle; if first.Cycle then cycleStart = Date; if last.Cycle then do; cycleEnd = nextDate-1; if cycleStart < cycleEnd then output; end; run; * Join all cycles/dates with start/end; proc sql; create table want as select a.Visit, a.Date, b.Cycle, b.cycleStart, b.cycleEnd from inter1 as a left join inter2 as b on a.Cycle = b.Cycle order by a.Cycle, a.Date; quit;
Maybe this is a little easier to work with.
I created a sequence variable based on the cycle and day within each subject. When a new cycle starts, sequence resets to 1.
I would like to ouput the visdat as the start or end date based on the sequence in each cycle.
So, i have formatted data to look like this:
Obs | PATIENT | VISIT | VISDAT | CYCX | DAYX | CSEQ |
1 | A | Cycle 1 Day 1 | 14NOV2017 | 1 | 1 | 1 |
2 | A | Cycle 1 Day 8 | 21NOV2017 | 1 | 8 | 2 |
3 | A | Cycle 1 Day 15 | 28NOV2017 | 1 | 15 | 3 |
4 | A | Cycle 1 Day 22 | 06DEC2017 | 1 | 22 | 4 |
5 | A | Cycle 2 Day 1 | 13DEC2017 | 2 | 1 | 1 |
6 | A | Cycle 3 Day 1 | 22DEC2017 | 3 | 1 | 1 |
7 | B | Cycle 1 Day 1 | 14NOV2017 | 1 | 1 | 1 |
8 | B | Cycle 1 Day 8 | 21NOV2017 | 1 | 8 | 2 |
9 | B | Cycle 1 Day 15 | 28NOV2017 | 1 | 15 | 3 |
10 | B | Cycle 1 Day 22 | 06DEC2017 | 1 | 22 | 4 |
I would like this code (this is my attempt but all the dates are populating start and end...not just the first and last in each cycle):
data vs5;
set vs4;
by patient cycx cseq;
if first.cseq then cycle_start=visdat;
else cycle_start=.;
if last.cseq then cycle_end=visdat;
else cycle_end=.;
run;
to do this:
PATIENT | VISIT | VISDAT | CYCX | DAYX | CSEQ | CYCLE_START | CYCLE_END |
A | Cycle 1 Day 1 | 14NOV2017 | 1 | 1 | 1 | 14-Nov-17 | |
A | Cycle 1 Day 8 | 21NOV2017 | 1 | 8 | 2 | ||
A | Cycle 1 Day 15 | 28NOV2017 | 1 | 15 | 3 | ||
A | Cycle 1 Day 22 | 06DEC2017 | 1 | 22 | 4 | 6-Dec-17 | |
A | Cycle 2 Day 1 | 13DEC2017 | 2 | 1 | 1 | 13-Dec-17 | 13-Dec-17 |
A | Cycle 3 Day 1 | 22DEC2017 | 3 | 1 | 1 | 22-Dec-17 | 22-Dec-18 |
B | Cycle 1 Day 1 | 14NOV2017 | 1 | 1 | 1 | 14-Nov-17 | |
B | Cycle 1 Day 8 | 21NOV2017 | 1 | 8 | 2 | ||
B | Cycle 1 Day 15 | 28NOV2017 | 1 | 15 | 3 | ||
B | Cycle 1 Day 22 | 06DEC2017 | 1 | 22 | 4 | 6-Dec-17 |
Keep it simple:
data have;
input patient $ (Visit1-Visit4) ($) DATE :date9.;
Visit = catx(" ", Visit1, Visit2, Visit3, Visit4);
format date date9.;
drop Visit1-Visit4;
datalines;
A Cycle 1 Day 1 14NOV2017
A Cycle 1 Day 8 21NOV2017
A Cycle 1 Day 15 28NOV2017
A Cycle 1 Day 22 06DEC2017
A Cycle 2 Day 1 13DEC2017
A Cycle 3 Day 1 22DEC2017
B Cycle 1 Day 1 14NOV2017
B Cycle 1 Day 8 21NOV2017
B Cycle 1 Day 15 28NOV2017
B Cycle 1 Day 22 06DEC2017
;
proc sql;
create table want as
select
patient,
input(scan(Visit, 2), best.) as cycle,
min(date) as cycle_start format=date9.,
max(date) as cycle_end format=date9.
from have
group by patient, calculated cycle;
quit;
Hi
I like simple solutions too, and it's fine as long as jenim514 find it useful. But is does not solve the problem raised in the original question. The problem - which made this a Little bit complicated - was to set a Cycle End-date as the day before the NEXT Cycle's Start-date, and that is not what happens.
I actually used a combination of both your suggestions (I would accept both if i could!)
This was my final code (variable names vary from sample data provided to match actual data set)
proc sql;
create table vs5 as
select
patient,
/* input(scan(Visit, 2), best.) as cycle,*/
input (scan(cyc,1),best.) as cycle,
min(visdat) as cycle_start format date9.,
max(visdat) as cycle_end format date9.
from vs4
group by patient, calculated cycle;
quit;
proc sort data=vs5;
by patient cycle;
run;
data cycle_end (drop=cycle_end next_start);
set vs5;
by patient;
set vs5 ( firstobs = 2 keep = cycle_start rename = (cycle_start = Next_start))
vs5 ( obs = 1 drop = _all_);
/*Prev_start = ifn( first.patient, (.), lag(cycle_start) );*/
Next_start = ifn( last.patient, (.), Next_start );
if next_start ^=. then new_end=next_start-1;
format new_end date9.;
if cycle=0 then new_end=cycle_start;
run;
@ErikLund_Jensen, the accepted solution was based on the latest post by OP, which differed from the original in that respect. To get the day before the last cycle day, change
max(date)
to
max(min(date), intnx("DAY", max(date), -1))
You want one record per complete cycle, right ? (i.e. no cycle 4 is output because there's no cycle 5 data to establish end-of-cycle-4) .
Then (1) only read in the "day 1" cases. The others are noise. (2) for each day 1 record use the lag function to get the date and id of the prior cycle:
data want (drop=visit date );
set have;
where scan(visit,3)='1';
by pat_id;
cycle=lag(scan(visit,1));
start=lag(date);
end=date-1;
format start end date9.;
if first.pat_id=0;
run;
Note the subsetting "if first.pat_id=0" allows you to use lagged values yet avoid outputting a cycle at the beginning of each pat_id that is contaminated by data from the prior pat_id.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.