BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jenim514
Pyrite | Level 9

Hi all,

 

I would like to create cycle windows for variables that contain several 'day' timepoints within one cycle.

 

So, I have values that look like this:

 

Visit DATE
C1 Day 1 01AUG2017
C1 Day  5 04AUG2017
C2 Day 1 15AUG2017
C2 Day 5 19AUG2017
C3 Day 1 25AUG2017
C3 Day 15 05SEP2017
C4 Day 1 10SEP2017

 

 

And I would like to create new variables that display the start and end date within each cycle- where the end date is one day before the start of the next cycle.  New variables:  Cycle, Cycle Start, Cycle End.  I'm not sure how to get the end date.  Maybe with lag or something?  Any help is greatly appreciated!!

 

 

This is the logic...

 

Visit DATE CYCLE CYCLE START CYCLE END
C1 Day 1 01AUG2017 Cycle 1 01AUG2015 14AUG2017
C1 Day  5 04AUG2017 Cycle 1 01AUG2015 14AUG2015
C2 Day 1 15AUG2017 Cycle 2 15AUG2017 24AUG2017
C2 Day 5 19AUG2017 Cycle 2 15AUG2017 24AUG2017
C3 Day 1 25AUG2017 Cycle 3 25AUG2017 09SEP2017
C3 Day 15 05SEP2017 Cycle 3 25AUG2017 09SEP2017
C4 Day 1 10SEP2017      

 

 

but this is how it should look so I can join to other tables:

 

CYCLE  CYCLE_START CYCLE_END
C1 01AUG2015 14AUG2017
C2 15AUG2017 24AUG2017
C3 25AUG2017 09SEP2017

 

Here is where I started for the start date...
data want
set have (keep=patient visit vsdat);
if scan(visit,3)='1' then do;
cycle_start=date;
Cycle=scan(visit,1);

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Keep it simple:

 

data have;
input patient $ (Visit1-Visit4) ($) DATE :date9.;
Visit = catx(" ", Visit1, Visit2, Visit3, Visit4);
format date date9.;
drop Visit1-Visit4;
datalines; 
A Cycle 1 Day 1 14NOV2017 
A Cycle 1 Day 8 21NOV2017 
A Cycle 1 Day 15 28NOV2017
A Cycle 1 Day 22 06DEC2017
A Cycle 2 Day 1 13DEC2017 
A Cycle 3 Day 1 22DEC2017 
B Cycle 1 Day 1 14NOV2017 
B Cycle 1 Day 8 21NOV2017 
B Cycle 1 Day 15 28NOV2017
B Cycle 1 Day 22 06DEC2017
;

proc sql;
create table want as
select
    patient,
    input(scan(Visit, 2), best.) as cycle,
    min(date) as cycle_start format=date9.,    
    max(date) as cycle_end format=date9.
from have
group by patient, calculated cycle;
quit;
PG

View solution in original post

10 REPLIES 10
Reeza
Super User

You show that you know you need lag but you don't have any usage of that in your code.

 

Can you show an attempt at using LAG or RETAIN?

 


@jenim514 wrote:

Hi all,

 

I would like to create cycle windows for variables that contain several 'day' timepoints within one cycle.

 

So, I have values that look like this:

 

Visit DATE
C1 Day 1 01AUG2017
C1 Day  5 04AUG2017
C2 Day 1 15AUG2017
C2 Day 5 19AUG2017
C3 Day 1 25AUG2017
C3 Day 15 05SEP2017
C4 Day 1 10SEP2017

 

 

And I would like to create new variables that display the start and end date within each cycle- where the end date is one day before the start of the next cycle.  New variables:  Cycle, Cycle Start, Cycle End.  I'm not sure how to get the end date.  Maybe with lag or something?  Any help is greatly appreciated!!

 

 

This is the logic...

 

Visit DATE CYCLE CYCLE START CYCLE END
C1 Day 1 01AUG2017 Cycle 1 01AUG2015 14AUG2017
C1 Day  5 04AUG2017 Cycle 1 01AUG2015 14AUG2015
C2 Day 1 15AUG2017 Cycle 2 15AUG2017 24AUG2017
C2 Day 5 19AUG2017 Cycle 2 15AUG2017 24AUG2017
C3 Day 1 25AUG2017 Cycle 3 25AUG2017 09SEP2017
C3 Day 15 05SEP2017 Cycle 3 25AUG2017 09SEP2017
C4 Day 1 10SEP2017      

 

 

but this is how it should look so I can join to other tables:

 

CYCLE  CYCLE_START CYCLE_END
C1 01AUG2015 14AUG2017
C2 15AUG2017 24AUG2017
C3 25AUG2017 09SEP2017

 

Here is where I started for the start date...
data want
set have (keep=patient visit vsdat);
if scan(visit,3)='1' then do;
cycle_start=date;
Cycle=scan(visit,1);


 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi

That is a bit tricky, because it requires the opposite of LAG, and there are no look-ahead functions in SAS, has probably something to do with the way observations are brought into the program vector. But years back a smart guy came up with a solution at a SAS conference.

 

So here wo go:

* Test data;
data have;
	length Visit $10 Date 8;
	format date date9.;
	Visit = 'C1 Day 1';	DATE='01AUG2017'd; output;
	Visit = 'C1 Day 5';	DATE='04AUG2017'd; output;
	Visit = 'C2 Day 1';	DATE='15AUG2017'd; output;
	Visit = 'C2 Day 5';	DATE='19AUG2017'd; output;
	Visit = 'C3 Day 1';	DATE='25AUG2017'd; output;
	Visit = 'C3 Day 15';DATE='05SEP2017'd; output;
	Visit = 'C4 Day 1';DATE='10SEP2017'd; output;
run;

* Create Cycle variable and make sure input is in right order;;
proc sql;
	create table inter1 as
		select Visit, Date,
		Tranwrd(scan(Visit,1,' '),'C','Cycle ') as Cycle
	from want
	order by Cycle, Date;
quit;

* Set start and end; 
* Trick simulating look-ahead by using 2 set statements, one with firstobs=2; 
data inter2 (keep=Cycle CycleStart CycleEnd);
	format CycleStart cycleEnd date9.;
	retain CycleStart;

	if eof=0 then set inter1(firstobs=2 keep=Cycle Date rename=(Cycle=nextCycle date=nextDate)) end=eof;

	set inter1; by Cycle;
	if first.Cycle then cycleStart = Date;
	if last.Cycle then do;
		cycleEnd = nextDate-1;
		if cycleStart < cycleEnd then output;
	end;
run;

* Join all cycles/dates with start/end;
proc sql;
	create table want as
		select 
		a.Visit, 
		a.Date,
		b.Cycle,
		b.cycleStart,
		b.cycleEnd
	from inter1 as a left join inter2 as b
	on a.Cycle = b.Cycle
	order by 
		a.Cycle, 
		a.Date;
quit;

 

jenim514
Pyrite | Level 9
I'm going to attempt this and let you know!
jenim514
Pyrite | Level 9

Maybe this is a little easier to work with. 

I created a sequence variable based on the cycle and day within each subject.  When a new cycle starts, sequence resets to 1.

 

I would like to ouput the visdat as the start or end date  based on the sequence in each cycle.

 

So, i have formatted data to look like this:

Obs PATIENT VISIT VISDAT CYCX DAYX CSEQ
1 A Cycle 1 Day 1 14NOV2017 1 1 1
2 A Cycle 1 Day 8 21NOV2017 1 8 2
3 A Cycle 1 Day 15 28NOV2017 1 15 3
4 A Cycle 1 Day 22 06DEC2017 1 22 4
5 A Cycle 2 Day 1 13DEC2017 2 1 1
6 A Cycle 3 Day 1 22DEC2017 3 1 1
7 B Cycle 1 Day 1 14NOV2017 1 1 1
8 B Cycle 1 Day 8 21NOV2017 1 8 2
9 B Cycle 1 Day 15 28NOV2017 1 15 3
10 B Cycle 1 Day 22 06DEC2017 1 22 4

 

I would like this code (this is my attempt but all the dates are populating start and end...not just the first and last in each cycle):

 

data vs5;
set vs4;
by patient cycx cseq;
if first.cseq then cycle_start=visdat;
else cycle_start=.;
if last.cseq then cycle_end=visdat;
else cycle_end=.;
run;

 

to do this:

 

PATIENT VISIT VISDAT CYCX DAYX CSEQ CYCLE_START CYCLE_END
A Cycle 1 Day 1 14NOV2017 1 1 1 14-Nov-17  
A Cycle 1 Day 8 21NOV2017 1 8 2    
A Cycle 1 Day 15 28NOV2017 1 15 3    
A Cycle 1 Day 22 06DEC2017 1 22 4   6-Dec-17
A Cycle 2 Day 1 13DEC2017 2 1 1 13-Dec-17 13-Dec-17
A Cycle 3 Day 1 22DEC2017 3 1 1 22-Dec-17 22-Dec-18
B Cycle 1 Day 1 14NOV2017 1 1 1 14-Nov-17  
B Cycle 1 Day 8 21NOV2017 1 8 2    
B Cycle 1 Day 15 28NOV2017 1 15 3    
B Cycle 1 Day 22 06DEC2017 1 22 4   6-Dec-17
PGStats
Opal | Level 21

Keep it simple:

 

data have;
input patient $ (Visit1-Visit4) ($) DATE :date9.;
Visit = catx(" ", Visit1, Visit2, Visit3, Visit4);
format date date9.;
drop Visit1-Visit4;
datalines; 
A Cycle 1 Day 1 14NOV2017 
A Cycle 1 Day 8 21NOV2017 
A Cycle 1 Day 15 28NOV2017
A Cycle 1 Day 22 06DEC2017
A Cycle 2 Day 1 13DEC2017 
A Cycle 3 Day 1 22DEC2017 
B Cycle 1 Day 1 14NOV2017 
B Cycle 1 Day 8 21NOV2017 
B Cycle 1 Day 15 28NOV2017
B Cycle 1 Day 22 06DEC2017
;

proc sql;
create table want as
select
    patient,
    input(scan(Visit, 2), best.) as cycle,
    min(date) as cycle_start format=date9.,    
    max(date) as cycle_end format=date9.
from have
group by patient, calculated cycle;
quit;
PG
jenim514
Pyrite | Level 9
@PGStats this worked great! Thank you!
ErikLund_Jensen
Rhodochrosite | Level 12

Hi

 

I like simple solutions too, and it's fine as long as jenim514 find it useful. But is does not solve the problem raised in the original question. The problem - which made this a Little bit complicated - was to set a Cycle End-date as the day before the NEXT Cycle's Start-date, and that is not what happens.

jenim514
Pyrite | Level 9

I actually used a combination of both your suggestions (I would accept both if i could!)

 

This was my final code  (variable names vary from sample data provided to match actual data set)


proc sql;
create table vs5 as
select
patient,
/* input(scan(Visit, 2), best.) as cycle,*/
input (scan(cyc,1),best.) as cycle,
min(visdat) as cycle_start format date9.,
max(visdat) as cycle_end format date9.
from vs4
group by patient, calculated cycle;
quit;

proc sort data=vs5;
by patient cycle;
run;

 

 

data cycle_end (drop=cycle_end next_start);
set vs5;
by patient;
set vs5 ( firstobs = 2 keep = cycle_start rename = (cycle_start = Next_start))
vs5 ( obs = 1 drop = _all_);
/*Prev_start = ifn( first.patient, (.), lag(cycle_start) );*/
Next_start = ifn( last.patient, (.), Next_start );
if next_start ^=. then new_end=next_start-1;
format new_end date9.;
if cycle=0 then new_end=cycle_start;
run;

PGStats
Opal | Level 21

@ErikLund_Jensen, the accepted solution was based on the latest post by OP, which differed from the original in that respect. To get the day before the last cycle day, change

 

max(date)

 

to 

 

max(min(date), intnx("DAY", max(date), -1))

 

PG
mkeintz
PROC Star

You want one record per complete cycle, right ?  (i.e. no cycle 4 is output because there's no cycle 5 data to establish end-of-cycle-4) .

 

Then (1) only read in the "day 1" cases.  The others are noise.  (2) for each day 1 record use the lag function to get the date and id of  the prior cycle:

 

data want (drop=visit date );
  set have;
  where scan(visit,3)='1';
  by pat_id;
  
  cycle=lag(scan(visit,1));
  start=lag(date);
  end=date-1;
  format start end date9.;
  if first.pat_id=0;
run;

 

Note the subsetting "if first.pat_id=0" allows you to use lagged values yet avoid outputting a cycle at the beginning of each pat_id that is contaminated by data from the prior pat_id.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1311 views
  • 0 likes
  • 5 in conversation