BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Gib
Fluorite | Level 6 Gib
Fluorite | Level 6

Hi,

 

I have the following dataset, where the main interests are ID (refers to patient ID number) and date_start (refers to start day of treatment):

 

Row

id

atc

date_start

date_stop

1

1

A

2010-07-02

2011-08-17

2

1

B

2011-08-17

2011-09-09

3

1

C

2011-09-09

2011-11-10

4

1

D

2011-11-14

2012-02-08

5

2

E

2007-03-17

2009-07-01

6

2

F

2007-06-29

2009-11-12

7

2

A

2007-07-17

2010-06-22

8

2

B

2007-08-14

2010-11-04

9

2

A

2007-09-04

2011-02-17

10

2

D

2008-02-18

2011-06-20

11

2

F

2008-03-21

2011-08-19

12

2

A

2008-04-15

2011-09-28

13

2

B

2008-04-29

2011-10-18

14

2

K

2008-05-15

2011-12-01

 

I would like to create a new numeric variable called LINE (corresponds to the line of treatment). The LINE should be counted within the group of ID, but the number of lines are only counted if the date_start is higher than the first date_start + 60 within the LINE variable.

 

This is the desired output:

 

To exemplify what I would like to achieve please note: in row number 10-12 the line number is 4 because both the row 11 and 12 is less then date_start+60 of the row 10. But in row 13 the line is 5 because its date_start is more than date_start+60 of the row 10.

 

Row

id

atc

date_start

date_stop

Line

1

1

A

2010-07-02

2011-08-17

1

2

1

B

2011-08-17

2011-09-09

2

3

1

C

2011-09-09

2011-11-10

2

4

1

D

2011-11-14

2012-02-08

3

5

2

E

2007-03-17

2009-07-01

1

6

2

F

2007-06-29

2009-11-12

2

7

2

A

2007-07-17

2010-06-22

2

8

2

B

2007-08-14

2010-11-04

2

9

2

A

2007-09-04

2011-02-17

3

10

2

D

2008-02-18

2011-06-20

4

11

2

F

2008-03-21

2011-08-19

4

12

2

A

2008-04-15

2011-09-28

4

13

2

B

2008-04-29

2011-10-18

5

14

2

K

2008-05-15

2011-12-01

5

 

Do you have any suggestion how this can be done?

 

SAS 9.3

 

Thanks,
Gib

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

A similar approach, but with a slightly different interpretation of the requirements:

 

data want;

set have;

by id;

if first.id then do;

   line=1;

   baseline_date = date_start;

end;

else if date_start - baseline_date > 60 then do;

   line + 1;

   baseline_date = date_start;

end;

retain baseline_date;

drop baseline_date;

run;

View solution in original post

6 REPLIES 6
ballardw
Super User

I have to guess by +60 you mean 60 days. since 60 could be weeks, months, years, ...

/* assumes your data set is sorted by Id and date_start
   AND that date_start is a SAS date value and not character
*/
data want;
   set have;
   by id;
   retain line .;
   lds = lag(date_start);
   if first.id then line=1;
   else if start_date > (lds+60) then line+1;
drop lds; run;

If your dates aren't numeric SAS date values then you need to create some.

 

Gib
Fluorite | Level 6 Gib
Fluorite | Level 6

Hi,

Thanks for your reply. Yes, you assumed correctly, I meant 60 days by 60.

 

The suggested code does not work unfortunately, because it only takes into account the previous observation with the lag function. The code gives a value of 4 of LINE variable in the row 13, when it should be 5 because when comparing the date_start + 60 the reference should be the first date_start of the LINE variable.

 

Do you have an other suggestion maybe?

 

Row

id

atc

date_start

date_stop

Line

1

1

A

2010-07-02

2011-08-17

1

2

1

B

2011-08-17

2011-09-09

2

3

1

C

2011-09-09

2011-11-10

2

4

1

D

2011-11-14

2012-02-08

3

5

2

E

2007-03-17

2009-07-01

1

6

2

F

2007-06-29

2009-11-12

2

7

2

A

2007-07-17

2010-06-22

2

8

2

B

2007-08-14

2010-11-04

2

9

2

A

2007-09-04

2011-02-17

3

10

2

D

2008-02-18

2011-06-20

4

11

2

F

2008-03-21

2011-08-19

4

12

2

A

2008-04-15

2011-09-28

4

13

2

B

2008-04-29

2011-10-18

5

14

2

K

2008-05-15

2011-12-01

5

 

Thanks,
Gib

Astounding
PROC Star

A similar approach, but with a slightly different interpretation of the requirements:

 

data want;

set have;

by id;

if first.id then do;

   line=1;

   baseline_date = date_start;

end;

else if date_start - baseline_date > 60 then do;

   line + 1;

   baseline_date = date_start;

end;

retain baseline_date;

drop baseline_date;

run;

Gib
Fluorite | Level 6 Gib
Fluorite | Level 6

To be honest I do not understand the logic behind the code but it works as desired, thank you!


Astounding
PROC Star

All of the techniques and tools used here are frequently used, and well worth learning if you don't know them:

 

The effect of a BY statement within a DATA step

RETAIN statement

DROP statement

How SAS stores dates

Sum statement (such as line + 1; )

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13
these are the first steps in any programming language.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1733 views
  • 3 likes
  • 4 in conversation