BookmarkSubscribeRSS Feed
thanksforhelp12
Calcite | Level 5

Hello,


I have a large dataset with a series of rows, many of which share an ID. I would like to return a table with a flag for whether the entry was before or after the index date:

 

For example (in this case, index row is row where Proc=A):

 

ID         Date             Proc

1        1/1/2019          B

1        2/1/2019          A

1        3/1/2019          C

1        4/1/2019          D

2        1/1/2019          B

2        1/3/2019          B

2        1/6/2019          A

2        1/9/2019          C

 

 

ID         Date             Proc       BeforeOrAfterDate

1        1/1/2019          B                 Before

1        2/1/2019          A                 Index

1        3/1/2019          C                 After

1        4/1/2019          D                 After

2        1/1/2019          B                Before

2        1/3/2019          B                Before

2        1/6/2019          A                Index

2        1/9/2019          C                After

 

 

Additionally, as a bonus, if it is possible to calculate the difference in time between the before or after dates then that would be ideal. I suppose if this is possible then creating the flag after that is actually very straightforward (ie if datedif is negative then = before... if datedif = 0 then index...if datedif > 0 then after).

 

Thanks so much

6 REPLIES 6
ed_sas_member
Meteorite | Level 14

Hi @thanksforhelp12 

You can try the code below. Regarding the difference between dates, I understand that you want to calculate the difference between the date in row n and the date in row n-1 for each ID. You can do that using the lag function. Is that right?

data have;
	input ID Date Proc $;
	informat date MMDDYY10.;
	format date MMDDYY10.;
	cards;
1 1/1/2019 B
1 2/1/2019 A
1 3/1/2019 C
1 4/1/2019 D
2 1/1/2019 B
2 1/3/2019 B
2 1/6/2019 A
2 1/9/2019 C
	;
	run;

data list (keep= ID Index_date);
	set have;
	where proc = "A";
	rename Date = Index_date;
run;

data want;

	merge have list;
	by ID;

	length flag $6.;
	
	delta = Date-lag(Date);
	if first.ID then call missing (delta);

	if Date = Index_date then flag = "Index";
	else if Date < Index_date then flag = "Before";
	else if Date > Index_date then flag = "After";
run;
thanksforhelp12
Calcite | Level 5

Hi @ed_sas_member, thanks so much! This looks like it might work for the flags - I will give it a try!

 

Re datedif, I would like to calculate the difference between the date in row n and the date in the *index* row. This is not always row n-1. Does that make sense?

 

Thanks!

 

Edit: so in theory I should just be able to calculate datedif=date-index date, just as I am with flag! I'll give it a try.

ed_sas_member
Meteorite | Level 14

Hi @thanksforhelp12 

 

Sounds great !

In the code above, you can just replace

	delta = Date-lag(Date);
	if first.ID then call missing (delta);

by 

	delta = Date - Index_date;

and it should work.

 

Have a great day.

Best,

 

novinosrin
Tourmaline | Level 20

Hi @thanksforhelp12   Should be a straight forward SQL I'd think?

 

data have;
	input ID Date Proc $;
	informat date MMDDYY10.;
	format date MMDDYY10.;
	cards;
1 1/1/2019 B
1 2/1/2019 A
1 3/1/2019 C
1 4/1/2019 D
2 1/1/2019 B
2 1/3/2019 B
2 1/6/2019 A
2 1/9/2019 C
	;
	run;

proc sql;
create table want as
select *,case when date<max((proc='A')*date) then 'BEFORE'
when date=max((proc='A')*date) then 'INDEX' 
else 'AFTER' end as Flag,date-max((proc='A')*date) as datediff
from have
group by id
order by id,date;
quit;
mkeintz
PROC Star

Along the lines of @ed_sas_member 's suggestion you can put it in a single data step, with minimal obscurity:

 

data have;
	input ID Date Proc $;
	informat date MMDDYY10.;
	format date MMDDYY10.;
	cards;
1 1/1/2019 B
1 2/1/2019 A
1 3/1/2019 C
1 4/1/2019 D
2 1/1/2019 B
2 1/3/2019 B
2 1/6/2019 A
2 1/9/2019 C
run;

data want;
  merge have (where=(proc='A') rename=(date=index_date))
        have ;
  by id;
  length flag $6.;
	
  delta = ifn(first.id,.,dif(date));

  if Date = Index_date then flag = "Index";
  else if Date < Index_date then flag = "Before";
  else if Date > Index_date then flag = "After";
run;
  1. The program assume the data are sorted by id
  2. Instead of DATE-lag(DATE) you can use the DIF(date) function.
  3. Note that the IFN function always executes the embedded DIF (or embedded LAG), but does not always return the result of the embedded function.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
	input ID Date Proc $;
	informat date MMDDYY10.;
	format date MMDDYY10.;
	cards;
1 1/1/2019 B
1 2/1/2019 A
1 3/1/2019 C
1 4/1/2019 D
2 1/1/2019 B
2 1/3/2019 B
2 1/6/2019 A
2 1/9/2019 C
	;
	run;
data want;
do i=1 by 1 until(last.id);
  set have;
  by id;
  if Proc='A' then do;_i=i;_date=date;end;
end;
do i=1 by 1 until(last.id);
  set have;
  by id;
  datedif=date-_date;
  if i<_i then flag='Before';
   else if i=_i then flag='Index ';
    else flag='After';
  output;
end;
drop i _:;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1539 views
  • 3 likes
  • 5 in conversation