Hello,
I have a large dataset with a series of rows, many of which share an ID. I would like to return a table with a flag for whether the entry was before or after the index date:
For example (in this case, index row is row where Proc=A):
ID Date Proc
1 1/1/2019 B
1 2/1/2019 A
1 3/1/2019 C
1 4/1/2019 D
2 1/1/2019 B
2 1/3/2019 B
2 1/6/2019 A
2 1/9/2019 C
ID Date Proc BeforeOrAfterDate
1 1/1/2019 B Before
1 2/1/2019 A Index
1 3/1/2019 C After
1 4/1/2019 D After
2 1/1/2019 B Before
2 1/3/2019 B Before
2 1/6/2019 A Index
2 1/9/2019 C After
Additionally, as a bonus, if it is possible to calculate the difference in time between the before or after dates then that would be ideal. I suppose if this is possible then creating the flag after that is actually very straightforward (ie if datedif is negative then = before... if datedif = 0 then index...if datedif > 0 then after).
Thanks so much
You can try the code below. Regarding the difference between dates, I understand that you want to calculate the difference between the date in row n and the date in row n-1 for each ID. You can do that using the lag function. Is that right?
data have;
input ID Date Proc $;
informat date MMDDYY10.;
format date MMDDYY10.;
cards;
1 1/1/2019 B
1 2/1/2019 A
1 3/1/2019 C
1 4/1/2019 D
2 1/1/2019 B
2 1/3/2019 B
2 1/6/2019 A
2 1/9/2019 C
;
run;
data list (keep= ID Index_date);
set have;
where proc = "A";
rename Date = Index_date;
run;
data want;
merge have list;
by ID;
length flag $6.;
delta = Date-lag(Date);
if first.ID then call missing (delta);
if Date = Index_date then flag = "Index";
else if Date < Index_date then flag = "Before";
else if Date > Index_date then flag = "After";
run;
Hi @ed_sas_member, thanks so much! This looks like it might work for the flags - I will give it a try!
Re datedif, I would like to calculate the difference between the date in row n and the date in the *index* row. This is not always row n-1. Does that make sense?
Thanks!
Edit: so in theory I should just be able to calculate datedif=date-index date, just as I am with flag! I'll give it a try.
Sounds great !
In the code above, you can just replace
delta = Date-lag(Date);
if first.ID then call missing (delta);
by
delta = Date - Index_date;
and it should work.
Have a great day.
Best,
Hi @thanksforhelp12 Should be a straight forward SQL I'd think?
data have;
input ID Date Proc $;
informat date MMDDYY10.;
format date MMDDYY10.;
cards;
1 1/1/2019 B
1 2/1/2019 A
1 3/1/2019 C
1 4/1/2019 D
2 1/1/2019 B
2 1/3/2019 B
2 1/6/2019 A
2 1/9/2019 C
;
run;
proc sql;
create table want as
select *,case when date<max((proc='A')*date) then 'BEFORE'
when date=max((proc='A')*date) then 'INDEX'
else 'AFTER' end as Flag,date-max((proc='A')*date) as datediff
from have
group by id
order by id,date;
quit;
Along the lines of @ed_sas_member 's suggestion you can put it in a single data step, with minimal obscurity:
data have;
input ID Date Proc $;
informat date MMDDYY10.;
format date MMDDYY10.;
cards;
1 1/1/2019 B
1 2/1/2019 A
1 3/1/2019 C
1 4/1/2019 D
2 1/1/2019 B
2 1/3/2019 B
2 1/6/2019 A
2 1/9/2019 C
run;
data want;
merge have (where=(proc='A') rename=(date=index_date))
have ;
by id;
length flag $6.;
delta = ifn(first.id,.,dif(date));
if Date = Index_date then flag = "Index";
else if Date < Index_date then flag = "Before";
else if Date > Index_date then flag = "After";
run;
data have;
input ID Date Proc $;
informat date MMDDYY10.;
format date MMDDYY10.;
cards;
1 1/1/2019 B
1 2/1/2019 A
1 3/1/2019 C
1 4/1/2019 D
2 1/1/2019 B
2 1/3/2019 B
2 1/6/2019 A
2 1/9/2019 C
;
run;
data want;
do i=1 by 1 until(last.id);
set have;
by id;
if Proc='A' then do;_i=i;_date=date;end;
end;
do i=1 by 1 until(last.id);
set have;
by id;
datedif=date-_date;
if i<_i then flag='Before';
else if i=_i then flag='Index ';
else flag='After';
output;
end;
drop i _:;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.