I am looking at a data set of Emergency Room visits. I only want to keep visits per ID that are 30 days apart. So as an example say I have this below. If I start with ID=1. In Row 1 I can see that the lag between row 1 and 2 is 15 days so I will exclude, or for now flag, row 2. Then I will continue to use Row 1 to evaluate Row 3. Again this is only 17 days so I will exclude Row 3 and look at Row 4. Row 4 is 30 days away so I keep it and then use Row 4 to evaluate Row 5....and so on. I have been trying to do this with the lag function but I can't figure out how to utilize the lag when I have to continue to use the 'anchor' row to evaluate several rows. Top is what I have and bottom is what I want. Any ideas??? If anyone knows how to do this using Proc SQL that would be even better. Much appreciated.
HAVE | ||
Row # | ID | |
1 | 1 | 1/1/2020 |
2 | 1 | 1/15/2020 |
3 | 1 | 1/17/2020 |
4 | 1 | 2/4/2020 |
5 | 1 | 3/15/2020 |
6 | 2 | 1/15/2020 |
7 | 2 | 3/15/2020 |
8 | 2 | 3/18/2020 |
WANT | ||
Row # | ID | |
1 | 1 | 1/1/2020 |
4 | 1 | 2/4/2020 |
5 | 1 | 3/15/2020 |
6 | 2 | 1/15/2020 |
7 | 2 | 3/15/2020 |
Hi @LOLO It's a straight forward datastep solution. In my humble opinion Proc SQL approach would be very inefficient involving more I/O , join , grouping and beyond. I'd recommend to stick to more appropriate solution
data have;
input row id date :mmddyy10.;
format date mmddyy10.;
cards;
1 1 1/1/2020
2 1 1/15/2020
3 1 1/17/2020
4 1 2/4/2020
5 1 3/15/2020
6 2 1/15/2020
7 2 3/15/2020
8 2 3/18/2020
;
data want;
do until(last.id);
set have;
by id;
if first.id or date-_n_>30 then do;
output;
_n_=date;
end;
end;
run;
I agree with @novinosrin comments, this is not a task for SQL. I would prefer however not to use date arithmetics (date1 - date2) to calculate number of days since SAS provides the date function INTCK to do that calculation more explicitly. Also, I would avoid using the automatic variable _n_ as temporary storage because it only creates confusion for future programmers for the sake of saving a few keystrokes. I would prefer:
data want;
anchorDate = "01jan1930"d; /* A date anterior to my dates */
do until(last.id);
set have;
by id;
if intck("day", anchorDate, date) > 30 then do;
output;
anchorDate = date;
end;
end;
drop anchorDate;
run;
This works! And it's so simple. Thank you.
Yes you are correct. I am a SAS programmer but I need to write some code in AZURE Data Studio. So Yes I have to use SQL Server. So I am trying to figure this out in SAS first and then apply it in SQL. Any ideas?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.