I am looking at a data set of Emergency Room visits. I only want to keep visits per ID that are 30 days apart. So as an example say I have this below. If I start with ID=1. In Row 1 I can see that the lag between row 1 and 2 is 15 days so I will exclude, or for now flag, row 2. Then I will continue to use Row 1 to evaluate Row 3. Again this is only 17 days so I will exclude Row 3 and look at Row 4. Row 4 is 30 days away so I keep it and then use Row 4 to evaluate Row 5....and so on. I have been trying to do this with the lag function but I can't figure out how to utilize the lag when I have to continue to use the 'anchor' row to evaluate several rows. Top is what I have and bottom is what I want. Any ideas??? If anyone knows how to do this using Proc SQL that would be even better. Much appreciated.
HAVE | ||
Row # | ID | |
1 | 1 | 1/1/2020 |
2 | 1 | 1/15/2020 |
3 | 1 | 1/17/2020 |
4 | 1 | 2/4/2020 |
5 | 1 | 3/15/2020 |
6 | 2 | 1/15/2020 |
7 | 2 | 3/15/2020 |
8 | 2 | 3/18/2020 |
WANT | ||
Row # | ID | |
1 | 1 | 1/1/2020 |
4 | 1 | 2/4/2020 |
5 | 1 | 3/15/2020 |
6 | 2 | 1/15/2020 |
7 | 2 | 3/15/2020 |
Hi @LOLO It's a straight forward datastep solution. In my humble opinion Proc SQL approach would be very inefficient involving more I/O , join , grouping and beyond. I'd recommend to stick to more appropriate solution
data have;
input row id date :mmddyy10.;
format date mmddyy10.;
cards;
1 1 1/1/2020
2 1 1/15/2020
3 1 1/17/2020
4 1 2/4/2020
5 1 3/15/2020
6 2 1/15/2020
7 2 3/15/2020
8 2 3/18/2020
;
data want;
do until(last.id);
set have;
by id;
if first.id or date-_n_>30 then do;
output;
_n_=date;
end;
end;
run;
I agree with @novinosrin comments, this is not a task for SQL. I would prefer however not to use date arithmetics (date1 - date2) to calculate number of days since SAS provides the date function INTCK to do that calculation more explicitly. Also, I would avoid using the automatic variable _n_ as temporary storage because it only creates confusion for future programmers for the sake of saving a few keystrokes. I would prefer:
data want;
anchorDate = "01jan1930"d; /* A date anterior to my dates */
do until(last.id);
set have;
by id;
if intck("day", anchorDate, date) > 30 then do;
output;
anchorDate = date;
end;
end;
drop anchorDate;
run;
This works! And it's so simple. Thank you.
Yes you are correct. I am a SAS programmer but I need to write some code in AZURE Data Studio. So Yes I have to use SQL Server. So I am trying to figure this out in SAS first and then apply it in SQL. Any ideas?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.