BookmarkSubscribeRSS Feed
LOLO
Obsidian | Level 7

I am looking at a data set of Emergency Room visits. I only want to keep visits per ID that are 30 days apart. So as an example say I have this below. If I start with ID=1. In  Row 1 I can see that the lag between row 1 and 2 is 15 days so I will exclude, or for now flag, row 2. Then I will continue to use Row 1 to evaluate Row 3. Again this is only 17 days so I will exclude Row 3 and look at Row 4. Row 4 is 30 days away so I keep it and then use Row 4 to evaluate Row 5....and so on. I have been trying to do this with the lag function but I can't figure out how to utilize the lag when I have to continue to use the 'anchor' row to evaluate several rows. Top is what I have and bottom is what I want. Any ideas??? If anyone knows how to do this using Proc SQL that would be even better. Much appreciated.

 

HAVE    
Row # ID  
1 1 1/1/2020
2 1 1/15/2020
3 1 1/17/2020
4 1 2/4/2020
5 1 3/15/2020
6 2 1/15/2020
7 2 3/15/2020
8 2 3/18/2020
WANT    
Row # ID  
1 1 1/1/2020
4 1 2/4/2020
5 1 3/15/2020
6 2 1/15/2020
7 2 3/15/2020
5 REPLIES 5
novinosrin
Tourmaline | Level 20

Hi @LOLO  It's a straight forward datastep solution. In my humble opinion Proc SQL approach would be very inefficient involving more I/O , join , grouping and beyond. I'd recommend to stick to  more appropriate solution

 



data have;
input row id date :mmddyy10.;
format date mmddyy10.;
cards;
1	1	1/1/2020
2	1	1/15/2020
3	1	1/17/2020
4	1	2/4/2020
5	1	3/15/2020
6	2	1/15/2020
7	2	3/15/2020
8	2	3/18/2020
;

data want;
 do until(last.id);
  set have;
  by id;
  if first.id or date-_n_>30 then do;
   output;
   _n_=date;
  end;
 end;
run;
PGStats
Opal | Level 21

I agree with @novinosrin comments, this is not a task for SQL. I would prefer however not to use date arithmetics (date1 - date2) to calculate number of days since SAS provides the date function INTCK to do that calculation more explicitly. Also, I would avoid using the automatic variable _n_ as temporary storage because it only creates confusion for future programmers for the sake of saving a few keystrokes. I would prefer:

 

data want;
anchorDate = "01jan1930"d; /* A date anterior to my dates */
do until(last.id);
 set have;
 by id;
 if intck("day", anchorDate, date) > 30 then do;
  output;
  anchorDate = date;
  end;
 end;
drop anchorDate;
run;
PG
LOLO
Obsidian | Level 7

This works! And it's so simple. Thank you. 

RichardDeVen
Barite | Level 11
Are you asking about SQL connected to say, SQL Server ? A SQL Server solution would likely require a recursive common table expression (CTE)
LOLO
Obsidian | Level 7

Yes you are correct. I am a SAS programmer but I need to write some code in AZURE Data Studio. So Yes I have to use SQL Server. So I am trying to figure this out in SAS first and then apply it in SQL. Any ideas? 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 978 views
  • 1 like
  • 4 in conversation