BookmarkSubscribeRSS Feed
echoli
Obsidian | Level 7

Hi,

Here is a dataset, how can I select rows if the event_time closest to date measured within one month. For example, for id = 14022, the date is 6/22/2010, I want to select reows that event_time should be in 5/22/2010 to 7/22/2010. (if there are two measures within one month, choose the closest one. if there is no measure within one month, then NA.).   Any clue or idea?

 

data abc;
input @1 ID @8 Date mmddyy10. @20 time anydtdtm.;
format date mmddyy10. time MDYAMPM.;
datalines;
14022 6/22/2010 10/11/2004 9:02
14022 6/22/2010 3/25/2008 12:53
14022 6/22/2010 4/29/2008 9:04
14022 6/22/2010 6/17/2008 9:14
14022 6/22/2010 7/8/2008 10:41
14022 6/22/2010 8/12/2008 12:30
14022 6/22/2010 9/16/2008 14:24
14022 6/22/2010 10/21/2008 1:44
14022 6/22/2010 11/18/2008 2:11
14022 6/22/2010 12/30/2008 9:53
14022 6/22/2010 2/10/2009 13:07
14022 6/22/2010 4/7/2009 9:45
14022 6/22/2010 6/2/2009 7:35
14022 6/22/2010 7/28/2009 10:11
14022 6/22/2010 9/29/2009 10:46
14022 6/22/2010 12/1/2009 10:51
14022 6/22/2010 1/26/2010 8:49
14022 6/22/2010 4/13/2010 7:48
14022 6/22/2010 6/22/2010 7:46
14032 2/7/2012 12/24/2002 10:25
14032 2/7/2012 1/21/2003 12:00
14032 2/7/2012 1/28/2003 9:25
14032 2/7/2012 1/29/2008 9:00
14032 2/7/2012 3/20/2008 13:52
14032 2/7/2012 5/1/2008 8:53
14032 2/7/2012 6/12/2008 14:01
14032 2/7/2012 7/24/2008 15:00
14032 2/7/2012 9/11/2008 11:02
14032 2/7/2012 9/20/2011 9:25
14032 2/7/2012 11/1/2011 10:40
14032 2/7/2012 12/22/2011 12:58
14032 2/7/2012 2/7/2012 10:20
14032 2/7/2012 3/20/2012 9:18
14032 2/7/2012 5/15/2012 12:43
14032 2/7/2012 6/19/2012 9:19
14032 2/7/2012 7/31/2012 11:48
14032 2/7/2012 9/18/2012 12:15
14032 2/7/2012 10/16/2012 12:35
14032 2/7/2012 10/18/2012 11:12
14032 2/7/2012 10/19/2012 5:00
14032 2/7/2012 10/20/2012 4:33
14032 2/7/2012 10/21/2012 4:19
14032 2/7/2012 10/22/2012 4:30
14032 2/7/2012 10/23/2012 5:03
14032 2/7/2012 10/24/2012 3:14
14032 2/7/2012 10/30/2012 11:10
14032 2/7/2012 11/27/2012 12:30
14032 2/7/2012 1/8/2013 9:58
14032 2/7/2012 2/19/2013 10:07
14032 2/7/2012 4/2/2013 13:45
14032 2/7/2012 5/14/2013 10:18
14601 1/30/2014 4/26/2005 8:40
14601 1/30/2014 6/28/2005 9:05
14601 1/30/2014 7/26/2005 8:21
14601 1/30/2014 8/9/2005 8:24
14601 1/30/2014 8/23/2005 8:10
14601 1/30/2014 10/18/2005 9:20
14601 1/30/2014 12/13/2005 9:32
14601 1/30/2014 2/7/2006 9:04
14601 1/30/2014 2/16/2006 8:44
14601 1/30/2014 5/2/2006 10:31
14601 1/30/2014 7/12/2006 7:31
14601 1/30/2014 9/7/2006 8:55
14601 1/30/2014 10/31/2006 8:43
14601 1/30/2014 2/6/2007 9:15
14601 1/30/2014 5/3/2007 7:54
14601 1/30/2014 6/28/2007 7:52
14601 1/30/2014 8/23/2007 8:10
14601 1/30/2014 10/18/2007 8:17
14601 1/30/2014 11/19/2007 2:07
14601 1/30/2014 12/13/2007 8:16
14601 1/30/2014 1/24/2008 10:23
14601 1/30/2014 1/31/2008 10:07
14601 1/30/2014 2/12/2008 12:31
14601 1/30/2014 3/25/2008 9:52
;
run;

 

Thanks,

 

 

5 REPLIES 5
art297
Opal | Level 21

I modified your example data a bit so that there would be some records that allowed for a better test. Does the following do what you want?

data abc;
  informat date mmddyy10.;
  informat time anydtdtm16.;
  input ID Date time;
  format date mmddyy10. time MDYAMPM.;
  datalines;
14022 6/22/2010 10/11/2004 9:02
14022 6/22/2010 3/25/2008 12:53
14022 6/22/2010 4/29/2008 9:04
14022 6/22/2010 6/17/2008 9:14
14022 6/22/2010 7/8/2008 10:41
14022 6/22/2010 8/12/2008 12:30
14022 6/22/2010 9/16/2008 14:24
14022 6/22/2010 10/21/2008 1:44
14022 6/22/2010 11/18/2008 2:11
14022 6/22/2010 12/30/2008 9:53
14022 6/22/2010 2/10/2009 13:07
14022 6/22/2010 4/7/2009 9:45
14022 6/22/2010 6/2/2009 7:35
14022 6/22/2010 7/28/2009 10:11
14022 6/22/2010 9/29/2009 10:46
14022 6/22/2010 12/1/2009 10:51
14022 6/22/2010 1/26/2010 8:49
14022 6/22/2010 5/23/2010 7:48
14022 6/22/2010 6/22/2010 7:46
14032 2/7/2012 7/24/2002 10:25
14032 2/7/2012 1/21/2003 12:00
14032 2/7/2012 1/28/2003 9:25
14032 2/7/2012 1/29/2008 9:00
14032 2/7/2012 3/20/2008 13:52
14032 2/7/2012 5/1/2008 8:53
14032 2/7/2012 6/12/2008 14:01
14032 2/7/2012 7/24/2008 15:00
14032 2/7/2012 9/11/2008 11:02
14032 2/7/2012 9/20/2011 9:25
14032 2/7/2012 11/1/2011 10:40
14032 2/7/2012 1/8/2012 12:58
14032 2/7/2012 2/7/2012 10:20
14032 2/7/2012 3/7/2012 9:18
14032 2/7/2012 5/15/2012 12:43
14032 2/7/2012 6/19/2012 9:19
14032 2/7/2012 7/31/2012 11:48
14032 2/7/2012 9/18/2012 12:15
14032 2/7/2012 10/16/2012 12:35
14032 2/7/2012 10/18/2012 11:12
14032 2/7/2012 10/19/2012 5:00
14032 2/7/2012 10/20/2012 4:33
14032 2/7/2012 10/21/2012 4:19
14032 2/7/2012 10/22/2012 4:30
14032 2/7/2012 10/23/2012 5:03
14032 2/7/2012 10/24/2012 3:14
14032 2/7/2012 10/30/2012 11:10
14032 2/7/2012 11/27/2012 12:30
14032 2/7/2012 1/8/2013 9:58
14032 2/7/2012 2/19/2013 10:07
14032 2/7/2012 4/2/2013 13:45
14032 2/7/2012 5/14/2013 10:18
14601 1/30/2014 4/26/2005 8:40
14601 1/30/2014 6/28/2005 9:05
14601 1/30/2014 7/26/2005 8:21
14601 1/30/2014 8/9/2005 8:24
14601 1/30/2014 8/23/2005 8:10
14601 1/30/2014 10/18/2005 9:20
14601 1/30/2014 12/29/2013 9:32
14601 1/30/2014 12/30/2013 9:04
14601 1/30/2014 2/16/2014 8:44
14601 1/30/2014 5/2/2006 10:31
14601 1/30/2014 7/12/2006 7:31
14601 1/30/2014 9/7/2006 8:55
14601 1/30/2014 10/31/2006 8:43
14601 1/30/2014 2/6/2007 9:15
14601 1/30/2014 5/3/2007 7:54
14601 1/30/2014 6/28/2007 7:52
14601 1/30/2014 8/23/2007 8:10
14601 1/30/2014 10/18/2007 8:17
14601 1/30/2014 11/19/2007 2:07
14601 1/30/2014 12/13/2007 8:16
14601 1/30/2014 1/24/2008 10:23
14601 1/30/2014 1/31/2008 10:07
14601 1/30/2014 2/12/2008 12:31
14601 1/30/2014 3/25/2008 9:52
;
run;

proc sql noprint;
  create table want as
    select *
      from abc 
       where intnx('month',date,-1,'s')<=datepart(time)<=intnx('month',date,1,'s')
         group by id
           having abs(datepart(time)-date) eq min(abs(datepart(time)-date))
  ;
quit;

Art, CEO, AnalystFinder.com

echoli
Obsidian | Level 7

Hi art297,

 

I tried to use the slolution you helped me, it works!! but when I use my real data (I have lots of columns and rows) it doesn't work. 

 it shows " The query requires remerging summary statistics back with the original data." in the log. I doubt. Do you know where is the problem? Thanks,

L

 

1455  proc sql noprint;
1456      create table want as
1457          select mrn, event_end_UA, event_end_dt
1458          from combine_CBC
1459          where intnx('month', event_end_UA, -1, 's') <= datepart(event_end_dt) <=
1459! intnx('month', event_end_UA, 1, 's')
1460          group by mrn, event_end_UA
1461          having abs(datepart(event_end_dt) - event_end_UA) = min(abs(datepart(event_end_UA)-
1461! event_end_dt))
1462  ;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.WANT created, with 0 rows and 3 columns.

1463  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.03 seconds
      cpu time            0.04 seconds

art297
Opal | Level 21

That is a normal message from proc sql. It doesn't mean that there was a problem.

 

However, since it didn't select any rows, it indicates that you don't have any records that meet the condition that you specified.

 

You'd have to post some of your actual data for us to understand.

 

Are you sure that event_end_UA is a date, rather than a datetime variable?

 

Art, CEO, AnalystFinder.com

 

ballardw
Super User

Your example data does not have a variable named event_time only "time". So I suspose that's the one you want to use.

Is the time of day supposed to be considered when picking "closest"? if so, since there is no time component with your date variable which time of day would you want?

 

Note that we can't run your code as pasted as the window reformatted the columns so the time actually starts in column 16 or 17 instead of 20.

Please post code in a code box using the forum {i} menu icon if you are going to post anything where columns might be important.

 

This line of code added to the read will reduce the data set to only datetimes where the date part is within you specified range.

   if intnx('month',date,-1,'S') le datepart(time) le intnx('month',date,+1,'S');

And this will determine the number of days different.

 

   DaysDifferent = abs(date- datepart(time));

Sort the resulting data by ID and DaysDifferent and keep the first.

 

Assuming the above two lines were added to your code:

proc sort data=abc;
   by id daydifferent;
run;

data want;
   set abc;
   by id;
   if first.id;
run;

Would result in one row per ID.

 

If the there are two days that are exactly the same number of days from the date and also the smallest I think this will pick the early date of time.

echoli
Obsidian | Level 7

Thanks for your explanation, Ballardw! That's really my goal!!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1964 views
  • 0 likes
  • 3 in conversation