Solved: Re: How to compare date range for same id

amber7391 · Posted 09-18-2019 04:56 PM

Hey guys, I am new to SAS.

I have a problem which looks like the following.

There are two types of visits, x and y.

The data has already sorted based on id and visit date.

Now I want to exclude all the x type visits for each id number.

Also, if the visit date of y type falls in the range of x's check in and check out date, it also needs to be excluded.

Moreover, if there is an x type visit in 3 days after y type visit, both of them need to be excluded.

Can someone please help???

This is how the raw data look like:

id	visit date	check in	check out	type
1	1/29/19	1/23/19	1/29/19	x
1	1/26/19			y
2	2/1/19			y
2	2/2/19	2/2/19	2/6/19	x
3	3/3/19	3/5/19	3/20/19	x
3	3/5/19	3/5/19	3/20/19	y
4	4/23/19	4/24/19	4/27/19	x
4	4/24/19	4/24/19	4/27/19	y
4	8/8/19			y
5	9/1/19			y
5	9/4/19	9/4/19	9/6/19	x
5	10/10/19	10/10/19	10/21/19	y

This is how the result should be:

id	visit date	check in	check out	type
4	8/8/19			y
5	10/10/19	10/10/19	10/21/19	y

hashman · Posted 09-19-2019 12:05 AM

@amber7391:

Read each BY group twice:

On the first read, skip the Y-records and determine the boundaries for the Y-conditions based on the X-values.
On the second read, skip the X-records and filter the Y-records based on the conditions discovered in #1.

For example:

data have ;                                                              
  input id (visit_date check_in check_out) (:mmddyy8.) type :$1. ;       
  format v: c: yymmdd10. ;                                               
  cards ;                                                                
1  1/29/19  1/23/19   1/29/19  x                                         
1  1/26/19  .         .        y                                         
2  2/01/19  .         .        y                                         
2  2/02/19  2/02/19   2/06/19  x                                         
3  3/03/19  3/05/19   3/20/19  x                                         
3  3/05/19  3/05/19   3/20/19  y                                         
4  4/23/19  4/24/19   4/27/19  x                                         
4  4/24/19  4/24/19   4/27/19  y                                         
4  8/08/19  .         .        y                                         
5  9/01/19  .         .        y                                         
5  9/04/19  9/04/19   9/06/19  x                                         
5 10/10/19 10/10/19  10/21/19  y                                         
;                                                                        
run ;                                                                    
                                                                         
data want (drop = _:) ;                                                  
  _xi_min = constant ("big") ;                                           
  _xv_min = constant ("big") ;                                           
  do _n_ = 1 by 1 until (last.id) ;                                      
    set have ;                                                           
    by id ;                                                              
    if type = "y" then continue ;                                        
    _xi_min = _xi_min min check_in ;                                     
    _xo_max = _xo_max max check_out ;                                    
    _xv_min = _xv_min min visit_date ;                                   
    _xv_max = _xv_max max visit_date ;                                   
  end ;                                                                  
  do _n_ = 1 to _n_ ;                                                    
    set have ;                                                           
    if type = "x" then continue ;                                        
    if _xi_min <= visit_date <= _xo_max then continue ;                  
    if visit_date <= _xv_min <= _xv_max <= visit_date + 3 then continue ;
    output ;                                                             
  end ;                                                                  
run ;

Kind regards

Paul D.

View solution in original post

ballardw · Posted 09-18-2019 05:48 PM

First thing is start with a SAS data set with date values. From you post it looks like you are copying from Excel and so we can't be sure what some of those dates might actually be: 23-Jan for example. Or whether you may have a mix of character and actual date values in Excel.

You need to show what you expect your output to actually look like when you are done.

Since you have moderately complex rules involving intervals and order this is likely not going to get a simple to understand single step solution.

amber7391 · Posted 09-18-2019 10:11 PM

Hi ballardw, thanks for pointing that out. I have corrected the format and added the result I would like to have. If you dont mind you can take a look. I really appreciate it!

ChrisNZ · Posted 09-18-2019 10:53 PM

Why is ID 2 dropped?

High-Performance SAS Coding - Third Edition

hashman · Posted 09-19-2019 12:24 AM

@ChrisNZ:

The X-record is killed unconditionally.

The Y-record is killed since the visit date of the X-record (2/02/19) is within 3 days after the visit date of the Y-record (2/01/19).

Hence, the whole ID=2 group is a goner.

Kind regards

Paul D.

ChrisNZ · Posted 09-19-2019 12:38 AM

@hashman Ha! Of course! The reversed American dates got me again!

High-Performance SAS Coding - Third Edition

hashman · Posted 09-19-2019 12:56 AM

@ChrisNZ:

Ha's on me: After almost 30 revolutions round the Sun, I still have to make a concerted effort to remind myself of that (the European dd/mm/yy is better but still makes just as much sense as writing the number 753 as 357). That's why I abhor when sample data dates are presented as mm/dd/yy or dd-Mon-yy (they make no sense when sorted) and always both present them and format them as yymmdd10. Hierarchical numbers must come higher order to lower order, left to right; only then they're comprehended properly when ordered.

Kind regards

Paul D.

ChrisNZ · Posted 09-19-2019 01:11 AM

@hashman Totally with you on compulsory use of yymmdd under the penalty of imprisonment with hard labour! 🙂

High-Performance SAS Coding - Third Edition

hashman · Posted 09-19-2019 01:28 AM

@ChrisNZ:

Those using mmddyy and such bring hard labor on themselves, getting self-imprisoned by having to sift through unsorted mess.

hashman · Posted 09-19-2019 12:05 AM

@amber7391:

Read each BY group twice:

On the first read, skip the Y-records and determine the boundaries for the Y-conditions based on the X-values.
On the second read, skip the X-records and filter the Y-records based on the conditions discovered in #1.

For example:

data have ;                                                              
  input id (visit_date check_in check_out) (:mmddyy8.) type :$1. ;       
  format v: c: yymmdd10. ;                                               
  cards ;                                                                
1  1/29/19  1/23/19   1/29/19  x                                         
1  1/26/19  .         .        y                                         
2  2/01/19  .         .        y                                         
2  2/02/19  2/02/19   2/06/19  x                                         
3  3/03/19  3/05/19   3/20/19  x                                         
3  3/05/19  3/05/19   3/20/19  y                                         
4  4/23/19  4/24/19   4/27/19  x                                         
4  4/24/19  4/24/19   4/27/19  y                                         
4  8/08/19  .         .        y                                         
5  9/01/19  .         .        y                                         
5  9/04/19  9/04/19   9/06/19  x                                         
5 10/10/19 10/10/19  10/21/19  y                                         
;                                                                        
run ;                                                                    
                                                                         
data want (drop = _:) ;                                                  
  _xi_min = constant ("big") ;                                           
  _xv_min = constant ("big") ;                                           
  do _n_ = 1 by 1 until (last.id) ;                                      
    set have ;                                                           
    by id ;                                                              
    if type = "y" then continue ;                                        
    _xi_min = _xi_min min check_in ;                                     
    _xo_max = _xo_max max check_out ;                                    
    _xv_min = _xv_min min visit_date ;                                   
    _xv_max = _xv_max max visit_date ;                                   
  end ;                                                                  
  do _n_ = 1 to _n_ ;                                                    
    set have ;                                                           
    if type = "x" then continue ;                                        
    if _xi_min <= visit_date <= _xo_max then continue ;                  
    if visit_date <= _xv_min <= _xv_max <= visit_date + 3 then continue ;
    output ;                                                             
  end ;                                                                  
run ;

Kind regards

Paul D.

amber7391 · Posted 09-20-2019 09:15 AM

Thank you soooo much for help me out hashman! I really appreciate it. It fixed my problem perfectly!

s_lassen · Posted 09-19-2019 04:19 AM

These kind of problems are most easily solved in SQL:

proc sql;
  create table want as select * from have y
    where type='y' 
      and not exists(select * from have x 
                      where type='x' 
			and x.id=y.id
			and (y.visit_date between x.check_in and x.check_out
			     or y.visit_date between x.visit_date-3 and x.visit_date));
quit;

amber7391 · Posted 09-20-2019 09:16 AM

s_lassen thank you so much!

Ksharp · Posted 09-19-2019 09:02 AM


data have ;                                                              
  input id (visit check_in check_out) (:mmddyy8.) type :$1. ;       
  format v: c: yymmdd10. ;                                               
  cards ;                                                                
1  1/29/19  1/23/19   1/29/19  x                                         
1  1/26/19  .         .        y                                         
2  2/01/19  .         .        y                                         
2  2/02/19  2/02/19   2/06/19  x                                         
3  3/03/19  3/05/19   3/20/19  x                                         
3  3/05/19  3/05/19   3/20/19  y                                         
4  4/23/19  4/24/19   4/27/19  x                                         
4  4/24/19  4/24/19   4/27/19  y                                         
4  8/08/19  .         .        y                                         
5  9/01/19  .         .        y                                         
5  9/04/19  9/04/19   9/06/19  x                                         
5 10/10/19 10/10/19  10/21/19  y                                         
;                                                                        
run ;

proc sql;
create table want as
 select distinct a.*
  from (select * from have where type='y') as a ,
       (select * from have where type='x') as b
where a.id=b.id and a.visit not between b.check_in and b.check_out
and b.visit not between a.visit and a.visit+3;
quit;

amber7391 · Posted 09-20-2019 09:16 AM

Ksharp thank you so much!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away