Hey guys, I am new to SAS.
I have a problem which looks like the following.
There are two types of visits, x and y.
The data has already sorted based on id and visit date.
Now I want to exclude all the x type visits for each id number.
Also, if the visit date of y type falls in the range of x's check in and check out date, it also needs to be excluded.
Moreover, if there is an x type visit in 3 days after y type visit, both of them need to be excluded.
Can someone please help???
This is how the raw data look like:
id | visit date | check in | check out | type |
1 | 1/29/19 | 1/23/19 | 1/29/19 | x |
1 | 1/26/19 | y | ||
2 | 2/1/19 | y | ||
2 | 2/2/19 | 2/2/19 | 2/6/19 | x |
3 | 3/3/19 | 3/5/19 | 3/20/19 | x |
3 | 3/5/19 | 3/5/19 | 3/20/19 | y |
4 | 4/23/19 | 4/24/19 | 4/27/19 | x |
4 | 4/24/19 | 4/24/19 | 4/27/19 | y |
4 | 8/8/19 | y | ||
5 | 9/1/19 | y | ||
5 | 9/4/19 | 9/4/19 | 9/6/19 | x |
5 | 10/10/19 | 10/10/19 | 10/21/19 | y |
This is how the result should be:
id | visit date | check in | check out | type |
4 | 8/8/19 | y | ||
5 | 10/10/19 | 10/10/19 | 10/21/19 | y |
Read each BY group twice:
For example:
data have ;
input id (visit_date check_in check_out) (:mmddyy8.) type :$1. ;
format v: c: yymmdd10. ;
cards ;
1 1/29/19 1/23/19 1/29/19 x
1 1/26/19 . . y
2 2/01/19 . . y
2 2/02/19 2/02/19 2/06/19 x
3 3/03/19 3/05/19 3/20/19 x
3 3/05/19 3/05/19 3/20/19 y
4 4/23/19 4/24/19 4/27/19 x
4 4/24/19 4/24/19 4/27/19 y
4 8/08/19 . . y
5 9/01/19 . . y
5 9/04/19 9/04/19 9/06/19 x
5 10/10/19 10/10/19 10/21/19 y
;
run ;
data want (drop = _:) ;
_xi_min = constant ("big") ;
_xv_min = constant ("big") ;
do _n_ = 1 by 1 until (last.id) ;
set have ;
by id ;
if type = "y" then continue ;
_xi_min = _xi_min min check_in ;
_xo_max = _xo_max max check_out ;
_xv_min = _xv_min min visit_date ;
_xv_max = _xv_max max visit_date ;
end ;
do _n_ = 1 to _n_ ;
set have ;
if type = "x" then continue ;
if _xi_min <= visit_date <= _xo_max then continue ;
if visit_date <= _xv_min <= _xv_max <= visit_date + 3 then continue ;
output ;
end ;
run ;
Kind regards
Paul D.
First thing is start with a SAS data set with date values. From you post it looks like you are copying from Excel and so we can't be sure what some of those dates might actually be: 23-Jan for example. Or whether you may have a mix of character and actual date values in Excel.
You need to show what you expect your output to actually look like when you are done.
Since you have moderately complex rules involving intervals and order this is likely not going to get a simple to understand single step solution.
Why is ID 2 dropped?
The X-record is killed unconditionally.
The Y-record is killed since the visit date of the X-record (2/02/19) is within 3 days after the visit date of the Y-record (2/01/19).
Hence, the whole ID=2 group is a goner.
Kind regards
Paul D.
@hashman Ha! Of course! The reversed American dates got me again!
Ha's on me: After almost 30 revolutions round the Sun, I still have to make a concerted effort to remind myself of that (the European dd/mm/yy is better but still makes just as much sense as writing the number 753 as 357). That's why I abhor when sample data dates are presented as mm/dd/yy or dd-Mon-yy (they make no sense when sorted) and always both present them and format them as yymmdd10. Hierarchical numbers must come higher order to lower order, left to right; only then they're comprehended properly when ordered.
Kind regards
Paul D.
@hashman Totally with you on compulsory use of yymmdd under the penalty of imprisonment with hard labour! 🙂
Those using mmddyy and such bring hard labor on themselves, getting self-imprisoned by having to sift through unsorted mess.
Read each BY group twice:
For example:
data have ;
input id (visit_date check_in check_out) (:mmddyy8.) type :$1. ;
format v: c: yymmdd10. ;
cards ;
1 1/29/19 1/23/19 1/29/19 x
1 1/26/19 . . y
2 2/01/19 . . y
2 2/02/19 2/02/19 2/06/19 x
3 3/03/19 3/05/19 3/20/19 x
3 3/05/19 3/05/19 3/20/19 y
4 4/23/19 4/24/19 4/27/19 x
4 4/24/19 4/24/19 4/27/19 y
4 8/08/19 . . y
5 9/01/19 . . y
5 9/04/19 9/04/19 9/06/19 x
5 10/10/19 10/10/19 10/21/19 y
;
run ;
data want (drop = _:) ;
_xi_min = constant ("big") ;
_xv_min = constant ("big") ;
do _n_ = 1 by 1 until (last.id) ;
set have ;
by id ;
if type = "y" then continue ;
_xi_min = _xi_min min check_in ;
_xo_max = _xo_max max check_out ;
_xv_min = _xv_min min visit_date ;
_xv_max = _xv_max max visit_date ;
end ;
do _n_ = 1 to _n_ ;
set have ;
if type = "x" then continue ;
if _xi_min <= visit_date <= _xo_max then continue ;
if visit_date <= _xv_min <= _xv_max <= visit_date + 3 then continue ;
output ;
end ;
run ;
Kind regards
Paul D.
These kind of problems are most easily solved in SQL:
proc sql;
create table want as select * from have y
where type='y'
and not exists(select * from have x
where type='x'
and x.id=y.id
and (y.visit_date between x.check_in and x.check_out
or y.visit_date between x.visit_date-3 and x.visit_date));
quit;
data have ; input id (visit check_in check_out) (:mmddyy8.) type :$1. ; format v: c: yymmdd10. ; cards ; 1 1/29/19 1/23/19 1/29/19 x 1 1/26/19 . . y 2 2/01/19 . . y 2 2/02/19 2/02/19 2/06/19 x 3 3/03/19 3/05/19 3/20/19 x 3 3/05/19 3/05/19 3/20/19 y 4 4/23/19 4/24/19 4/27/19 x 4 4/24/19 4/24/19 4/27/19 y 4 8/08/19 . . y 5 9/01/19 . . y 5 9/04/19 9/04/19 9/06/19 x 5 10/10/19 10/10/19 10/21/19 y ; run ; proc sql; create table want as select distinct a.* from (select * from have where type='y') as a , (select * from have where type='x') as b where a.id=b.id and a.visit not between b.check_in and b.check_out and b.visit not between a.visit and a.visit+3; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.