I have the following two datasets.
I want to delete the lines from A with dates that falls within the in-out period in file B.
File A
SEQ ID date var1 1 A12 03JAN04 242 2 A12 01FEB06 356 3 A12 06JAN08 325 4 A12 28DEC09 123 5 B34 06MAY03 985 6 B34 13JUN03 198 7 B34 10MAY05 241 8 C56 09NOV10 155 9 C56 19OCT13 352
File B
ID date_in date_out A12 01JAN04 04JAN04 A12 05FEB08 08FEB08 B34 03MAY03 06MAY03 B34 09MAY05 19MAY05 C56 12JUL12 18JUL12
For example, for ID=A12, his first obs falls within 01JAN04-04JAN04, thus I delete that obs, but keep the rest.
Essentially, I should get the following results
SEQ ID date var1 2 A12 01FEB06 356 3 A12 06JAN08 325 4 A12 28DEC09 123 6 B34 13JUN03 198 8 C56 09NOV10 155 9 C56 19OCT13 352
Both file A and B has multiple records per ID, and the between in-out period in file B is not alway constant.
I don't have a lot experience with data manipulation in SAS, I'm really clueless about where to start, eg. should I write a macro that loops through both file some how? or use SQL somehow?
I would greatly appreciate some advice on how to approach this problem.
Thanks!
Here an approach using SQL
/* option 1: Delete records if target is a database table */
proc sql;
delete from A
where exists
(select * from B where a.id=b.id and a.date between b.date_in and b.date_out)
;
quit;
/* option 2: create a new table if target is a SAS file */
proc sql;
create table want as
select * from a
where not exists
(select * from B where a.id=b.id and a.date between b.date_in and b.date_out)
;
quit;
The first question to answer is what is in your DATE variables? Are they character strings, or are they SAS dates (numeric, with a date7 format)?
PROC CONTENTS will reveal that.
data A; infile cards expandtabs truncover; input SEQ ID $ date : date9. var1; format date date9.; cards; 1 A12 03JAN04 242 2 A12 01FEB06 356 3 A12 06JAN08 325 4 A12 28DEC09 123 5 B34 06MAY03 985 6 B34 13JUN03 198 7 B34 10MAY05 241 8 C56 09NOV10 155 9 C56 19OCT13 352 ; run; data B; infile cards expandtabs truncover; input ID $ (date_in date_out) ( : date9.); format date_in date_out date9.; cards; A12 01JAN04 04JAN04 A12 05FEB08 08FEB08 B34 03MAY03 06MAY03 B34 09MAY05 19MAY05 C56 12JUL12 18JUL12 ; run; data key; set B; do date=date_in to date_out; output; end; keep id date; run; data want; if _n_=1 then do; if 0 then set key; declare hash h(dataset:'key'); h.definekey('id','date'); h.definedone(); end; set A; if h.check()=0 then delete; run;
Here an approach using SQL
/* option 1: Delete records if target is a database table */
proc sql;
delete from A
where exists
(select * from B where a.id=b.id and a.date between b.date_in and b.date_out)
;
quit;
/* option 2: create a new table if target is a SAS file */
proc sql;
create table want as
select * from a
where not exists
(select * from B where a.id=b.id and a.date between b.date_in and b.date_out)
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.