Hello, I am using SAS version 9.4 I have a missing data/duplicates problem. I have merged two datasets in which one date (date2) precedes the other (date1) within a certain window. Note that date1 and date2 are from different datasets. My matching criteria leads some observations to match date2 on the incorrect date1, creating an interesting duplicate data problem. Edit to include my merge procedure: proc sql;
create table have as
select a.*, b.*
from have1 as a
left join have2 as b
on a.id=b.id and (a.date1>b.date2>a.date3); *date3 & date2 share the same data source. I omit date3 for brevity;
quit; What I have: id date1 date2 1 1/1/2008 1 1/1/2009 1 1/1/2010 1/1/2009 1 1/1/2011 1/1/2009 1 1/1/2011 1/1/2010 I would like to remove the 4th observation only. edit for clarity: I want to remove the 4th observation in the above example, but as a rule in my full sample. data have;
id date1 date2;
datalines;
a 1/1/2008 .
a 1/1/2009 .
a 1/1/2010 1/1/2009
a 1/1/2011 1/1/2009
a 1/1/2011 1/1/2010
;
run; What I want: id date1 date2 1 1/1/2008 1 1/1/2009 1 1/1/2010 1/1/2009 1 1/1/2011 1/1/2010 What happens: id date1 date2 1 1/1/2011 1/1/2009 1 1/1/2011 1/1/2010 In summary, I need to remove only non-missing duplicates BY id date2, and I need to keep the observation in which date1 is the closest to date2. I have tried sorting by descending order, but it only keeps the date2 which is further away from date1. I scanned the board before posting this, but I didn't see anything quite as unique as my issue. I appreciate any guidance you all can provide. Please let me know if I can improve my post at all. Thank you
... View more