I have following data
data have1; infile datalines missover; input IDnumber @3 date mmddyy10.; datalines; 1 4-7-2003 1 4-7-2009 1 4-7-2010 1 4-7-2011 1 4-7-2012 1 4-7-2013 1 4-7-2014 1 4-7-2015 1 4-7-2016 1 4-7-2018 2 4-7-2005 2 4-7-2011 2 4-7-2012 2 4-7-2013 2 4-7-2014 2 4-7-2015 2 4-7-2016 2 4-7-2017 2 4-7-2018 2 4-7-2019 2 4-7-2020 ; run;
And
data have2; infile datalines missover; input IDnumber @3 date mmddyy10.; datalines; 1 4-7-2003 2 4-7-2005 ; run;
I need the ouput as following
data want; infile datalines missover; input IDnumber @3 date mmddyy10.; datalines; 1 4-7-2009 1 4-7-2010 1 4-7-2011 1 4-7-2012 1 4-7-2013 1 4-7-2014 1 4-7-2015 1 4-7-2016 1 4-7-2018 2 4-7-2011 2 4-7-2012 2 4-7-2013 2 4-7-2014 2 4-7-2015 2 4-7-2016 2 4-7-2017 2 4-7-2018 2 4-7-2019 2 4-7-2020 ; run;
I need to check by id number, if ID matches in both tables, and if the data in have1 has not exact match in have 2.
Hi @ifti_ch2002
Please try this to remove observations of have2 from have1:
data want;
merge have1 (in=x) have2 (in=y);
by IDnumber date;
if x and not y;
run;
Best,
Hi @ifti_ch2002
Please try this to remove observations of have2 from have1:
data want;
merge have1 (in=x) have2 (in=y);
by IDnumber date;
if x and not y;
run;
Best,
Another option could be the use of a SET operator in PROC SQL:
proc sql;
create table want as
select IDnumber, date from have1
except
select IDnumber, date from have2;
quit;
or a hash:
data want;
if _n_=1 then do;
declare hash h (dataset:'have2');
h.definekey('IDnumber', 'date');
h.definedata('IDnumber', 'date');
h.definedone();
end;
set have1;
if h.find() ne 0 then output;
run;
Best,
Thanks, I spent hours on Join.
Hi @ifti_ch2002 The effort -"Thanks, I spent hours on Join" is not a bad idea nor a wrong approach. The intricate details to pay attention to in my humble opinion is the risk of a cartesian join triggered by an inequality operator. However, in your case that concern doesn't surface for the reason, the combination of an equality and inequality operator makes it rather straight forward look-up as the SQL optimizer determines the best join algorithm principally on the equality operator.
@ChrisNZ aka Mr Perfect eloquently writes the distinction between JOINS,Subqueries going above and beyond from a performance standpoint and similar practical usage have often been noticed in posts by @PGStats aka Pierre with no peers who is above all. Please do peek into their posts when you have the time in grasping the concepts.
Kind Regards!
data have1;
infile datalines missover;
input IDnumber @3 date mmddyy10.;
format date mmddyy10.;
datalines;
1 4-7-2003
1 4-7-2009
1 4-7-2010
1 4-7-2011
1 4-7-2012
1 4-7-2013
1 4-7-2014
1 4-7-2015
1 4-7-2016
1 4-7-2018
2 4-7-2005
2 4-7-2011
2 4-7-2012
2 4-7-2013
2 4-7-2014
2 4-7-2015
2 4-7-2016
2 4-7-2017
2 4-7-2018
2 4-7-2019
2 4-7-2020
;
run;
data have2;
infile datalines missover;
input IDnumber @3 date mmddyy10.;
format date mmddyy10.;
datalines;
1 4-7-2003
2 4-7-2005
;
run;
proc sql;
create table want as
select a.*
from have1 a inner join have2 b
on a.IDnumber=b.IDnumber
and a.date ne b.date
order by a.IDnumber, a.date;
quit;
proc print noobs;run;
data have1;
infile datalines missover;
input IDnumber @3 date mmddyy10.;
format date mmddyy10.;
datalines;
1 4-7-2003
1 4-7-2009
1 4-7-2010
1 4-7-2011
1 4-7-2012
1 4-7-2013
1 4-7-2014
1 4-7-2015
1 4-7-2016
1 4-7-2018
2 4-7-2005
2 4-7-2011
2 4-7-2012
2 4-7-2013
2 4-7-2014
2 4-7-2015
2 4-7-2016
2 4-7-2017
2 4-7-2018
2 4-7-2019
2 4-7-2020
;
run;
data have2;
infile datalines missover;
input IDnumber @3 date mmddyy10.;
format date mmddyy10.;
datalines;
1 4-7-2003
2 4-7-2005
;
run;
proc sql;
create table want as
select * from have1
except
select * from have2;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.