Solved: Re: Match tables A and Table B and bring back data that does not match...

zdassu · Posted 10-24-2018 06:38 AM

Hi I have two tables, table A and Table B, I want to match these tables and I want to bring back everything where the data in Table B does not match table A by the name field. See sample data below.

TableA

name date1 date2

7328914 01/10/2008 31/12/2099

7338572 27/07/2015 31/12/2099

7400000 01/10/2010 31/12/2099

TableB

name date1 date2

7328914 01/10/2008 31/12/2099

7338572 27/07/2015 31/12/2099

7400000 01/10/2010 31/12/2099

Kurt_Bremser · Posted 10-24-2018 07:26 AM

In that case, you need to use both variables in the by statements:

(stealing the example data from @PeterClemmensen)

data table_A;
input name$ (date1 date2)(:ddmmyy10.);
format date: ddmmyy10.;
datalines;
7328914 01/10/2008 31/12/2099
7338572 27/07/2015 31/12/2099
7400000 01/10/2010 31/12/2099
7400000 03/10/2010 31/12/2099
;
run;

data table_B;
input name$ (date1 date2)(:ddmmyy10.);
format date: ddmmyy10.;
datalines;
7328914 01/10/2008 31/12/2099
7338572 27/07/2015 31/12/2099
7400000 01/10/2010 31/12/2099
7400000 02/10/2010 31/12/2099
;
run;

/* sort both datasets by the important variables */
proc sort data=table_a;
by name date1;
run;

proc sort
  data=table_b (keep=name date1)
  out=exclude
  nodupkey
;
by name date1;
run;

/* now, a simple merge by the same variables */
data want;
merge
  table_a (in=a) /* creates automatic boolean variable a that indicates an observation from table_a was read */
  exclude (in=b) /* same for the exclude table */
;
by name date1;
if a and not b; /* take only observations from table_a for which no match was found in exclude dataset */
/* this is called a "subsetting if" */
run;

Since your example data had only perfect matches, I put in some extra lines to verify the function.

Note that the hash solution suggested by @PeterClemmensen will also work (and probably be faster), but be limited by available memory. The sort/data step solution is limited by disk space only.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 10-24-2018 06:54 AM

proc sort data=table_a;
by name;
run;

proc sort
  data=table_b (keep=name)
  out=exclude
  nodupkey
;
by name;
run;

data want;
merge
  table_a (in=a)
  exclude (in=b)
;
by name;
if a and not b;
run;

The nodupkey is there to prevent unwanted observations if table_b has more obs for a given name than table_a.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

zdassu · Posted 10-24-2018 07:01 AM

Thanks for your replay, what would I do if I want to bring everything back that does not match in table B

By Name and Date1

for example

Table A Name = Table B Name

And Table A Date1 = Table B Date1

Bring back everything that does not match

PeterClemmensen · Posted 10-24-2018 07:03 AM

You could do something like this

data A;
input name$ (date1 date2)(:ddmmyy10.);
format date: ddmmyy10.;
datalines;
7328914 01/10/2008 31/12/2099
7338572 27/07/2015 31/12/2099
7400000 01/10/2010 31/12/2099
;

data B;
input name$ (date1 date2)(:ddmmyy10.);
format date: ddmmyy10.;
datalines;
7328914 01/10/2008 31/12/2099
7338572 27/07/2015 31/12/2099
7400000 01/10/2010 31/12/2099
;

data want;
   if 0 then set B;
   if _N_=1 then do;
      declare hash h(dataset:"B", multidata:"Y");
      h.definekey("name", "date1");
      h.definedata(all:"Y");
      h.definedone();
   end; 
   set A;
   if h.check()>0 then output;
run;

The DATA to DATA Step Macro
Blog: SASnrd

novinosrin · Posted 10-24-2018 01:50 PM

HI @zdassu Just a nit and FWIW if you play with hashes

if h.check()>0 then output;

will not fetch the accurate records although will not produce any error. Assuming it's a typo, correcting to

if h.check()=0 then output;
if check() ne 0 then output;

or

if h.check()=0 ;
if h.check() ne 0;

or

rc= h.check();
if rc=0;
if rc ne 0;

The return code based on object notation method always results in a non zero value when it fails, since it's non zero it's not ideal for inequality comparison operation unless the return code is of course reset to some value knowing that we are gonna use that for inequality comparisons. One of the tricky things is that any hash object would have a non-scalar value however return code being though being scalar in the PDV resulting from a object do notation operation is not particularly useful until some gymnastics is done.

The scalar vs non scalar in hashes is so similar to arrays but occurs at meta-level as opposed to giving much clarity. Thanks!

Kurt_Bremser · Posted 10-24-2018 07:04 AM

Quote from your original post:

"where the data in Table B does not match table A by the name field"

(emphasis by me)

My code does exactly that.

Please make up your mind what you really want.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

zdassu · Posted 10-24-2018 07:10 AM

My apologies, when I checked the tables they have millions of lines and I noticed that sometimes the name is the same but date1 is different, therefore I would class this as a mismatch (date 2 is not an issue this does not matter), your code seems a little complicated when trying to do it this way, I don't really understand it

Kurt_Bremser · Posted 10-24-2018 07:26 AM

In that case, you need to use both variables in the by statements:

(stealing the example data from @PeterClemmensen)

data table_A;
input name$ (date1 date2)(:ddmmyy10.);
format date: ddmmyy10.;
datalines;
7328914 01/10/2008 31/12/2099
7338572 27/07/2015 31/12/2099
7400000 01/10/2010 31/12/2099
7400000 03/10/2010 31/12/2099
;
run;

data table_B;
input name$ (date1 date2)(:ddmmyy10.);
format date: ddmmyy10.;
datalines;
7328914 01/10/2008 31/12/2099
7338572 27/07/2015 31/12/2099
7400000 01/10/2010 31/12/2099
7400000 02/10/2010 31/12/2099
;
run;

/* sort both datasets by the important variables */
proc sort data=table_a;
by name date1;
run;

proc sort
  data=table_b (keep=name date1)
  out=exclude
  nodupkey
;
by name date1;
run;

/* now, a simple merge by the same variables */
data want;
merge
  table_a (in=a) /* creates automatic boolean variable a that indicates an observation from table_a was read */
  exclude (in=b) /* same for the exclude table */
;
by name date1;
if a and not b; /* take only observations from table_a for which no match was found in exclude dataset */
/* this is called a "subsetting if" */
run;

Since your example data had only perfect matches, I put in some extra lines to verify the function.

Note that the hash solution suggested by @PeterClemmensen will also work (and probably be faster), but be limited by available memory. The sort/data step solution is limited by disk space only.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

zdassu · Posted 10-24-2018 10:50 AM

Thanks for your help much appreciated

RW9 · Posted 10-24-2018 08:02 AM

Maybe:

proc sql;
  create table want as 
  select * from tablea except select * from tableb
  union all
  select * from tableb except select * from tablea;
quit;

zdassu · Posted 10-24-2018 10:51 AM

Thanks

zdassu · Posted 10-24-2018 10:52 AM

Thanks

Astounding · Posted 10-24-2018 09:54 AM

Here's a simple approach, assuming both of your data sets are already sorted by NAME:

data want;
merge a  b (rename=(date1=date1b date2=date2b));
by name;
if (date1 ne date1b) or (date2 ne date2b);
run;

Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Re: Match tables A and Table B and bring back data that does not match in table B

Registration is open

Registration is open

SAS Training: Just a Click Away