## Soft Method to compare two datasets

Super Contributor
Posts: 378

# Soft Method to compare two datasets

Hello experts,

Please, I need a soft method to compare two datasets.

The idea, is to loop on the lines of first table, if the line in table 1 is the same in table 2, we keep the obs number , if not 0

I know that, we can use compare proc, but this, this proc will compare  the observers by order (first with first, so on ..) . But  I do not want that

proc compare base=t1 comp=t2;

run;

I know that, we can use proc sort to sort the the tables by _all_ and then using compare , but I do not want that

proc sort data=ti out ti;

by _all_;

run

INPUT

 T1 T2 Id Name Id Name 1 Bob 4 bonde 2 sausan 2 sausan 3 Petter 1 Bob 4 bonde

Output

 Id Name Obs_IN_T2 1 Bob 3 2 sausan 2 3 Petter 0 4 bonde 1
Posts: 1,147

## Re: Soft Method to compare two datasets

``````data T1 ;
input Id 	Name \$;
cards;
1 	Bob
2 	sausan
3 	Petter
4 	bonde
;

data T2;
input Id  Name \$;
row=_n_;
cards;
4   bonde
2   sausan
1   Bob
;

proc sort data =t1;
by id;
run;

proc sort data =t2;
by id;
run;

option missing=0;
data t1t2;
merge t1 t2;
by id;
run;``````
Thanks,
Jag
Super User
Posts: 10,784

## Re: Soft Method to compare two datasets

```Check PROC COMPARE 's ID statement.

```
Super User
Posts: 6,785

## Re: Soft Method to compare two datasets

You can do it, but you might need a lot of code to determine when you have a match.  To illustrate, I'll use the name variable that you provided.

data want;

set t1;

do _n_=1 to _nobs_;

set t2 (rename=(name=name2)) point=_n_ nobs=_nobs_;

if name=name2 then obs_in_T2=_n_;

end;

if obs_in_T2=. then obs_in_T2=0;

drop name2;

run;

This is feasible with a small number of variables, but very clumsy with a lot of variables.  But it can be done.

Super Contributor
Posts: 378

## Re: Soft Method to compare two datasets

That's very kind from you.

@Astounding: I need this method for a lots of variables.

Super Contributor
Posts: 326

## Re: Soft Method to compare two datasets

What is the key - ID or Name or both?

Here is a solution where you can have the key in whatever way you want. You said you want to use with several variables? Do you mean the key as a combination of variables? In that case hash solution is easy. If T2 is a very large file, then you include just the varaibles of interest into the hash table using KEEP/DROP to minimize the size of the hash table.

``````data T1 ;
input id    Name \$;
cards;
1  Bob
2  sausan
3  Petter
4  bonde
;
run;
data T2;
input id  Name \$;
cards;
4   bonde
2   sausan
1   Bob
;
run;

data want;
if _n_ = 1 then do;
if 0 then set t2;
declare hash h();
h.definekey('id');
h.definedata('rowid');
h.definedone();

do rowid = 1 by 1 until(last);
set t2 end = last;
end;
end;
set t1;
if h.find() ^= 0 then rowid = 0;
run;

proc print data = want;
run;

If the key is 'NAME', replace H.DEFINEKY('ID') by
h.definekey('Name');

If ID and NAME are the composite key then use
h.dfinekey('id','Name');

All other statements remain the same.``````
Super Contributor
Posts: 378