DATA Step, Macro, Functions and more

Soft Method to compare two datasets

Reply
Super Contributor
Posts: 371

Soft Method to compare two datasets

Hello experts,

Please, I need a soft method to compare two datasets.

The idea, is to loop on the lines of first table, if the line in table 1 is the same in table 2, we keep the obs number , if not 0

I know that, we can use compare proc, but this, this proc will compare  the observers by order (first with first, so on ..) . But  I do not want that

 

proc compare base=t1 comp=t2;

run;

 

I know that, we can use proc sort to sort the the tables by _all_ and then using compare , but I do not want that

 

proc sort data=ti out ti;

   by _all_;

run

 

INPUT

T1     T2  
Id Name   Id Name
1 Bob   4 bonde
2 sausan   2 sausan
3 Petter   1 Bob
4 bonde      

 

Output

 

Id Name Obs_IN_T2
1 Bob 3
2 sausan 2
3 Petter 0
4 bonde 1
Trusted Advisor
Posts: 1,137

Re: Soft Method to compare two datasets

data T1 ;
input Id 	Name $;
cards;	
1 	Bob 	
2 	sausan 	
3 	Petter 	  
4 	bonde
;

data T2;
input Id  Name $;
row=_n_;
cards;     
4   bonde      
2   sausan     
1   Bob  
;   
   
   
proc sort data =t1;
by id;
run;

proc sort data =t2;
by id;
run;

option missing=0;
data t1t2;
merge t1 t2;
by id;
run;
Thanks,
Jag
Super User
Posts: 10,018

Re: Soft Method to compare two datasets

Check PROC COMPARE 's ID statement.


Super User
Posts: 5,497

Re: Soft Method to compare two datasets

You can do it, but you might need a lot of code to determine when you have a match.  To illustrate, I'll use the name variable that you provided.

 

data want;

set t1;

do _n_=1 to _nobs_;

   set t2 (rename=(name=name2)) point=_n_ nobs=_nobs_;

   if name=name2 then obs_in_T2=_n_;

end;

if obs_in_T2=. then obs_in_T2=0;

drop name2;

run;

 

This is feasible with a small number of variables, but very clumsy with a lot of variables.  But it can be done.

Super Contributor
Posts: 371

Re: Soft Method to compare two datasets

Posted in reply to Astounding

Thank you all for your answers @Astounding @Ksharp @Jagadishkatam

That's very kind from you.

@Astounding: I need this method for a lots of variables.

 

Super Contributor
Posts: 298

Re: Soft Method to compare two datasets

How about HASH? 

 

What is the key - ID or Name or both?

 

Here is a solution where you can have the key in whatever way you want. You said you want to use with several variables? Do you mean the key as a combination of variables? In that case hash solution is easy. If T2 is a very large file, then you include just the varaibles of interest into the hash table using KEEP/DROP to minimize the size of the hash table.

 

data T1 ;
input id    Name $;
cards;   
1  Bob   
2  sausan   
3  Petter     
4  bonde
;
run;
data T2;
input id  Name $;
cards;     
4   bonde      
2   sausan     
1   Bob  
;   
run;

data want;
   if _n_ = 1 then do;
      if 0 then set t2;
      declare hash h();
      h.definekey('id');
      h.definedata('rowid');
      h.definedone();

      do rowid = 1 by 1 until(last);
         set t2 end = last;
         h.add();
      end;
   end;
   set t1;
   if h.find() ^= 0 then rowid = 0;
run;

proc print data = want;
run;

If the key is 'NAME', replace H.DEFINEKY('ID') by
h.definekey('Name');

If ID and NAME are the composite key then use
h.dfinekey('id','Name');

All other statements remain the same.
Super Contributor
Posts: 371

Re: Soft Method to compare two datasets

@datasp: Thank you for your answer...good idea

Ask a Question
Discussion stats
  • 6 replies
  • 216 views
  • 4 likes
  • 5 in conversation