BookmarkSubscribeRSS Feed
LineMoon
Lapis Lazuli | Level 10

Hello experts,

Please, I need a soft method to compare two datasets.

The idea, is to loop on the lines of first table, if the line in table 1 is the same in table 2, we keep the obs number , if not 0

I know that, we can use compare proc, but this, this proc will compare  the observers by order (first with first, so on ..) . But  I do not want that

 

proc compare base=t1 comp=t2;

run;

 

I know that, we can use proc sort to sort the the tables by _all_ and then using compare , but I do not want that

 

proc sort data=ti out ti;

   by _all_;

run

 

INPUT

T1     T2  
Id Name   Id Name
1 Bob   4 bonde
2 sausan   2 sausan
3 Petter   1 Bob
4 bonde      

 

Output

 

Id Name Obs_IN_T2
1 Bob 3
2 sausan 2
3 Petter 0
4 bonde 1
6 REPLIES 6
Jagadishkatam
Amethyst | Level 16
data T1 ;
input Id 	Name $;
cards;	
1 	Bob 	
2 	sausan 	
3 	Petter 	  
4 	bonde
;

data T2;
input Id  Name $;
row=_n_;
cards;     
4   bonde      
2   sausan     
1   Bob  
;   
   
   
proc sort data =t1;
by id;
run;

proc sort data =t2;
by id;
run;

option missing=0;
data t1t2;
merge t1 t2;
by id;
run;
Thanks,
Jag
Ksharp
Super User
Check PROC COMPARE 's ID statement.


Astounding
PROC Star

You can do it, but you might need a lot of code to determine when you have a match.  To illustrate, I'll use the name variable that you provided.

 

data want;

set t1;

do _n_=1 to _nobs_;

   set t2 (rename=(name=name2)) point=_n_ nobs=_nobs_;

   if name=name2 then obs_in_T2=_n_;

end;

if obs_in_T2=. then obs_in_T2=0;

drop name2;

run;

 

This is feasible with a small number of variables, but very clumsy with a lot of variables.  But it can be done.

LineMoon
Lapis Lazuli | Level 10

Thank you all for your answers @Astounding @Ksharp @Jagadishkatam

That's very kind from you.

@Astounding: I need this method for a lots of variables.

 

KachiM
Rhodochrosite | Level 12

How about HASH? 

 

What is the key - ID or Name or both?

 

Here is a solution where you can have the key in whatever way you want. You said you want to use with several variables? Do you mean the key as a combination of variables? In that case hash solution is easy. If T2 is a very large file, then you include just the varaibles of interest into the hash table using KEEP/DROP to minimize the size of the hash table.

 

data T1 ;
input id    Name $;
cards;   
1  Bob   
2  sausan   
3  Petter     
4  bonde
;
run;
data T2;
input id  Name $;
cards;     
4   bonde      
2   sausan     
1   Bob  
;   
run;

data want;
   if _n_ = 1 then do;
      if 0 then set t2;
      declare hash h();
      h.definekey('id');
      h.definedata('rowid');
      h.definedone();

      do rowid = 1 by 1 until(last);
         set t2 end = last;
         h.add();
      end;
   end;
   set t1;
   if h.find() ^= 0 then rowid = 0;
run;

proc print data = want;
run;

If the key is 'NAME', replace H.DEFINEKY('ID') by
h.definekey('Name');

If ID and NAME are the composite key then use
h.dfinekey('id','Name');

All other statements remain the same.
LineMoon
Lapis Lazuli | Level 10

@KachiM: Thank you for your answer...good idea

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 928 views
  • 4 likes
  • 5 in conversation