Hello experts,
Please, I need a soft method to compare two datasets.
The idea, is to loop on the lines of first table, if the line in table 1 is the same in table 2, we keep the obs number , if not 0
I know that, we can use compare proc, but this, this proc will compare the observers by order (first with first, so on ..) . But I do not want that
proc compare base=t1 comp=t2;
run;
I know that, we can use proc sort to sort the the tables by _all_ and then using compare , but I do not want that
proc sort data=ti out ti;
by _all_;
run
INPUT
T1 | T2 | |||
Id | Name | Id | Name | |
1 | Bob | 4 | bonde | |
2 | sausan | 2 | sausan | |
3 | Petter | 1 | Bob | |
4 | bonde |
Output
Id | Name | Obs_IN_T2 |
1 | Bob | 3 |
2 | sausan | 2 |
3 | Petter | 0 |
4 | bonde | 1 |
data T1 ;
input Id Name $;
cards;
1 Bob
2 sausan
3 Petter
4 bonde
;
data T2;
input Id Name $;
row=_n_;
cards;
4 bonde
2 sausan
1 Bob
;
proc sort data =t1;
by id;
run;
proc sort data =t2;
by id;
run;
option missing=0;
data t1t2;
merge t1 t2;
by id;
run;
Check PROC COMPARE 's ID statement.
You can do it, but you might need a lot of code to determine when you have a match. To illustrate, I'll use the name variable that you provided.
data want;
set t1;
do _n_=1 to _nobs_;
set t2 (rename=(name=name2)) point=_n_ nobs=_nobs_;
if name=name2 then obs_in_T2=_n_;
end;
if obs_in_T2=. then obs_in_T2=0;
drop name2;
run;
This is feasible with a small number of variables, but very clumsy with a lot of variables. But it can be done.
Thank you all for your answers @Astounding @Ksharp @Jagadishkatam
That's very kind from you.
@Astounding: I need this method for a lots of variables.
How about HASH?
What is the key - ID or Name or both?
Here is a solution where you can have the key in whatever way you want. You said you want to use with several variables? Do you mean the key as a combination of variables? In that case hash solution is easy. If T2 is a very large file, then you include just the varaibles of interest into the hash table using KEEP/DROP to minimize the size of the hash table.
data T1 ;
input id Name $;
cards;
1 Bob
2 sausan
3 Petter
4 bonde
;
run;
data T2;
input id Name $;
cards;
4 bonde
2 sausan
1 Bob
;
run;
data want;
if _n_ = 1 then do;
if 0 then set t2;
declare hash h();
h.definekey('id');
h.definedata('rowid');
h.definedone();
do rowid = 1 by 1 until(last);
set t2 end = last;
h.add();
end;
end;
set t1;
if h.find() ^= 0 then rowid = 0;
run;
proc print data = want;
run;
If the key is 'NAME', replace H.DEFINEKY('ID') by
h.definekey('Name');
If ID and NAME are the composite key then use
h.dfinekey('id','Name');
All other statements remain the same.
@KachiM: Thank you for your answer...good idea
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.