Hello experts,
Please, I need a soft method to compare two datasets.
The idea, is to loop on the lines of first table, if the line in table 1 is the same in table 2, we keep the obs number , if not 0
I know that, we can use compare proc, but this, this proc will compare the observers by order (first with first, so on ..) . But I do not want that
proc compare base=t1 comp=t2;
run;
I know that, we can use proc sort to sort the the tables by _all_ and then using compare , but I do not want that
proc sort data=ti out ti;
by _all_;
run
INPUT
T1 | T2 | |||
Id | Name | Id | Name | |
1 | Bob | 4 | bonde | |
2 | sausan | 2 | sausan | |
3 | Petter | 1 | Bob | |
4 | bonde |
Output
Id | Name | Obs_IN_T2 |
1 | Bob | 3 |
2 | sausan | 2 |
3 | Petter | 0 |
4 | bonde | 1 |
data T1 ;
input Id Name $;
cards;
1 Bob
2 sausan
3 Petter
4 bonde
;
data T2;
input Id Name $;
row=_n_;
cards;
4 bonde
2 sausan
1 Bob
;
proc sort data =t1;
by id;
run;
proc sort data =t2;
by id;
run;
option missing=0;
data t1t2;
merge t1 t2;
by id;
run;
Check PROC COMPARE 's ID statement.
You can do it, but you might need a lot of code to determine when you have a match. To illustrate, I'll use the name variable that you provided.
data want;
set t1;
do _n_=1 to _nobs_;
set t2 (rename=(name=name2)) point=_n_ nobs=_nobs_;
if name=name2 then obs_in_T2=_n_;
end;
if obs_in_T2=. then obs_in_T2=0;
drop name2;
run;
This is feasible with a small number of variables, but very clumsy with a lot of variables. But it can be done.
Thank you all for your answers @Astounding @Ksharp @Jagadishkatam
That's very kind from you.
@Astounding: I need this method for a lots of variables.
How about HASH?
What is the key - ID or Name or both?
Here is a solution where you can have the key in whatever way you want. You said you want to use with several variables? Do you mean the key as a combination of variables? In that case hash solution is easy. If T2 is a very large file, then you include just the varaibles of interest into the hash table using KEEP/DROP to minimize the size of the hash table.
data T1 ;
input id Name $;
cards;
1 Bob
2 sausan
3 Petter
4 bonde
;
run;
data T2;
input id Name $;
cards;
4 bonde
2 sausan
1 Bob
;
run;
data want;
if _n_ = 1 then do;
if 0 then set t2;
declare hash h();
h.definekey('id');
h.definedata('rowid');
h.definedone();
do rowid = 1 by 1 until(last);
set t2 end = last;
h.add();
end;
end;
set t1;
if h.find() ^= 0 then rowid = 0;
run;
proc print data = want;
run;
If the key is 'NAME', replace H.DEFINEKY('ID') by
h.definekey('Name');
If ID and NAME are the composite key then use
h.dfinekey('id','Name');
All other statements remain the same.
@KachiM: Thank you for your answer...good idea
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.