can anybody tell me what is the fastest method to find some records from a dataset in another dataset?
I explain better: I have two dataset, the first one has 6000 records and differents columns; the second one has 500 records and just one column with the ID.
I need to select in the first dataset the records with the ID from the second dataset. I have tried with merge and inner join (the where in (,,,) woluld be crazy), but the program runs to slowly, around 4 minutes, then i'd want to know if there is another faster method to do this.
For data sets of your size (assuming a reasonable number of columns) the SQL step ought to do the trick. If the SQL look-up is too slow, there are other alternatives that generally require more coding, but can usually process quicker. The DATA Step Components (HASH) Objects are about the fastest. An overview of look-up techniques can be found at: http://caloxy.com/papers/43-i_how_table_lookups_from_ift.pdf
I think you should first index these two dataset by using proc sort.then that wil be faster.
And if you do not want sort.;
there are some code i copy from somewhere.Wish will help you .
proc sort data=small(keep=id) nodupkey force;*small is your second dataset;
retain fmtname 'key'
type 'N' *'N' means numerical type format,if id is character type then use 'C';
set small end=eof;
if eof then do;
proc format cntlin=fmt;
set test; *test is your first dataset;