Help using Base SAS procedures

Faster way to do this in data step?

Reply
N/A
Posts: 0

Faster way to do this in data step?

Sorry for the noobish question. I'm a sql background trying to get a handle on SAS...

So I've been adding a variable based on matching observations to a second table...

proc sql;
create table matching as
select a.*, b.id_final_new as match_id
from people a left join responders b
on a.id_one=b.final_id_one and a.id_two=b.final_id_two;
quit;

This takes forever! Can I do the same in a quick data step?
Frequent Contributor
Frequent Contributor
Posts: 77

Re: Faster way to do this in data step?

Posted in reply to deleted_user
I have found similiar problems with SQL joins of large data sets. While the data step is usually faster, merging requires that the joined data be presorted. I have have began experimenting using Hash tables in the data step to join records. While I'm not comfortable fully explaining them, they have provided the performance enhancements that I wanted. There are many good SAS Global forum papers that explain and give examples. Just search hash tables in tech support search.

Here's one paper....

http://www2.sas.com/proceedings/forum2008/029-2008.pdf
N/A
Posts: 0

Re: Faster way to do this in data step?

Posted in reply to deleted_user
Thanks, looks like I've got a lot of reading to do...
Regular Contributor
Posts: 151

Re: Faster way to do this in data step?

Posted in reply to deleted_user
You can test this code. If dataset people is not very large then should work very fast.

data responders(keep=id_one id_two id_final index=(cr1=(id_one id_two)));
set responders(rename=(final_id_one=id_one final_id_two=id_two));
run;

data matching;
set people;
set responders key=cr1/unique;
select(_iorc_);
when(%sysrc(_sok)) do; end;
when(%sysrc(_dsenom)) do; id_final= . /* if id_final is char then should be id_final=''*/; _error_ = 0; end;
otherwise do;
put 'ERROR_: Unexpected value for _IORC_= ' _iorc_ ' Program terminating. Data set accessed is responders';
put _all_; _error_ = 0; stop; end;
end;
run;
PROC Star
Posts: 1,760

Re: Faster way to do this in data step?

Posted in reply to deleted_user
You could investigate using a hash table in a data step.
While the syntax is unusual at first glance, it is quite easy to grasp.
Hash tables are fast as they are loaded in memory, and don't require any prior sorting.
The pseudo code looks like:
data matching ;
if _N_=1 then load hash table;
set base table;
find match;
run;
Ask a Question
Discussion stats
  • 4 replies
  • 119 views
  • 0 likes
  • 4 in conversation