I have a dataset A that contains about 25 000 records
I have a dataset B that contains about 14 million records
I want some data in dataset B, where A.ID = B.ID, however it takes ages to pull the data.
So far I am using/tried
1. Proc sql : where B.ID in(select ID from A)
(where A only contains distinct numbers of ID)
2.A macro where the smallest A.ID is written and I pull the B.IDs that are greater than that one into a temp file and using that for extraction.
I wish there were a way to write all the A.IDs to a macro and use that, as it looks like my SAS server is having trouble with two tables at the same time
So if you only want those three variables and filter on A, my code would need an additional KEEP= dataset option:
data want;
set b (keep=id date amount); /* big dataset */
if _n_ = 1
then do;
declare hash a (dataset:"a"); /* small dataset */
a.definekey("id");
a.definedone();
end;
if a.check() = 0; /* found an entry */
run;
This is THE task for which you use a hash object:
data want;
set b; /* big dataset */
if _n_ = 1
then do;
declare hash a (dataset:"a"); /* small dataset */
a.definekey("id");
a.definedone();
end;
if a.check() = 0; /* found an entry */
run;
No sorting is done (except in-memory for the small table), and the hash object discards duplicates in A by default.
Could I use : to choose the column b and c from the big dataset?
data want;
set b; /* big dataset */
if _n_ = 1
then do;
declare hash a (dataset:"a"); /* small dataset */
a.definekey("id");
a.definedata ("id", "b", "c") ;
a.definedone();
end;
if a.check() = 0; /* found an entry */
run;
My code will include ALL variables from the big dataset B and filter on ID's found in dataset A.
the DEFINEDATA() method is used to retrieve values from the hash table to the PDV, in which case you also must use FIND() instead of CHECK().
The code you posted would only be valid if dataset A contained the variables b and c, and you wanted those variables added to those retrieved from dataset B; you must also define those variables in the PDV (usually done with a LENGTH statement); additionally, replace CHECK() with FIND().
So please be more specific: which variables are contained in which dataset?
A contains ID
B contains
ID, DATE, AMOUNT
pluss a lot more that I don't care about.
So if you only want those three variables and filter on A, my code would need an additional KEEP= dataset option:
data want;
set b (keep=id date amount); /* big dataset */
if _n_ = 1
then do;
declare hash a (dataset:"a"); /* small dataset */
a.definekey("id");
a.definedone();
end;
if a.check() = 0; /* found an entry */
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.