BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kiteulf
Quartz | Level 8

 

I have a dataset A that contains about 25 000 records

I have a dataset B that contains about 14 million records

 

I want some data in dataset B, where A.ID = B.ID, however it takes ages to pull the data.

 

So far I am using/tried

 

1. Proc sql : where B.ID in(select ID from A)

(where A only contains distinct numbers of ID)

2.A macro where the smallest A.ID is written and I pull the B.IDs that are greater than that one into a temp file and using that for extraction.

 

I wish there were a way to write all the A.IDs to a macro and use that, as it looks like my SAS server is having trouble with two tables at the same time

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

So if you only want those three variables and filter on A, my code would need an additional KEEP= dataset option:

data want;
set b (keep=id date amount); /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

View solution in original post

5 REPLIES 5
Kurt_Bremser
Super User

This is THE task for which you use a hash object:

data want;
set b; /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

No sorting is done (except in-memory for the small table), and the hash object discards duplicates in A by default.

Kiteulf
Quartz | Level 8

Could I use : to choose the column b and c from the big dataset?

data want;
set b; /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedata ("id", "b", "c") ; 
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

 

Kurt_Bremser
Super User

My code will include ALL variables from the big dataset B and filter on ID's found in dataset A.

the DEFINEDATA() method is used to retrieve values from the hash table to the PDV, in which case you also must use FIND() instead of CHECK().

The code you posted would only be valid if dataset A contained the variables b and c, and you wanted those variables added to those retrieved from dataset B; you must also define those variables in the PDV (usually done with a LENGTH statement); additionally, replace CHECK() with FIND().

 

So please be more specific: which variables are contained in which dataset?

Kiteulf
Quartz | Level 8

A contains ID

 

B contains

ID, DATE, AMOUNT

 

pluss a lot more that I don't care about.

Kurt_Bremser
Super User

So if you only want those three variables and filter on A, my code would need an additional KEEP= dataset option:

data want;
set b (keep=id date amount); /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1990 views
  • 0 likes
  • 2 in conversation