BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kiteulf
Quartz | Level 8

 

I have a dataset A that contains about 25 000 records

I have a dataset B that contains about 14 million records

 

I want some data in dataset B, where A.ID = B.ID, however it takes ages to pull the data.

 

So far I am using/tried

 

1. Proc sql : where B.ID in(select ID from A)

(where A only contains distinct numbers of ID)

2.A macro where the smallest A.ID is written and I pull the B.IDs that are greater than that one into a temp file and using that for extraction.

 

I wish there were a way to write all the A.IDs to a macro and use that, as it looks like my SAS server is having trouble with two tables at the same time

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

So if you only want those three variables and filter on A, my code would need an additional KEEP= dataset option:

data want;
set b (keep=id date amount); /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

View solution in original post

5 REPLIES 5
Kurt_Bremser
Super User

This is THE task for which you use a hash object:

data want;
set b; /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

No sorting is done (except in-memory for the small table), and the hash object discards duplicates in A by default.

Kiteulf
Quartz | Level 8

Could I use : to choose the column b and c from the big dataset?

data want;
set b; /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedata ("id", "b", "c") ; 
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

 

Kurt_Bremser
Super User

My code will include ALL variables from the big dataset B and filter on ID's found in dataset A.

the DEFINEDATA() method is used to retrieve values from the hash table to the PDV, in which case you also must use FIND() instead of CHECK().

The code you posted would only be valid if dataset A contained the variables b and c, and you wanted those variables added to those retrieved from dataset B; you must also define those variables in the PDV (usually done with a LENGTH statement); additionally, replace CHECK() with FIND().

 

So please be more specific: which variables are contained in which dataset?

Kiteulf
Quartz | Level 8

A contains ID

 

B contains

ID, DATE, AMOUNT

 

pluss a lot more that I don't care about.

Kurt_Bremser
Super User

So if you only want those three variables and filter on A, my code would need an additional KEEP= dataset option:

data want;
set b (keep=id date amount); /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 958 views
  • 0 likes
  • 2 in conversation