Solved: Efficient ways to extract data

Kiteulf · Posted 09-08-2021 05:01 AM

I have a dataset A that contains about 25 000 records

I have a dataset B that contains about 14 million records

I want some data in dataset B, where A.ID = B.ID, however it takes ages to pull the data.

So far I am using/tried

1. Proc sql : where B.ID in(select ID from A)

(where A only contains distinct numbers of ID)

2.A macro where the smallest A.ID is written and I pull the B.IDs that are greater than that one into a temp file and using that for extraction.

I wish there were a way to write all the A.IDs to a macro and use that, as it looks like my SAS server is having trouble with two tables at the same time

Kurt_Bremser · Posted 09-08-2021 06:08 AM

So if you only want those three variables and filter on A, my code would need an additional KEEP= dataset option:

data want;
set b (keep=id date amount); /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 09-08-2021 05:25 AM

This is THE task for which you use a hash object:

data want;
set b; /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

No sorting is done (except in-memory for the small table), and the hash object discards duplicates in A by default.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kiteulf · Posted 09-08-2021 05:46 AM

Could I use : to choose the column b and c from the big dataset?

data want;
set b; /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedata ("id", "b", "c") ; 
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

Kurt_Bremser · Posted 09-08-2021 06:01 AM

My code will include ALL variables from the big dataset B and filter on ID's found in dataset A.

the DEFINEDATA() method is used to retrieve values from the hash table to the PDV, in which case you also must use FIND() instead of CHECK().

The code you posted would only be valid if dataset A contained the variables b and c, and you wanted those variables added to those retrieved from dataset B; you must also define those variables in the PDV (usually done with a LENGTH statement); additionally, replace CHECK() with FIND().

So please be more specific: which variables are contained in which dataset?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kiteulf · Posted 09-08-2021 06:05 AM

A contains ID

B contains

ID, DATE, AMOUNT

pluss a lot more that I don't care about.

Kurt_Bremser · Posted 09-08-2021 06:08 AM

So if you only want those three variables and filter on A, my code would need an additional KEEP= dataset option:

data want;
set b (keep=id date amount); /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Efficient ways to extract data

Re: Efficient ways to extract data

Re: Efficient ways to extract data

Re: Efficient ways to extract data

Re: Efficient ways to extract data

Re: Efficient ways to extract data

Re: Efficient ways to extract data

Catch up on SAS Innovate 2026

Efficient ways to extract data

Re: Efficient ways to extract data

Re: Efficient ways to extract data

Re: Efficient ways to extract data

Re: Efficient ways to extract data

Re: Efficient ways to extract data

Re: Efficient ways to extract data

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away