BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
metuzalem
Calcite | Level 5

I am running the following sql-step:

 

proc sql;

   create table id_map as select

           a.securityId

           ,b.name

           ,b.companyId

   from eq_securityIds as a

   left join FOUNDATION_SECURITY as b on a.securityId=b.securityId

       ;

quit;

 

where the dataset eq_securityId consists of ~16000 observations (here denoted a small dataset) and the dataset FOUNDATION_SECURITY consist of ~60 000 000 observations (here denoted a large dataset).

As of now the runtime to execute this command is about 10min, is there anything I can do to speed up the sql-step? e.g, is it in general faster to join a small dataset on to a large dataset instead of me joining the large dataset onto the small dataset? could some kind of indexation of the large dataset help me speeding up the computations?

I noted that the command ran a lot faster (~3min) when I excluded the variable b.name in the same sql-step, is this a general result or was it just a fluke?

 

 I run SAS 9.4 (in enterprise guide 7.1)

 

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20

Indexing the large table is a good idea, especially if it's used in other use cases.

This would might trigger the hash join method in SQL, which is quite efficient (use _method PROC SQL option).

Data never sleeps

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

Keep both datasets sorted by securityid, and use a data step merge for the selection of your subset.

 

Alternatively, create a format from eq_securityIds where all contained securityid keys get a label of 'yes' and the OTHER value a 'no'. Then you can use the format in a subsetting if in a data step.

LinusH
Tourmaline | Level 20

Indexing the large table is a good idea, especially if it's used in other use cases.

This would might trigger the hash join method in SQL, which is quite efficient (use _method PROC SQL option).

Data never sleeps
metuzalem
Calcite | Level 5

Thx!

by running the command

proc sql;

    create index securityId on FOUNDATION_SECURITY;

quit;

the sql-step is executed a lot faster. It takes about 2min to create the index but since I am running similar join-queries further down the in the program with the large dataset then this helped a lot. Thx!

 

 

 

Ksharp
Super User

1) As @LinusH said , make a index on the large table is a good choice.

 

create index securityid on BigTable;

 

2)Hash Table 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 9271 views
  • 0 likes
  • 4 in conversation