metuzalem
Calcite | Level 5

I am running the following sql-step:

 

proc sql;
   create table id_map as
   select a.securityId
         ,b.name
         ,b.companyId
   from eq_securityIds as a
   left join FOUNDATION_SECURITY as b
     on a.securityId = b.securityId
   ;
quit;

 

where the dataset eq_securityIds consists of ~16,000 observations (the small dataset) and the dataset FOUNDATION_SECURITY consists of ~60,000,000 observations (the large dataset).

Right now this step takes about 10 minutes to run. Is there anything I can do to speed it up? For example, is it generally faster to join a small dataset onto a large one than to join the large one onto the small one? Could some kind of index on the large dataset help speed up the computation?

I noticed that the step ran a lot faster (~3 min) when I excluded the variable b.name from the same sql-step. Is this a general result or was it just a fluke?

 

I run SAS 9.4 (in Enterprise Guide 7.1).

 

ACCEPTED SOLUTION
LinusH
Tourmaline | Level 20

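Indexing the large table is a good idea, especially if it's also used in other use cases.

With an index in place, PROC SQL may choose an index join instead of reading the whole large table. When the small table fits in memory it can also choose a hash join, which is quite efficient. Use the _method PROC SQL option to see which join method is actually chosen.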
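
To check which join method PROC SQL picks, the query from the question can be rerun with _method. This is only a sketch that reuses the table names from the original post; which code shows up in the log depends on your data, indexes and memory settings.

```sas
/* _method writes the query plan chosen by the optimizer to the log */
proc sql _method;
   create table id_map as
   select a.securityId, b.name, b.companyId
   from eq_securityIds as a
   left join FOUNDATION_SECURITY as b
     on a.securityId = b.securityId;
quit;
/* Join codes to look for in the log:
   sqxjhsh = hash join, sqxjndx = index join,
   sqxjm   = merge join, sqxjsl = step loop join */
```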

Data never sleeps


4 REPLIES
Kurt_Bremser
Super User

Keep both datasets sorted by securityid, and use a data step merge for the selection of your subset.
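
A minimal sketch of the merge approach, using the table names from the question. The output names eq_sorted and fs_sorted are made up for illustration, and the sketch assumes securityId is unique in eq_securityIds; with duplicate keys on both sides a merge does not behave like an SQL join. If the large table is already stored in sorted order, the second sort can be skipped.

```sas
/* Sort both inputs by the join key (hypothetical output names) */
proc sort data=eq_securityIds (keep=securityId) out=eq_sorted;
   by securityId;
run;

proc sort data=FOUNDATION_SECURITY (keep=securityId name companyId)
          out=fs_sorted;
   by securityId;
run;

/* Keep every key from the small table, matched or not,
   which mirrors the left join in the question */
data id_map;
   merge eq_sorted (in=a)
         fs_sorted;
   by securityId;
   if a;
run;
```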

 

Alternatively, create a format from eq_securityIds where all contained securityid keys get a label of 'yes' and the OTHER value a 'no'. Then you can use the format in a subsetting if in a data step.
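
The format approach could look roughly like this. The format name secfmt and the dataset name cntlin are made up for illustration, and the sketch assumes securityId is numeric and unique in eq_securityIds (for a character key, use type 'C' and $secfmt. instead). Note that it returns only the rows of the large table whose key is in the small table, so unmatched keys from eq_securityIds do not appear, unlike in the left join.

```sas
/* Build a control dataset: one 'yes' row per key, plus an OTHER row */
data cntlin;
   set eq_securityIds (keep=securityId rename=(securityId=start)) end=last;
   retain fmtname 'secfmt' type 'N';   /* assumes a numeric key */
   length label $3;
   label = 'yes';
   output;
   if last then do;
      hlo = 'O';                       /* O marks the OTHER range */
      label = 'no';
      output;
   end;
run;

proc format cntlin=cntlin;
run;

/* Single pass over the large table, keeping only known keys */
data id_map;
   set FOUNDATION_SECURITY (keep=securityId name companyId);
   if put(securityId, secfmt.) = 'yes';
run;
```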

metuzalem
Calcite | Level 5

Thx!

by running the command

proc sql;
    create index securityId on FOUNDATION_SECURITY(securityId);
quit;

the sql-step executes a lot faster. It takes about 2 min to create the index, but since I am running similar join queries on the large dataset further down in the program, this helped a lot. Thx!
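
For reference, the same simple index can also be built with PROC DATASETS. This is a sketch that assumes FOUNDATION_SECURITY lives in the WORK library; adjust the library name to wherever the table is stored.

```sas
/* Alternative to PROC SQL: create a simple index on securityId */
proc datasets library=work nolist;   /* library=work is an assumption */
   modify FOUNDATION_SECURITY;
   index create securityId;
quit;
```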

 

 

 

Ksharp
Super User

1) As @LinusH said, creating an index on the large table is a good choice.

 

create index securityid on BigTable(securityid);

 

2) Use a hash table.
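
The post does not include code for the hash table option. One way it might be sketched with a data step hash object, using the table names from the question and assuming securityId has the same type and length in both tables: the small table is loaded into memory and the large table is read once. Like the format approach, this keeps only matching rows from the large table, so it is not an exact replacement for the left join.

```sas
data id_map;
   /* Load the ~16,000 keys into memory once */
   if _n_ = 1 then do;
      declare hash h(dataset: 'eq_securityIds (keep=securityId)');
      h.defineKey('securityId');
      h.defineDone();
   end;

   /* Single sequential pass over the large table */
   set FOUNDATION_SECURITY (keep=securityId name companyId);

   /* check() returns 0 when the key is found in the hash */
   if h.check() = 0;
run;
```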
