Steelers_In_DC
Barite | Level 11

Good Morning All,

I have the following code:

proc sql;
  create table fourth_run as
  select b.*
  from mdj.zip5_august a left join
       mdj.addr_100m_zip9_zip5 b on
       a.zip5 = substr(b.prop_zip_code,1,5);
quit;

zip5_august has about 900 records, 1 variable.  addr_100m_zip9_zip5 has about 1 million records with 70 variables.  I want to make sure that the way I'm doing this is best practice.

Is it best practice to list the smaller dataset first?  Both are sorted accordingly.  Will the substr() add much processing time?  Would it be better to set up another field?  It's a monthly file that doesn't get used much, and I'd prefer not to make too many changes as this server is fairly bogged down already.

Any input or suggestions are welcome.

Thanks,

Go Pirates.

4 REPLIES
LinusH
Tourmaline | Level 20

I would index prop_zip_code.

Use options msglevel=i; to verify that the index is being used.
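
A minimal sketch of that suggestion, using the library and table names from the question. It assumes you have write access to mdj and that no index named prop_zip_code exists yet. Whether the optimizer actually picks the index when the join condition wraps the column in substr() is not guaranteed, which is what the log notes are there to confirm.

```sas
/* Write INFO: notes about index usage to the log */
options msglevel=i;

/* Create a simple index on the join column of the large table */
proc datasets library=mdj nolist;
  modify addr_100m_zip9_zip5;
  index create prop_zip_code;
quit;
```

After this, rerun the join and read the log. If no index-related INFO note appears for the join, the index was not used.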

Data never sleeps
Steelers_In_DC
Barite | Level 11

I'll have to look that up.  I'm not familiar with msglevel = i.  I ran a small subset of each dataset and see this:

NOTE: SAS threaded sort was used.

I'm not sure what that means.  Is an index necessary if it is sorted on prop_zip_code?

LinusH
Tourmaline | Level 20

Not necessary, but you asked for optimization, and an indexed join will usually perform better than a sort/merge join, especially when the ratio of hits to total rows is as low as in your example.

The message means that SAS used multithreading: the sort was done in parallel across multiple cores/CPUs. Good, but not surprising.
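
If the log shows that the index on prop_zip_code is not picked up because of the substr(), one option is the separate field raised in the original question. The following is only a sketch: the new column name zip5 on the large table is hypothetical, and it assumes prop_zip_code and a.zip5 are both character and that changing the monthly file is acceptable.

```sas
proc sql;
  /* Hypothetical: add a 5-character key column to the large table */
  alter table mdj.addr_100m_zip9_zip5 add zip5 char(5);

  /* Populate it once from the existing ZIP field */
  update mdj.addr_100m_zip9_zip5
    set zip5 = substr(prop_zip_code,1,5);

  /* A simple index must have the same name as its column */
  create index zip5 on mdj.addr_100m_zip9_zip5(zip5);

  /* Same join as in the question, now comparing plain columns */
  create table fourth_run as
  select b.*
  from mdj.zip5_august a left join
       mdj.addr_100m_zip9_zip5 b on
       a.zip5 = b.zip5;
quit;
```

The update is a one-time pass over the large table, so it trades some up-front work for a join condition that has no function call on the indexed column. Check the log with msglevel=i to see whether it pays off on your server.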

Data never sleeps
Steelers_In_DC
Barite | Level 11

Excellent.  Thanks!
