BookmarkSubscribeRSS Feed
jerry898969
Pyrite | Level 9
Hello to everyone
I have an optimization question about the following query

proc sql ;
update Addr a
set A_ID =
(
select ID
from U_Addr
where US=upcase(a.OS)
and UC=upcase(a.OC)
and US=upcase(a.OS)
) ;
quit ;

This update takes a very long time to update the Addr table. The Addr table has 230,000 rows and the U_Addr has 67,000 rows. The joining field are character fields which I think is one of the reason it takes so long.
If anyone has an tips or comments on how to optimize this query and make it faster I would appreciate it greatly.

Thank You
5 REPLIES 5
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
It would help to explain more about the SAS operating environment, particularly about the SAS code execution that leads up to the PROC SQL you have posted. Is SAS executing against local data or through a remote connection; is a DBMS (external table) involved? Is there variability with processing, depending on the row/observation count (try using OPTIONS OBS=nnn; for testing various scenarios).

Scott Barry
SBBWorks, Inc.
jerry898969
Pyrite | Level 9
Thanks guys for the replies.
Everything with this query runs locally. What i'm doing is taking a distinct list of address. I'm adding an ID column to this distinct list then I want to go back and update each row with it's corresponding ID.

With a regular left join how would I do the update part of it?

Thanks again for your help
Jerry
LinusH
Tourmaline | Level 20
Since your master table seems quite small, you just can overwrite it with the query result:

proc sql;
create table Addr as
select a.*, u.A_ID
from Addr as a left join U_Addr as u
on u.US=upcase(a.OS)
and u.UC=upcase(a.OC)
and u.US=upcase(a.OS)
;
quit;

Remember if you try to use indexed strategies, avoid having function calls in WHERE/ON-clauses.

/Linus
Data never sleeps
jerry898969
Pyrite | Level 9
Linus H,
Thank you for your help

That last sentence about the functions in the where clause opened my eyes. I did the upcase for these tables when I created them so I didn't have to do it in the join. The time when from 24 minutes to under 3.

Thank you so much

Jerry
LinusH
Tourmaline | Level 20
What goes on is that for each row in Addr, you do separate look-up in U_addr.
Indexing on some of the "join" columns might help.
In this particular case, you will probably find it much more efficient to do a regular left join to do this update, or data step techniques.

/Linus
Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 785 views
  • 0 likes
  • 3 in conversation