BookmarkSubscribeRSS Feed
Jeg123
Calcite | Level 5

I am running some proc sql statements which first uses alters the table a new column and then updates the column.

This is taking forever, it is faster to do a select all, even though it contains 50+ columns and create a new column in the select statement. This does not make sense. 

 

I tried finding an answer to this and the only thing I found was this comment

 

"FYI - SQL Update unfortunately as supported in SAS is almost always terrible. SQL has many uses in SAS, but update should almost never be one, unless it's a single value update that's not sourced from a table. For whatever reason, that always seems to be incredibly slow, even for not-all-that-large datasets, and even for updates that in SQL Server are quite quick. – Joe Jun 10 '15 at 14:27"

 

From https://stackoverflow.com/questions/30708324/proc-sql-update-efficiency-for-large-datasets

 

How come is update so inefficient? 

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Proc SQL can be slow all the time, especially with large data.  The reason, its not the primary language, it is a sub component.  Its provide as a procedure with a basic SQL compiler.  Base SAS is the programming language, and is optimised for data processing.  It is always recommended to use Base SAS above SQL, unless there is any specific benefits to using SQL. 

As for why, put _method and _tree on and see what the SQL compiler is doing;

http://www2.sas.com/proceedings/sugi30/101-30.pdf

I suspect its likely reading/writing each observation individually as there is no PDV.  Using a datastep reads each observation into the PDV then writes it out, being more efficient.  SQL does not have that, and has none of the tricks that databases use to speed up code running.

 

Another point, if your aiming to use SQL a lot, then look at normalising your data.  SQL works best with few columns and many rows.  50+ columns is a lot of columns.  

SASKiwi
PROC Star

Please post your full log including elapsed time and row numbers.

 

I've had good success getting reasonable performance with SQL Server updates by choosing an appropriate method along with tweaking database update options.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

But then that is using the SQL Server parser, and pass through is it not?   My impression of the question was that he was using SQL within SAS alone.  If its running on a DB then yes, it should be a lot quicker.

Jeg123
Calcite | Level 5

Here is a example

 

Create random data set

 

data A;
call streaminit(123);     
do i = 1 to 100000000;
   u = rand("Uniform");    
   output;
end;
run;

 

Some random query which creates a new column and also writes a whole new data set

 

proc sql;
create table B AS
	SELECT
		*
	,	1 AS id
FROM A
;quit;

Perform same as above, but using alter table

proc sql;
alter table A
	add id num
;quit;

proc sql;
update A
	set id = monotonic()
;quit;

Solution 1: 10.2 seconds

Solution 2: 44 seconds

 

It takes 4x longer, even though the first solution also has to copy all the additional columns. This is also exponentially increasing, increasing to 100 million rows it takes 40 seconds vs 7.5 minutes. 

SASKiwi
PROC Star

Monotonic is an unsupported SAS function. Does it perform similarly with a supported function like rand?

 

I confess I have only used your solution 1 so I've never struck the problem in solution 2. Using SET in SAS SQL is seriously limiting as a separate sub-query is required to populate each SET from another dataset. There are better ways to update in SAS.

 

I update mainly with DATA steps and that certainly provides best performance with SAS data. 

Jeg123
Calcite | Level 5

I see that my example was wrong, as I am using a constant 1 in one example and then monotonic() in another. But you can replace the

monotonic with 1 and experience the same slowness. 

 

Do you mind showing a short example how you do when you update a column in data step?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

The equivalent of the monotonic() example would be:

data a;
set a;
id=_n_;
run;

_n_ is an automatic internal variable which is observation number.

It is very easy to update, alter, manipulate, data using datastep language - it is at the end of the day the basis of the whole SAS system.  The SQL is a simple ANSI parser component accessed through Base SAS and hence is never going to be a good as the base language.  Think of it like a VB script to generate a C++ program.

Jeg123
Calcite | Level 5

Aware of the _n_. But is this really the "proper" way to add a column using

 

data a;
  set a;
  id=_n_;
run;

From my understanding, the above will overwrite a total new file, which seems inefficient when only adding one column. Or is this the "correct" way to add a new column?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

That depends on scenario, however in general, yes, that is the proper way to add variables.  

In certain circumstances the datastep update function can be applied:

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202975.htm

 

And in big data scenarios then the whole programming part tends to be quite different.  However in most programming, the datastep is the way to go.  It will also be the most efficient as its a one directional stream.  SQL creates intermediary steps where the data is sorted, merged and otherwise processed, which can lead to lots of reads and writes.  On a proper database there are processes in place to handle this.

 

You can find a whole raft of videos about most SAS related items here:

https://video.sas.com/category/videos/how-to-tutorials

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 5322 views
  • 0 likes
  • 3 in conversation