BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello,

When handling large datasets ,is it better to work with proc sql or DATA step?

Which is working more quickly?

 

6 REPLIES 6
Oligolas
Barite | Level 11

It depends on what you intend to do, it's a case by case situation, there is no general answer or rule to follow.

For my part I'd say it is triggered by task, functionality, usage, readability and easiness of validation for third party.

________________________

- Cheers -

Ksharp
Super User
I prefer Data Step .
Ksharp
Super User
I prefer Data Step .
PaigeMiller
Diamond | Level 26

I prefer the DATA step as well, but there are features in PROC SQL that really aren't available in a DATA step (and vice versa), and so the answer is what @Oligolas said: "It depends".

--
Paige Miller
maguiremq
SAS Super FREQ

You could time the data step versus the proc sql step using some of the code included in the link below.

 

https://blogs.sas.com/content/sgf/2015/01/21/sas-timer-the-key-to-writing-efficient-sas-code/ 

 

I'm not totally familiar with how the two are going to vary under the hood, but it gives you a metric to go by. Also assumes you're willing to write both steps.

s_lassen
Meteorite | Level 14

What is best for large data, SQL or data step?

 

Depends. On many things.

 

First, how large is "large"? Are we talking 10 GB or 1000 GB? In the first case, it should be possible to fit all the data into the memory of your server, or even a decent-sized PC. In the second case, not so easy.

 

The data step used to be one of the fastest beasts around, back in the old days when even a mainframe might not have more than a few gigabytes of memory, and most processing had to be done from disc or even tape. SAS data step language is very fast for sequential processing, also when you are low on memory. It is not an interpreted language, it gets compiled to machine code on the fly, and can do a lot of processing with very little memory. 

 

Nowadays, SQL interpreters are pretty good at optimizing data handling, and memory is not so much a bottleneck.

 

 

Second, where is your data stored? If it is on a modern database server, using pass-through native SQL will in many situations be faster than the data step. Even SAS SQL, if written correctly, can be executed as native SQL on the server.

 

 

Third, what do know best? Badly written data step code is easily outperformed by well-written SQL, and vice versa.

 

 

Fourth, what are you doing? Is it updating data with transactions, analyzing data, or transforming data for a Data Warehouse? The solution you choose should also depend on the task at hand.

 

 

Fifth, what are your resources? A large SAS server can be quite expensive (especially the software licences), and it you may be better off investing in a powerful database server, and using SAS for the tasks where it really shines. On the other hand you may have a powerful SAS server, used by analysts during the day, but available for batch processing at night. In which case you may prefer SAS data step code for you night batch.

 

 

Personally, I am completely flexible. But even when doing native SQL on a database server, SAS is a very good framework for tying things together. I have not seen a macro language on par with SAS on any other platform.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 657 views
  • 4 likes
  • 6 in conversation