BookmarkSubscribeRSS Feed
Aboiron
Calcite | Level 5

So I am starting to be a little familiar with sas, and I realize that a lot of operations that I learned to do with data sets can also be done using proc sql statements, including merging, creating variables, subsetting and many others.

My question is therefore  : "When is it best to do what ?", is proc sql better ? always ?

Thanks in advance

7 REPLIES 7
mohamed_zaki
Barite | Level 11

It is very often asked question search for it, also check the communities for similar question.

You may check:

Five reasons to use the SAS DATA Step or PROC SQL

Ksharp
Super User

No. Each has its advantage and disadvantage .Sometime Data Step is better than SQL, Sometime vice verse .

Xia Keshan

DBailey
Lapis Lazuli | Level 10

I think it is mostly determined by your background.  If you grew up as a sql coder, you would probably prefer sql...but as other's have mentioned, there are some things that are better done in data steps..some things are better in sql.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I often see this question raised on games programming forums, what technology is best from x or y.  The answer is inevitably based on what can "you" do with the technology, and what the goal is.  If your happy merging datasets and find joins and such like difficult then keep using datasteps.  There are of course some considerations, such as speed of execution and memory/hdd space, however unless you are dealing with large datasets (i.e. in the millions of records) it shouldn't matter too much to you.

SASMike
Obsidian | Level 7

It depends.  Just like a carpenter may use different tools for different tasks, a SAS programmer will either use the DATA step or PROC SQL step depending on the circumstances.  The DATA step is very useful when you need to implement business logic or do heavy manipulations on your data.  Meanwhile, PROC SQL is useful for mining from a variety of datasouces.  Once you become familar and comfortable with both steps, you'll develop a propensity to use the most applicable step depending on the situation.

In a nutshell, DATA step is great for business logic and heavy data manipulations.  PROC SQL is great and intuitive for mining and combing many data sources.

I'd check out this paper for an introduction to the differences between the DATA step and PROC SQL: http://www2.sas.com/proceedings/sugi29/269-29.pdf.

Happy Programming! :smileysilly:

SAS Mike

Aboiron
Calcite | Level 5

So a answer with "it depends" does not help at all. If you do so, please at least tell on what id does depend.

Then, it sometimes happens that something is better, that is why there is research in algorithms.

So I rephrase my question : which one is the fastest, which one is the less memory consuming, in which cases.

And it happens that I have data with several millions observations.

jakarman
Barite | Level 11

Aboiron,

What can you do with the datastep.

- process all kind of external input data (Sftp url flat files)

- combining a lot of datasources at the same time  (set merge point) Hash object  external fcuntions like Perl (regular strings).

- creating a lot of dataset as SAS datasets or as tables, all in one pass.

What can you do with SQL.

- Accessing external DBMS systems in their advanced native mode.

- working with sas-datasets in the same way as on a external dbms and combine those

When there is a different functionality choose the best tool for that fcuntionality.

A hammer and screw is a bad combination. This implies you have to learn to use/understand those different tools.

What to choose when there is a good overlap, is that your question?.

The datastep is fast when the sequential approach is good and the limitation is IO speed.

Proc SQL by nature is better when you can use the multithreading features. That is processing al lot in parallel.

When that processing in parallel is alo IO bound and that is sequential (SAN or that spinning thing dasd) when hitting above about 20% of the data (rule of thumb) it will lose in turn around time of sequential processing.

All other factors when speed does not matter but the ease of coding is the most major issue.

Make it as readable and comfortable as you like (personal favors). With most small datasets executing time is not the problem but coding time.      

---->-- ja karman --<-----

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1074 views
  • 4 likes
  • 7 in conversation