Same with me here. I have also seen data Step executing faster than PROC SQL. But, Data Step may produce more lines of code than PROC SQL.
There are major differences in the datastep approach vs SQL approach.
Touching all data the datastep should be faster as doing one ordered speed up IO stream on the background. But it is single threaded.
The single threaded approach is required as the data is processed ordered. You have lag/retain fucntionality aside first/last (missing in SQL).
Doing small part of work SQL can win as this will work better on isolated records.
It will execute Multi-threaded (more processes) and cause more randomness in IO causing wait-seek time.
If you have SAS dataset processed with SQL you can also specify indexes and constraints like every DBMS.
Also modfiy records in place instead of replacing datasets is possible.
My personal rule(s):
If I'm working with the data on a row level then I use a data step, if I'm working with the columns, doing aggregates/counts I use SQL.
I generally do merges with SQL, unless I need to identify the incoming data set then I use a DATA STEP.
If data is on a server and there's the option, I'll use SQL Pass Through for bigger datasets (anything that takes longer than 30 secs) whenever possible.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.