Curious only....
Many (most?) SAS programs run lickety-split, finishing in, what, ten minutes at most.
But do some folks have programs that run a long time?
What I'm actually wondering is: if you plotted a histogram of total run times, what kind of work would show up in the right tail?
(And, as a follow-up, how many person-hours would that be equivalent to in pre-SAS days?)
Any thoughts appreciated.
Nicholas Kormanik
The run time will vary depending on:
- The skill of the programmer
- The volume of data
- The manipulations and calculations done
- The performance of the hardware.
It should be noted that regardless of the last three factors, a badly written program can bring a machine to its knees.
Two further comments:
- SAS is generally I/O-heavy. Transaction systems typically process one row at a time. Data management and data analysis systems such as SAS typically process whole tables. Slow disks are usually the second thing to look at.
- The first thing to look at is the log. See which steps take the longest. Note that long-running programs also slow down all the other processes on the server, so it's a vicious circle. In my experience, long-duration steps are generally due to sorting (which may not always be done optimally, or may not even be necessary) and badly formed SQL queries. Another common source of waste is many baby steps copying the data over and over to perform what could be done in one single step. Also look at the real and CPU times, as sketched below, to see whether the CPU is the bottleneck.
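One way to get that timing detail is the FULLSTIMER system option, which is standard SAS; the step below is just a made-up example to show where the numbers appear:

    options fullstimer;    /* write extended real/CPU/memory stats to the log */

    /* Hypothetical step: the log will now show real time and user/system
       CPU time separately for it */
    proc sort data=work.big out=work.big_sorted;
       by id;
    run;

If real time is much larger than CPU time, the step is mostly waiting on I/O rather than computing.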
A further way of asking would be (not to flog a dead horse)...
What are the longest run times you've seen for a set of SAS code?
Doing what, for instance? Which procs?
(And, what the heck, let's assume current hardware: 32 gigs RAM, SSD, etc.)
Chime in here.
> What are the longest run times you've seen for a set of SAS code?
There are no rules. We have jobs that last seconds, and jobs that last hours.
It depends on what the jobs do.
Thankfully, our storage is much faster than SSDs, but it has to support many concurrent jobs.
The longest-running batch job we ever had was the import of one of the largest DBMS tables (metadata for documents scanned into the archive system).
Just copying the unload file from the DB server to the SAS server took about 6 hours. Although the job consisted of just three steps (a data step to run the sftp via FILENAME PIPE, a data step to read the transport file, and a PROC SORT for the main key), it would routinely take half a day when running during the daytime, and about 9 hours at night.
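A minimal sketch of that three-step pattern; the host names, file names, layout, and key below are all assumptions, not the actual job:

    /* Step 1: run the transfer via FILENAME PIPE and echo sftp's
       output to the log (hypothetical command and hosts) */
    filename getfile pipe 'sftp -b unload.batch sasuser@dbserver';
    data _null_;
       infile getfile;
       input;
       putlog _infile_;
    run;

    /* Step 2: read the transported unload file (hypothetical layout) */
    data work.doc_meta;
       infile '/sasdata/unload.txt' dlm='09'x dsd truncover;
       input doc_id :$20. scan_date :yymmdd10. doc_type :$8.;
       format scan_date yymmdd10.;
    run;

    /* Step 3: sort by the main key */
    proc sort data=work.doc_meta;
       by doc_id;
    run;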
Probably the longest-running step I ever encountered in a user program was a data step that implemented fuzzy matching to align the different ways car manufacturer and model names had been entered into the data (before the advent of a stricter front end that provided drop-down select lists derived from Eurotax).
That step computed a very complicated "edit distance" by using the POINT= option. Replacing this with a hash object sped it up a little, but the number of iterations within each source observation was still considerable, and it would run for hours on just about half a million observations, with considerable CPU load.
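The original step isn't shown, but the general POINT= pattern it describes looks roughly like this, here using SAS's COMPLEV edit-distance function; the data set and variable names are hypothetical:

    /* For each source observation, re-scan the whole reference table
       via POINT= and keep the closest model name; COMPLEV computes the
       Levenshtein edit distance */
    data matched;
       length best_match $40;
       set raw_cars;                          /* one source row per iteration */
       best_dist = 999;
       do i = 1 to nobs;
          set ref_models point=i nobs=nobs;   /* random access by row number */
          d = complev(model_raw, model_ref);
          if d < best_dist then do;
             best_dist = d;
             best_match = model_ref;
          end;
       end;
       drop d;
    run;

Loading ref_models into a hash object instead keeps the reference table in memory and avoids re-reading it from disk for every source row, but the number of distance computations per source observation stays the same, which would explain why that replacement only sped things up a little.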
My second example was part of preparing the data for regressions to determine the factors that influence the probability of claims in car insurance; it resulted in our first scientifically calculated rate, which used sex, age, profession, location, and manufacturer (in addition to the "standard" factor, engine power) as rating factors. Without SAS, such a calculation would have been impossible.
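The post doesn't say which procedure was used, but a claim-probability regression with those factors would commonly be fitted along these lines (PROC LOGISTIC; the data set and variable names are hypothetical):

    /* Model the probability of a claim from the rating factors named above */
    proc logistic data=policies;
       class sex profession location manufacturer / param=ref;
       model had_claim(event='1') = sex age profession location
                                    manufacturer engine_power;
    run;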