@bebess wrote:
I've tested the sort procedure with the table in my Work session (local) and I haven't noticed an improvement in performance. There is still a big difference between CPU time and real time; the sort takes around 8 hours!
Which means that your storage is not up to the task. Get in touch with your SAS administrators, they need to improve the performance if you need to work with such data sizes regularly.
Another test in a better environment seems to show better performance:
NOTE: PROCEDURE SORT used (Total process time):
real time 3:56:21.17
user cpu time 1:13:29.12
system cpu time 19:33.06
memory 1059895.26k
OS Memory 1080224.00k
Timestamp 01/24/2022 01:56:51 PM
Step Count 3 Switch Count 22047
Sorting: usually you need around 2-3 times the size of the table in your UTILLOC. I assume it is in the same location as your SASWORK?
2 TB should be enough. Have you monitored it during execution?
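To confirm where the utility files actually go, something like the following could be run in the same session. It is only a quick check, and it relies on UTILLOC defaulting to WORK unless your site has set it otherwise:

```sas
/* Show where sort utility files are written.          */
/* A value of WORK means they share the WORK location. */
%put UTILLOC: %sysfunc(getoption(utilloc));

/* Physical path of the WORK library for this session. */
%put WORK:    %sysfunc(pathname(work));
```

The free space on the reported path can then be watched with operating system tools while the sort runs.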
I can see that SQL "only" uses 1 GB of RAM. If you have more available to you, maximize the MEMSIZE and SORTSIZE options.
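A minimal sketch of how these settings could be inspected and changed. The 4G value is purely an illustration, not a recommendation for your server:

```sas
/* Display the current values of both options. */
proc options option=(memsize sortsize) value;
run;

/* SORTSIZE can be changed inside a session.              */
/* 4G is an assumed example value; keep it below MEMSIZE. */
options sortsize=4G;
```

MEMSIZE itself can only be set when SAS starts (on the command line or in the configuration file), so raising it will probably need your administrator.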
TAGSORT is resource-efficient, but it will take substantially longer than the default sort.
And I guess that you really need all input columns as output?
And the data is in a Base SAS library?
Consider using SPDE. It might not solve your current issue, but for large data sets it is faster for many use cases. Then you should take a look at the SPDESORTSIZE option.
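As a rough illustration of what that could look like: the libref, the directory, and the 2G value below are all hypothetical, and the directory must already exist on the server.

```sas
/* Hypothetical libref and path: adjust to your site. */
libname bigdata spde '/data/spde/bigdata';

/* Memory available to SPDE sorts; 2G is an assumed example value. */
options spdesortsize=2G;

/* Copy the Base table into the SPDE library before working with it. */
data bigdata.xxx;
  set xxx;
run;
```

Whether this pays off depends heavily on how the underlying storage is laid out, so it is worth testing on a subset first.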
Only this job was running on the server, and I saw the error during that time.
I need 90% of the columns from the table, but as I am also creating new calculated variables, in the end I have about the same number of columns but fewer observations.
Yes, it's a Base SAS library, a basic SAS table.
In an "ideal sort", you need three times the size of the table.
If the original table is compressed (dataset or system option COMPRESS=YES), the utility file will be larger, sometimes MUCH larger (with a compression ratio of 95%, the uncompressed utility file will be 20 times as large). In this case, use the TAGSORT option of PROC SORT. Since this option is not available in SQL, you are better off running your summation in two steps:
If there are more variables in your source dataset than those used in BY and for the summation, drop all other variables when sorting, and create an intermediary table:
proc sort
data=xxx (keep=list_of_var var_n)
out=yyy
/* (compress=yes) tagsort if long character variables exist */
;
by list_of_var;
run;
proc summary data=yyy;
by list_of_var;
var var_n;
output out=zzz sum()=;
run;
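For the compressed-source case, the sort step with the commented-out options switched on might look like this (same placeholder names as above; whether COMPRESS=YES on the output helps depends on your data):

```sas
proc sort
  data=xxx (keep=list_of_var var_n)
  out=yyy (compress=yes)  /* keep the intermediate table compressed   */
  tagsort                 /* only BY values and observation numbers   */
                          /* go into the utility file                 */
;
  by list_of_var;
run;
```

This trades disk space in UTILLOC for a longer run time, so it is mainly useful when the utility file would otherwise not fit.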
You also seem to have inadequate storage or other CPU-consuming processes, recognizable here:
real time 1:08:47.47
user cpu time 11:37.53
system cpu time 2:51.71
Your real time is more than 4 times the CPU time, which points either to wait states caused by the storage, or contention for CPU resources.