Re: Sort Initialization failure

qaguy1982 · Posted 03-14-2022 09:30 AM

I have a program that has been working for a couple of years but now when I run it on my local PC I get Sort Initialization failures which blows up the entire program.

The program is assigning a group of individual subjects to up to 5 local service centers based upon distance.

To save memory, I took the service center ids and distance from the subject address and assigned them previously. Now I am adding back in the details for each service center address via proc sql so that table is left joined to the (member-ServiceCenter) table 5 times to add the detail address data for each service center. When I do this on our SAS server, I don't have any issues but when I run this process locally on my laptop/PC the SAS session issues a SORT initialization error that blows up.

I read the article :

Usage Note 39705: Troubleshooting "Sort initialization failure" and "Sort execution failure" errors in Windows and UNIX environments (https://support.sas.com/kb/39/705.html )

but I am not sure I understand what I am supposed to do with the memory and sortsize parameters. I have my memory over 16Gb and my Sortsize at 8Gb. but the error keeps occurring. I have occasionally just kept running it over and over and eventually the program works.

The original service center table was nationwide so I filtered to just include the service center ids that are used by my current sample of individuals. So the original service center table was 77,000 locations but now if I am running the report for 100 individuals, then at most I will need only 500 service center ids in my table, and actually it is closer to 200 as there are many individuals within the same geographic area so that they can be assigned to the same service center. So now even though there are 5 copies of the table, they are each small. But I still get the sort initialization error. I have added the threaded sort option and that worked sometimes but lately that is not working either.

I guess I am looking for some configuration ideas. Or some explanation of why this might work on the server but not on the local PC. My memory configurations actually show that I am allocating even more memory on my local than the server allocates.

The gist of the query is :

Proc sql ;

create table customer_service_Details as

select a.name, a.address, a.city, a.state, a.zipcode, a.service_center_id1, a.service_center_id2, a.service_center_id3, a.service_center_id4, a.service_center_id5,

b.sc_name as sc_name1, b.sc_address as sc_address1 ......

c.sc_name as sc_name2, c.sc_address as sc_address2,.....

d......

e.......

f.sc_name as sc_name5, f.sc_address as sc_address5 ....

from

customer a

left outer join service_center_details b on a.service_center_id1 = b.service_center_id

left outer join service_center_details c on a.service_center_id2 = c.service_center_id

left outer join service_center_details d on a.service_center_id3 = d.service_center_id

left outer join service_center_details e on a.service_center_id4 = e.service_center_id

left outer join service_center_details f on a.service_center_id5 = f.service_center_id

;

quit;

This is simulated code and there are many additional fields added for each address.

Any guidance or help would be appreciated. I have worked with other members on my team as well and we are all stumped.

Carl

pink_poodle · Posted 03-14-2022 07:08 PM

If it has trouble sorting, try pinching off a small portion of the data set to test the program, it should probably work. Then see what would the the maximum doable size of that portion. Then perform the program on each part of the data set of that size. Then re-set the results.

SASKiwi · Posted 03-14-2022 08:06 PM

Have Windows Task Manager open while running the program. Does memory max out or not? Also check free disk space. Not enough free disk space can cause sort execution failures. You need at around 3 times the amount of space a dataset takes up to sort it.

Is your SAS installation 32 or 64 bit? 32-bit SAS only uses up to 2GB of memory.

Kurt_Bremser · Posted 03-15-2022 02:40 AM

So this is basically a multiple lookup into the same details table.

A SQL query like this in SAS is always a disaster waiting to happen (see Maxim 10).

Replace it with a data step and a hash object, you will see a performance improvement by several orders of magnitude.

As soon as I'm on my desktop, I'll cobble up some example code.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 03-15-2022 03:42 AM

OK, here we go:

data details;
length
  service_center_id $10
  sc_name $30
  sc_address $200
;
service_center_id = "1234567890";
sc_name = "Anton";
sc_address = "somewhere";
run;

data customer;
service_center_id1 = "1234567890";
service_center_id2 = "1234567890";
service_center_id3 = "1234567890";
service_center_id4 = "";
service_center_id5 = "";
run;

data customer_service_details;
set customer;
/* define all new and hash variables here */
length
  service_center_id $10
  sc_name
  sc_name1
  sc_name2
  sc_name3
  sc_name4
  sc_name5
    $30
  sc_address
  sc_address1
  sc_address2
  sc_address3
  sc_address4
  sc_address5
    $200
;
/* make arrays for use in the DO loop */
array sc_k {*} service_center_id1-service_center_id5;
array sc_n {*} sc_name1-sc_name5;
array sc_a {*} sc_address1-sc_address5;
if _n_ = 1
then do; /* define hash */
  declare hash dt (dataset:"details");
  dt.definekey("service_center_id");
  dt.definedata("sc_name","sc_address");
  dt.definedone();
  call missing(service_center_id,sc_name,sc_address); /* to avoid "uninitialized" NOTE */
end;
/* look up in a loop */
do i = 1 to dim(sc_k);
  if dt.find(key:sc_k{i}) = 0
  then do;
    sc_n{i} = sc_name;
    sc_a{i} = sc_address;
  end;
end;
drop service_center_id sc_name sc_address i;
run;

The SQL requires a SORT (or at least creation of an internal index) for all 5 keys of the whole incoming dataset, while the data step works through the main dataset sequentially, and builds a b-tree for the keys of the lookup table only once. All "sorting" is done purely in memory, and only for the (considerably smaller) lookup table.

Even before the advent of the hash, I would not have used a single SQL for this.

Even running 5 SORTs and 5 data step MERGEs would usually outperform the SQL (unless you could hand over the SQL to a DBMS, if your data is actually stored there).

Next we turned to defining formats for each column of the lookup table, and use those, speeding up the process massively.

And then came hash, which outperforms formats once again. It requires a little bit more coding (but you avoid creating a CNTLIN dataset for each lookup column), but the performance benefits - well, you will see those.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

qaguy1982 · Posted 03-17-2022 11:31 AM

Thank you, I am looking into hashes and appreciate your detailed example. I learned and used Perl and SQL before learning SAS so my SAS tends to use a lot of PROC SQL and I am not as well versed in pure SAS datastep logic. My actual code was to connect multiple levels in multilayered macros and determined by macro parameters. And it works at the home office but now with working from home, I am getting new issues. I have had to re-tune several programs that I wrote a few years back to deal with large datasets. Do more work on the database before bringing the samples down to the local session. Time to learn some more!

Sort Initialization failure