econ
Quartz | Level 8

i'm seeing some really weird performance on our server (linux, redhat). work is ssd.

 

i can watch the table build to its ultimate size in samba, and then it will hang there for a minute or so before finally renaming the table.

 

what is even weirder is it only happens when i'm overwriting a table using the same name, i.e. data test; set test; run;

 

if i choose a different name there is no lag. 

 

any ideas? my admin can't find anything wrong. an example from the log is pasted below.

 

data test;
  do id = 1 to 100000000;
    format blah $256.;
    do blah = 'blah blah blah';
      output;
    end;
  end;
run;

 

NOTE: The data set WORK.TEST has 100000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           27.85 seconds
      cpu time            27.38 seconds

 

data test; set test; run;

NOTE: There were 100000000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST has 100000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           2:07.31
      cpu time            47.67 seconds

 

data test1; set test; run;

NOTE: There were 100000000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST1 has 100000000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           33.33 seconds
      cpu time            32.86 seconds

 

 

 

ballardw
Super User

Likely what's happening is 1) SAS is making a copy of the data set so that the original isn't lost if you have a coding error, and then 2) copying that new set back over the original. With enough records, the system lag for copying files is noticeable.

econ
Quartz | Level 8

no. in the example i posted, in the second data step the table test.sas7bdat is intact in work. the table test.sas7bdat.lck is created to its ultimate size. then the data step hangs for about a minute when renaming test.sas7bdat.lck to test.sas7bdat.

and as an example, if i were to stop the data step before it finished, test.sas7bdat.lck would be deleted and the original test.sas7bdat would remain.
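the build-then-rename dance described above can be sketched at the shell level (paths here are stand-ins, not the real WORK path): SAS builds the replacement table under a .lck name and only swaps it in at the very end, which is the step that hangs.

```shell
# Simulate SAS's build-then-rename pattern in a scratch directory.
# WORKDIR stands in for the SAS WORK path (an assumption for illustration).
WORKDIR=$(mktemp -d)
echo "old contents" > "$WORKDIR/test.sas7bdat"       # the existing table
echo "new contents" > "$WORKDIR/test.sas7bdat.lck"   # replacement built to full size first
mv "$WORKDIR/test.sas7bdat.lck" "$WORKDIR/test.sas7bdat"   # the final rename step
cat "$WORKDIR/test.sas7bdat"                         # now shows the new contents
rm -r "$WORKDIR"
```

if the step is interrupted before the `mv`, the .lck file is simply discarded and the original survives, which matches the behavior described above.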

 

Haikuo
Onyx | Level 15

@econ wrote:

no. in the example i posted, in the second data step the table test.sas7bdat is intact in work. the table test.sas7bdat.lck is created to its ultimate size. then the data step hangs for about a minute when renaming test.sas7bdat.lck to test.sas7bdat.

and as an example, if i were to stop the data step before it finished, test.sas7bdat.lck would be deleted and the original test.sas7bdat would remain.

 


What you are saying does NOT contradict the comments from @ballardw. It is not best practice to use the same table name for both input and output in the same data step; however, when you choose to do so, SAS will lock (.lck, read-only?) the original, just in case your process is interrupted unexpectedly. When your build process is complete, SAS will take some time to verify it (SAS skips this step when building a table under a different name). If SAS is comfortable after verifying, that is when the .lck is deleted and the whole process is finally over.

In the case your process is NOT a success, i.e. the target table fails completely or partially, SAS will delete the failed table and rename the .lck back to the original.

econ
Quartz | Level 8

not best practice? maybe if you have unlimited disk space.

 

is there an option to disable this "check" SAS is doing?

 

SAS isn't putting a .lck on the original. it is creating a new table with .lck as the extension. when the data step finishes, it renames the .lck file to the original name.

 

and if it failed, it wouldn't rename the .lck to the original; the original still exists. did you mean SAS would delete the .lck file?

 

run this code and watch what is going on in work.

 

data test;
  do id = 1 to 100000000;
    format blah $256.;
    do blah = 'blah blah blah';
      output;
    end;
  end;
run;

 

data test; set test; run;

 

data test1; set test; run;

jakarman
Barite | Level 11

In either case SAS always creates a new dataset with the .lck extension (Windows/Unix).

That one belongs to your current SAS process and is mostly locked at the OS level. That lock (semaphore) cannot easily be seen (root required); only through some shared access do you notice the behavior. On the old dataset some locking must also be set. Some older/personal systems do not use these kinds of OS lock settings; on mainframes they cannot be circumvented.

At the end of the creation process the freeing/deleting/renaming actions can go ahead. I would expect proper signal handling, and that would take just seconds. The possible delays:

- The freeing/deleting of the old file can take some time.
- The OS file system's directory access can take time (high number of files).
- The semaphore/locking of the OS is very slow.

The last option is also the worst one: bad programming practice at SAS, not implementing signal processing but timed delays in the expectation that the delay will solve possible issues. Then again, hitting 60 seconds is very extreme.

If you are working with EGuide you can see some of those timing/delay issues. Every job, when ready, waits some time before results are gathered and sent back.

---->-- ja karman --<-----
econ
Quartz | Level 8

thanks jk. i have a track open with tech support, but they are out until after the new year. if i learn anything i will report back.

the reason i even started looking into this is a piece of code which used to run in 20 minutes is now taking 40 minutes. this is not acceptable. maybe something got corrupted in the last patch or something.

 

i found this link on filelocks and tried this option but it didn't help.

 

options filelocks=none;

 

https://support.sas.com/documentation/cdl/en/hostunx/61879/HTML/default/viewer.htm#filelock.htm

 

jakarman
Barite | Level 11

There must be a change somewhere when the process changed behavior from 20 to 40 minutes.
What changed, and just as important, what did not, and what do you not know?

- Is there a new version of SAS?

- Have machines been changed?

- Has the file system been changed (storage admins)?

- Has the load of data sizing/complexity changed?

- Are there network traffic options that have been changed (routers/firewalls, network segmentation)?

Just jumping around is like searching for a needle in a haystack.

---->-- ja karman --<-----
econ
Quartz | Level 8

running 9.04 TS1M2, but yes we did a pretty significant upgrade about 6 months ago. honestly, i wasn't really paying attention to performance until recently.

 

we got visual analytics/ statistics / imstat and a 8 node grid. our analytic server is the same, but i suspect something with this grid is corrupted. we have had a lot of weird issues during and after this implementation.

 

the other thing i can think of is we tried to increase the disk space of work by adding some space from the san. work is all ssd. the san is emc vnx 5800, which is also really fast. our thought was if work got filled up the san would then kick in. what we eventually noticed was it wasn't working this way: work was being randomly assigned to ssd or the san. so we changed it back to all ssd.

 

i know, and my admin doesn't know where to look, and he's a really smart guy.     

econ
Quartz | Level 8

jk,

 

finally got the answer. it is a deep-rooted flaw in how SAS clears the file cache, whatever that means. and they don't plan to fix it. probably so they can sell you something more expensive.

 

this blog is total bs. it has nothing to do with memory.   

http://blogs.sas.com/content/sgf/2016/02/15/when-can-too-much-memory-hurt-sas/

 

usage note

http://support.sas.com/kb/57/630.html

 

this behavior can be seen on something as small as a 1GB table. totally misleading.
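since the file cache keeps coming up in this thread, here is a small Linux-only sketch of watching the page cache grow while a large file is written (the file size and path are arbitrary, not tied to SAS or the usage note above):

```shell
# Show the kernel page-cache size before and after writing a large file.
# Requires Linux (/proc/meminfo); the path and size are arbitrary examples.
grep '^Cached' /proc/meminfo                       # cache size before the write
dd if=/dev/zero of=/tmp/cachedemo bs=1M count=256 2>/dev/null
grep '^Cached' /proc/meminfo                       # typically larger now
rm /tmp/cachedemo
```

the written data sits in the page cache until the kernel flushes it, which is why a close() or rename that has to wait on that flush can stall long after the file "looks" fully built.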

jakarman
Barite | Level 11

Reading the root cause, it is totally acceptable as a flaw caused by OS improvements (Redhat, IBM, Microsoft) outside SAS's control.
Improvements sometimes have backfiring effects, as in this case.

It reminds me of a discussion with Paul Dorfman. The statement: "A tape unit is faster than a hard disk drive."
It sounds weird, and both of us are convinced by experience that it is true. Why?
- The condition is a serial approach for both, as needed with backups.
- The tape unit is optimized for exactly that, without any locks, enqueues, or caching.
- The DASD unit should be able to be multi-user, with many possible requests coming in and prioritization among them. That requires introducing enqueueing, locking, etc., complexity in routines, adding a lot of CPU and memory resources for the same action as tape. It is that overhead that is the important difference.

Going for caching of writes at the OS level, the gain is getting the signal that writing is finished early (just as in the old days of a physical actuator moving over spinning platters, RAMAC). But the real fact is the write still has to be done. Do not unplug your machine the hard way; it will lose what is not yet flushed. Not a real problem, as when it happens you have to go back some time and restart.

Just adjust your coding habits to avoid these kinds of effects. When you do not understand your system, your system will outsmart you, and mostly not with a positive result.

---->-- ja karman --<-----
Tom
Super User Tom
Super User

Check if you have automatic backup/audit on the disk used for the dataset?

Sounds like the OS is waiting for the backup of the original 'test.sas7bdat' to finish before allowing SAS to rename the new file to 'test.sas7bdat'.
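a hedged way to check this from a shell is to see whether any other process (a backup agent, for example) still has the file open; the path below is hypothetical, substitute your actual WORK dataset:

```shell
# Check whether any process still holds a given file open.
# FILE is a hypothetical path used for illustration only.
FILE=/tmp/test.sas7bdat
touch "$FILE"
if command -v lsof >/dev/null 2>&1; then
  lsof "$FILE" || echo "no process has $FILE open"
else
  echo "lsof not installed; try: fuser -v $FILE"
fi
rm "$FILE"
```

if a backup or audit daemon shows up in the lsof output while the rename is hanging, that would support this theory.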

econ
Quartz | Level 8

is there a unix command to determine this? otherwise, it's probably over my head and i'll need to ask my admin when he gets back on monday. thanks for the idea Tom.

jakarman
Barite | Level 11
That is a difficult start: a 6-month-old installation with doubts. This should have been part of the roll-out.

As you are on the analytics side, let us see what can be done now. That is verifying the setup, the grid one.
There are nice papers on setting up machine IO under the name Margaret Crevar; find those.
In a grid all data is on shared file systems. Each machine should know what the others are doing. LSF is spreading the load. That is a solution for the lack of capacity of the iron.
Virtualized machines are a solution for overpowered hardware that will share resources.

Verify there is nothing wrong with the shared file system setup, as that one is critical.

SAS WORK is better not on SAN. No backup or anything is needed; it is more like the paging file system. You won't get that idea into the heads of Unix admins.
Keep the SAN as fast as possible on a non-shared file system.

When you are convinced those are correct, something like a backup/write wait is also possible. Writes are often delayed for performance reasons but must be validated at some checkpoint. Renaming data onto the same name as previous data can be a checkpoint, where a completely new name is not.
---->-- ja karman --<-----
econ
Quartz | Level 8

SAS and RH tech support finally determined the issue, and this is impacting anyone working with large datasets.

 

It is a bug with the close() operation and will require a complete re-write of SAS's locking behavior. So no fix for it right now.

 

The workaround is something like this.

 

options symbolgen mprint mlogic;
libname scott '/ma/scott/';

%let work = %sysfunc(getoption(work));

data test;
  do id = 1 to 100000000;
    format blah $256.;
    do blah = 'blah blah blah';
      output;
    end;
  end;
run;

data test1; set test; run;

data test; set test; run;

x rm "&work./test.sas7bdat";

data test; set test1; run;
