Avoiding the dreaded "Dataset is locked" ERROR

14 Likes

When writing to SAS datasets or creating views in permanent libraries shared by lots of users, one will regularly have the problem that a user has a file open for viewing or other purposes, which prevents SAS from opening the file for update; although SAS will always create a new (.lck) file when a complete rewrite is done, it still blocks if it recognizes that the original file is opened.

Because of the way the UNIX filesystem is structured, a directory entry does not contain file metadata; instead, it points to a so-called inode that contains metadata and the pointers to the actual data blocks.

The file handle that is created when a file is opened therefore points to the inode, not the directory entry. This allows directory entries to be manipulated (including removal) even when the file they point to is in use.

The following macro makes use of this by:

Either using the UNIX command rm with the -f option; -f is necessary so that the command will not issue a message and expect an answer if a file is in use. The macro determines the member type automatically and selects the proper filename extension. Depending on an additional parameter, it will use the %SYSEXEC macro statement to run the command (making the macro safe for use "inside" data or procedure step code), or use the FILENAME PIPE method to retrieve diagnostic information to the SAS log. Both methods require XCMD to be enabled.
Or use the FDELETE function, once again either from a data step or with %sysfunc.

%macro ds_delete(ds=,ext=Y,safe=N);

/* ds_delete, Version 0.9
   This macro is designed to use the UNIX operating system command rm to
   delete a dataset or view, in order to prevent a "Dataset is locked"
   message. It will automatically determine the physical path to the
   library and the proper filename extension. Depending on the presence
   of a second parameter, it will either use the FILENAME PIPE method to
   run the UNIX command, or the %SYSEXEC macro statement.
*/

%if %length(&ds.) lt 1
%then %do;
  /* issue a usage note */
  %put Usage:;
  %put %nrstr(%ds_delete)(ds=dataset|view[,ext=y|n][,safe=y|n]);
  %put ds=dataset|view .. name of the dataset or view, single level considered in WORK;
  %put ext= (optional) .. Y will use external commands;
  %put safe=(optional) .. Y will cause macro code to be used (otherwise data step);
  %return;
%end;

/* make our macro behave gracefully */
%local
  dsid
  type
  suffix
  pathname
  fname
  command
  rc
;

/* crude check for validity of dataset name */
%if %sysfunc(countw(&ds.,.)) gt 2
%then %do;
  %put Too many levels!;
  %return;
%end;

/* expand single-level dataset name */
%if %sysfunc(countw(&ds.,.)) lt 2 %then %let ds = WORK.&ds.;

/* check if the dataset/view actually exists */
%let dsid = %sysfunc(open(&ds.));
%if &dsid. = 0
%then %do;
  %put Dataset or view does not exist!;
  %return;
%end;

/* determine physical filename suffix */
%let type = %sysfunc(attrc(&dsid.,MTYPE));
%if &type. = VIEW
%then %let suffix = sas7bvew;
%else %let suffix = sas7bdat;

/* get physical name of the directory */
%let pathname = %sysfunc(pathname(%sysfunc(attrc(&dsid.,LIB))));

/* build the complete filename */
%let fname = %lowcase(%sysfunc(attrc(&dsid.,MEM))).&suffix.;

/* external command or FDELETE */
%if %upcase(&ext.) = Y
%then %do;

  /* build the command */
  %let command = rm -f &pathname./&fname.;

  /* run the command, either with %SYSEXEC or FILENAME PIPE */
  %if %upcase(&safe.) = Y
  %then %do;
    /* this makes the code "safe" for use anywhere in code,
       by using only macro statements */
    %sysexec &command.;
  %end;
  %else %do;
    /* retrieve all responses (including stderr) and write them to the log */
    data _null_;
    infile "&command. 2>&1" pipe;
    input;
    put _infile_;
    run;
  %end;

%end;
%else %do;

  %if %upcase(&safe.) = Y
  %then %do;
    /* macro statements */
    %let rc = %sysfunc(filename(fref,&pathname./&fname.));
    %if &rc = 0
    %then %do;
      %let rc = %sysfunc(fdelete(&fref));
      %put &=rc.;
      %let rc = %sysfunc(filename(fref));
    %end;
  %end;
  %else %do;
    /* data step */
    data _null_;
    length fref $8;
    rc = filename(fref,"&pathname/&fname.");
    rc = fdelete(fref);
    put rc=;
    rc = filename(fref);
    run;
  %end;

%end;

%let dsid = %sysfunc(close(&dsid.));

%mend;

If you ask yourself what will happen to a dataset file that is open at the time the macro is run: the directory entry will be removed, making the file "vanish" from a user's POV; but the actual data will be kept until the last file handle for the inode is closed. At this point the UNIX kernel realizes that the inode does no longer have a directory entry ("link") pointing to it, and removes it, clearing the allocation table entries along with it, and freeing the space.

In this particular case, it might be that a user runs out of space while writing the new dataset even if there is no more data than it originally contained; the "zombie" still occupies the space until it is closed.

Edit from 2021-05-22: copied the macro code from the SASGF 2021 paper/presentation, which can alternatively use FDELETE.

yabwon · ‎06-05-2020

Kurt,

Very cool macro! One observation, what about index file asociated with a dataset?

All the best

Bart

yabwon · ‎06-05-2020

One more thing, if I may share FCMP approach (unfortunately doesn't handle locked files)?

A function and some tests:

/* tested on windows and linux */

options cmplib = _NULL_;

proc FCMP
 outlib = work.f.p
;
  function kill(lbds_ $);
    length F $ 8 ds $ 32 lb $ 2048 lbds $ 41;

    F = cats("_",put(datetime(),hex7.));

    lbds = strip(lowcase(lbds_));
    if lbds = '_last_' then lbds = lowcase(symget("syslast"));

    ds = scan(lbds, -1, ".");
    if index(lbds, ".")>0 then lb = pathname(scan(lbds, 1, "."));
                          else lb = pathname(ifc(LIBREF("user"),"work","user"));
    /*put lbds lb ds;*/

    array ext[3] $ 12 (".sas7bdat", ".sas7bndx", ".sas7bvew");

    _RCV_ = 1;
    do i = 1 to dim(ext);
      _RC_ = filename(F, cats(lb, "/", ds, ext[i]), "disk");
      _RCV_ = _RCV_ * fdelete(F);
    end;
    _RC_ = filename(F);

    return(_RCV_);
  endsub;
run;

options cmplib = work.f;

data TEST1 TEST2(index=(x));
x = 17;
run;
data TEST3/view=TEST3;
set test1;
run;

data _null_;
  p = kill("WORK.TEST1");
  put p=;
run;

data _null_;
  p = kill("TEST2");
  put p=;
run;

data _null_;
  p = kill("WORK.TEST3");
  put p=;
run;

data test4;
x=42;
run;
data _null_;
  p = kill("_LAST_");
  put p=;
run;

options dlcreatedir;
libname user "%sysfunc(pathname(work))/user";

data test5;
x=42;
run;

data _null_;
  p = kill("test5");
  put p=;
run;

libname user clear;


data test6;
x=42;
run;

%put *%sysfunc(kill(test6))*;

All the best

Bart

Kurt_Bremser · ‎06-05-2020

All approaches using the built-in functions will fail with locked files; SAS simply does not want that to work, and a "force" option would only be feasible on UNIX systems.

Windows (because of its inferior file system architecture) cannot remove an open file (according to the documentation of del, the /f option is only good for read-only files).

Checking for an index might make sense; I will test what happens if one tries to create a dataset where an "orphaned" index exists which might be locked/opened.

Kurt_Bremser · ‎06-13-2020

I tested the "index issue". If an index file with the same dataset name exists, it is automatically deleted by SAS when the new dataset is created.

I used UE, created a copy of class in myfolders, created an index, and removed the class.sas7bdat with the file browser. When I re-ran the copy of class, the class.sas7bndx was deleted. So it is not absolutely necessary to delete the file, but I will add that to the macro; after all, I could not test the issue with an open dataset+index.

Patrick · ‎06-13-2020

Thanks for sharing.

Just one remark ("academic"): If using rm then there will be this very short time where the data set doesn't exist an if there is a process trying to access the data in exactly this time it would fail. If using cp instead to replace the file then in my understanding the table would always be available. ...now of course if it's about replacing more than one file - like a table and the index - then rm is eventually still the better option.

Kurt_Bremser · ‎06-13-2020

If timing is critical, and the dataset has to be available at all times, then a different approach should be made:

create the new dataset in a subdirectory of the original library (with a temporary assigned libname); this ensures that the dataset is on the same filesystem.
then, in one combined command, rm the old dataset file and mv the new dataset file to its final location. The delay between the remove and the move will be so short that a "missing" error in another process is virtually impossible.

yabwon · ‎06-13-2020

@Kurt_Bremser thanks for the follow up!

SASKiwi · ‎06-13-2020

@Kurt_Bremser - Very nice. Would be good if you added an example of how you use it in practice. I assume you use it only to prevent your SAS job getting the lock error prior to updating the dataset in question, not to help other jobs avoid the error on the same dataset?

Kurt_Bremser · ‎06-14-2020

Yes, the macro enables updating "production" datasets while endusers have access.

pchegoor · ‎05-22-2021

@Kurt_Bremser Can you please update the above macro you have with the revised one you showed at this year's SGF ?

Thanks.

Kurt_Bremser · ‎05-22-2021

@pchegoor new macro code incorporated into the article

@yabwon further testing revealed that the FDELETE function indeed ignores that a .sas7bdat file is a dataset and does not care about the lock.

Bill_Barbour · ‎08-27-2021

Nice job, Kurt! Thank you so much for creating this macro.

Avoiding the dreaded "Dataset is locked" ERROR

SAS Innovate 2025: Call for Content

Free course: Data Literacy Essentials

Get Started