DATA Step, Macro, Functions and more

Do I need to lock a SPDE dataset during ETL processing?

Reply
Super Contributor
Posts: 377

Do I need to lock a SPDE dataset during ETL processing?

Hi,

We have some existing ETL code which uses the lock statement to ensure another process doesn't get a lock on the target dataset and cause the ETL job to fail.

We're investigating changing from the BASE to SPDE engine for certain libraries.

The SPDE library doesn't like the lock statement and gives an error message, although I can't find where that is documented.

Does the SPDE engine require locking in the same way the BASE engine does?  And, if so, how is this accomplished?

(If you're interested, here is a good paper on why you'd want to use the lock statement during ETL processing:  www.lexjansen.com/pharmasug/2005/posters/po33.pdf)

Regards,

Scott

PROC Star
Posts: 7,366

Do I need to lock a SPDE dataset during ETL processing?

Scott,  I'm not familiar with SPDE, but found the following note that may be of interest:

http://support.sas.com/kb/18/467.html

Trusted Advisor
Posts: 1,300

Do I need to lock a SPDE dataset during ETL processing?

Any open on a file with the V9 or SPD Engines causes a lock to be applied to the file.  Unlike the SPDS which allows for multiple opens.  The reason to use a lock statement or command usually to prevent other programs from altering a dataset while it is in the middle of some multistep processing.

Super Contributor
Posts: 377

Do I need to lock a SPDE dataset during ETL processing?

Hi, yes, as I understand it:

* BASE engine default locking level is record level locking, SPDE engine default locking level is member level locking

* For a SPDE dataset:  "Concurrent access from multiple SAS sessions on a given data set - READ (INPUT open mode)"  (see p. 7 of the SPDE PDF)

* So, any access (i.e. READ) to a SPDE dataset from another process will prevent any other process from writing to that dataset

I've also implemented UNOTE 18467, i.e. spdedebug="cache_disable".

However, there are times when you want to "manually" get an exclusive lock on a dataset.  The aforementioned paper www.lexjansen.com/pharmasug/2005/posters/po33.pdf goes into all the details, so I won't repeat them here.

What I'm finding from my testing of SPDE is:

* All my current ETL jobs that implement locking now generate an error.  Bummer that SAS didn't document this issue.  A search of the 9.2 SPDE PDF on "lock" gets no hits.  Little birdies, how about updating p. 7 of the SPDE doc?

* The lock statement allows you to wait if you can't get an exclusive lock on a dataset.  So, instead of immediately failing, you can wait to see if you can get the lock later.

* Because I can't implement locking, I'm getting these errors in my ETL jobs:

ERROR: The file is already locked by another user.

ERROR: Member medprovstatusdim not replaced(DARENAME failed): Reason=Requested lock is not available for data set

ERROR: The file is already locked by another user.

The only thing that could be locking these SPDE datasets in our test environment is another test ETL job (in my current ETL stream).  I'll investigate if there's a dependency issue with the scheduling that can be tweaked to fix this issue, but it would be nice if SAS would document all the differences between the BASE and SPDE engine to save customer's time.

If there is a way to get an exclusive lock on a SPDE dataset, please let me know.

Thanks,

Scott

Trusted Advisor
Posts: 1,300

Do I need to lock a SPDE dataset during ETL processing?

Okay, I think I better understand the issue you have at hand.  The issue is that you have a number of ETL processes that possible run in parrallel of eachother and want the to have the added intelligence to wait for eachother to release their locks on datasets.  There are several options you can use to see what works for your specific situation.

The libname statement I beleive has option FILELOCKWAIT and I think related system option FILELOCKWAITMAX.

Whether the lock statement works with SPDE, I do not know.  But apperances, based on your comments, lead me to believe that it is not compatible.  Try resolving the automatic variable &syslckrc after attempting to perform a lock on your SPDE dataset.

Super Contributor
Posts: 377

Re: Do I need to lock a SPDE dataset during ETL processing?

Hi,

Thanks for the reply...

1)  The SPDE libname statement does not support the FILELOCKWAIT option:

422  libname spde spde "c:\temp" filelockwait=300;

                                 ------------

                                 22

ERROR: Libname SPDE is not assigned.

ERROR: Error in the LIBNAME statement.

ERROR 22-7: Invalid option name FILELOCKWAIT.

2) SPDE does not support the LOCK statement:

424  lock spde.test;

ERROR: The FILE LOCKING function is not supported by the SASSPDE engine.

3) I tried to roll my own locking routine using the OPEN function, but unlike the SCL OPEN function, the port to base SAS does not support Update ("U") open mode.  See  http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/a000148395.htm vs. http://support.sas.com/documentation/cdl/en/sclref/59578/HTML/default/viewer.htm#a000143447.htm.

I tried this in one SAS session:

data spde.test;

  x=1;

  rc=sleep(60);

run;

And this in another:

%macro lock(dataset);

%let dsid=%sysfunc(open(&dataset,i));

%put dsid=&dsid;

%let rc=%sysfunc(fetchobs(&dsid,1));

%put rc=&rc;

%if (&dsid gt 0) %then %let dsid=%sysfunc(close(&dsid));

%end;

%mend;

%lock(spde.test);

I also tried IN and IS mode, but I could never get the open function to fail and return dsid=0.

Of course, this code in the 2nd SAS session fails:

425  data spde.test;

426    x=1;

427  run;

ERROR: The file is already locked by another user.

ERROR: Requested lock is not available for data set.

NOTE: The SAS System stopped processing this step because of errors.

By contrast, using the BASE engine, the below code works:

Session #1:

data base.test;

  x=1;

  rc=sleep(30);

run;

Session #2:

%macro lock(member=,timeout=120,retry=2,action=LOCK,onfail=STOP);

  %local starttime;

  %let starttime = %sysfunc(datetime());

  %if %UPCASE(&ACTION)=LOCK %then %do;

    lock &member;

    %if &SYSLCKRC > 0 %then %do %until(%sysevalf(%sysfunc(datetime()) > (&starttime + &timeout)));

      %let SLEPT=%sysfunc(sleep(&retry));

      %put NOTE: Trying to lock %UPCASE(&MEMBER)...;

      lock &member;

      %if &syslckrc LE 0 %then %do;

        %return;

      %end;

    %end;

    %if &syslckrc > 0 %then %do;

      %if %upcase(&ONFAIL)=STOP %then %do;

        %abort cancel;

      %end;

      %if %upcase(&ONFAIL)=SYSCC %then %do;

        %let SYSCC=&SYSLCKRC;

        %return;

      %end;

    %end;

  %end;

  %if %UPCASE(&ACTION)=CLEAR %then %do;

    lock &member query;

    %if &SYSLCKRC = %sysrc(_SWLKYOU) %then %do;

      lock &member clear;

    %end;

  %end;

%mend lock;

%lock(member=base.test,action=lock);

data base.test;

  do x=1 to 10;

    output;

  end;

run;

%lock(member=base.test,action=clear);

Anyone know if there is a way to duplicate this functionality using SPDE?

Thanks,

Scott

Trusted Advisor
Posts: 1,300

Do I need to lock a SPDE dataset during ETL processing?

Scott,

I believe your testing method for the open function is flawed and the source of your continuing issues.  The sleep for 60 miliseconds is not long enough to cause of failure.

14         data in.test;

15          call streaminit(12345);

16          do i=1 to 20;

17           nbr=rand('cauchy');

18           output;

19          end;

20         run;

NOTE: The dataset IN.TEST has 20 observations and 2 variables.

NOTE: DATAstatement used (Total process time):

      real time           0.03 seconds

      cpu time            0.01 seconds

     

21        

22         signon task1sascmd='/usr/local/SAS/SASFoundation/9.2/sas' wait=no;

NOTE: Remotesignon to TASK1 complete.

23         rsubmit task1;

NOTE: Backgroundremote submit to TASK1 in progress.

24        

25         libname in spde '/home/friedegg';

NOTE: Libref INwas successfully assigned as follows:

      Engine:        SPDE

      Physical Name: /home/friedegg/

26        

27          data _null_;

28           call sleep(5,1);

29          run;

NOTE: DATAstatement used (Total process time):

      real time           5.00 seconds

      cpu time            0.00 seconds

     

30        

31          data _null_;

32           do until(d>0);

33            d=open('in.test');

34            m=sysmsg();

35            put d= m=;

36            call sleep(1,1);

37           end;

38          run;

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=0 m=ERROR:Requested lock is not available for data set.

d=1 m=

NOTE: DATAstatement used (Total process time):

      real time           17.00 seconds

      cpu time            0.01 seconds

     

39        

40         signoff task1;

NOTE: Remotesubmit to TASK1 commencing.

1     libname in spde '/home/friedegg';

NOTE: Libref INwas successfully assigned as follows:

      Engine:        SPDE

      Physical Name: /home/friedegg/

2

3     data in.test;

4      set in.test;

5      sleep=sleep(1,1);

6     run;

NOTE: There were20 observations read from the data set IN.TEST.

NOTE: The dataset IN.TEST has 20 observations and 3 variables.

NOTE: DATAstatement used (Total process time):

      real time           20.07 seconds

      cpu time            0.01 seconds

NOTE: Remotesubmit to TASK1 complete.

NOTE: Remotesignoff from TASK1 commencing.

NOTE: SASInstitute Inc., SAS Campus Drive, Cary, NC USA 27513-2414

NOTE: The SASSystem used:

      real time           22.06 seconds

      cpu time            0.06 seconds

     

NOTE: Remotesignoff from TASK1 complete.

41        

42         proc datasets lib=in nolist nodetails;

43          delete test;

44         run;

NOTE: DeletingIN.TEST (memtype=DATA).

NOTE: PROCEDUREDATASETS used (Total process time):

      real time           0.02 seconds

      cpu time            0.01 seconds

Trusted Advisor
Posts: 1,300

Do I need to lock a SPDE dataset during ETL processing?

Sometimes submitted code is hard to understand from the log when using multiple sessions, here is code is submitted.

data in.test;

call streaminit(12345);

do i=1 to 20;

  nbr=rand('cauchy');

  output;

end;

run;

signon task1 sascmd='/usr/local/SAS/SASFoundation/9.2/sas' wait=no;

rsubmit task1;

libname in spde '/home/friedegg';

data in.test;

  set in.test;

  sleep=sleep(1,1);

run;

endrsubmit;

libnamein spde '/home/friedegg';

data _null_;

  call sleep(5,1);

run;

data _null_;

  do until(d>0);

   d=open('in.test');

   m=sysmsg();

   put d= m=;

   call sleep(1,1);

  end;

run;

signoff task1;

proc datasets lib=in nolist nodetails;

delete test;

run;

Super Contributor
Posts: 377

Re: Do I need to lock a SPDE dataset during ETL processing?

Hi Fried,

First of all, thanks a lot for your posts on this subject.  Much appreciated.

Second, I'm running on Windows, which has a default increment of 1 second for the SLEEP() function, whereas Unix has 1 millisecond.

I posted this on SAS-L to get a wider audience, and have this working now.  Rather than re-post that long message here, here is the link to my SAS-L posting:  http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1110a&L=sas-l&D=0&X=746AD217485B47084E&P=26500

Regards,

Scott

Super User
Posts: 5,260

Do I need to lock a SPDE dataset during ETL processing?

As I understand it, there is not specific requirement to be able to call/update tables at the same time.

I consider it best practice to have your scheduling taking care of dependencies, it's much more obvious than have some hidden program logic which is doing locking. And there is a risk that there will be situations were your locking could get out of control... Smiley Wink

Boring answer perhaps...

Linus

Data never sleeps
Super Contributor
Posts: 377

Re: Do I need to lock a SPDE dataset during ETL processing?

Hi Linus,

Are you saying you disagree with the points raised in http://www.lexjansen.com/pharmasug/2005/posters/po33.pdf?

Scott

Super User
Posts: 5,260

Do I need to lock a SPDE dataset during ETL processing?

Disagree is a strong word...

it seemed in your description that only processes that might lock table were other ETL-processes. In this case, a proper scheduling will be enough to avoid locking.

I can see that there can be other cases when on-line ad-hoc jobs are locking tables i.e. in Enterprise Guide. These kind of problems might need some other techniques, such as shutting down workspace servers, kill servers processes etc. These situations are probably easier handled if the data store resides in a data server environment, such as SPD Server, or other external RDBMS.

/Linus

Data never sleeps
Ask a Question
Discussion stats
  • 11 replies
  • 1300 views
  • 2 likes
  • 4 in conversation