Hi,
We have some existing ETL code which uses the lock statement to ensure another process doesn't get a lock on the target dataset and cause the ETL job to fail.
We're investigating changing from the BASE to SPDE engine for certain libraries.
The SPDE library doesn't like the lock statement and gives an error message, although I can't find where that is documented.
Does the SPDE engine require locking in the same way the BASE engine does? And, if so, how is this accomplished?
(If you're interested, here is a good paper on why you'd want to use the lock statement during ETL processing: www.lexjansen.com/pharmasug/2005/posters/po33.pdf)
Regards,
Scott
Scott, I'm not familiar with SPDE, but found the following note that may be of interest:
Any open on a file with the V9 or SPD Engines causes a lock to be applied to the file. Unlike the SPDS which allows for multiple opens. The reason to use a lock statement or command usually to prevent other programs from altering a dataset while it is in the middle of some multistep processing.
Hi, yes, as I understand it:
* BASE engine default locking level is record level locking, SPDE engine default locking level is member level locking
* For a SPDE dataset: "Concurrent access from multiple SAS sessions on a given data set - READ (INPUT open mode)" (see p. 7 of the SPDE PDF)
* So, any access (i.e. READ) to a SPDE dataset from another process will prevent any other process from writing to that dataset
I've also implemented UNOTE 18467, i.e. spdedebug="cache_disable".
However, there are times when you want to "manually" get an exclusive lock on a dataset. The aforementioned paper www.lexjansen.com/pharmasug/2005/posters/po33.pdf goes into all the details, so I won't repeat them here.
What I'm finding from my testing of SPDE is:
* All my current ETL jobs that implement locking now generate an error. Bummer that SAS didn't document this issue. A search of the 9.2 SPDE PDF on "lock" gets no hits. Little birdies, how about updating p. 7 of the SPDE doc?
* The lock statement allows you to wait if you can't get an exclusive lock on a dataset. So, instead of immediately failing, you can wait to see if you can get the lock later.
* Because I can't implement locking, I'm getting these errors in my ETL jobs:
ERROR: The file is already locked by another user.
ERROR: Member medprovstatusdim not replaced(DARENAME failed): Reason=Requested lock is not available for data set
ERROR: The file is already locked by another user.
The only thing that could be locking these SPDE datasets in our test environment is another test ETL job (in my current ETL stream). I'll investigate if there's a dependency issue with the scheduling that can be tweaked to fix this issue, but it would be nice if SAS would document all the differences between the BASE and SPDE engine to save customer's time.
If there is a way to get an exclusive lock on a SPDE dataset, please let me know.
Thanks,
Scott
Okay, I think I better understand the issue you have at hand. The issue is that you have a number of ETL processes that possible run in parrallel of eachother and want the to have the added intelligence to wait for eachother to release their locks on datasets. There are several options you can use to see what works for your specific situation.
The libname statement I beleive has option FILELOCKWAIT and I think related system option FILELOCKWAITMAX.
Whether the lock statement works with SPDE, I do not know. But apperances, based on your comments, lead me to believe that it is not compatible. Try resolving the automatic variable &syslckrc after attempting to perform a lock on your SPDE dataset.
Hi,
Thanks for the reply...
1) The SPDE libname statement does not support the FILELOCKWAIT option:
422 libname spde spde "c:\temp" filelockwait=300;
------------
22
ERROR: Libname SPDE is not assigned.
ERROR: Error in the LIBNAME statement.
ERROR 22-7: Invalid option name FILELOCKWAIT.
2) SPDE does not support the LOCK statement:
424 lock spde.test;
ERROR: The FILE LOCKING function is not supported by the SASSPDE engine.
3) I tried to roll my own locking routine using the OPEN function, but unlike the SCL OPEN function, the port to base SAS does not support Update ("U") open mode. See http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/a000148395.htm vs. http://support.sas.com/documentation/cdl/en/sclref/59578/HTML/default/viewer.htm#a000143447.htm.
I tried this in one SAS session:
data spde.test;
x=1;
rc=sleep(60);
run;
And this in another:
%macro lock(dataset);
%let dsid=%sysfunc(open(&dataset,i));
%put dsid=&dsid;
%let rc=%sysfunc(fetchobs(&dsid,1));
%put rc=&rc;
%if (&dsid gt 0) %then %let dsid=%sysfunc(close(&dsid));
%end;
%mend;
%lock(spde.test);
I also tried IN and IS mode, but I could never get the open function to fail and return dsid=0.
Of course, this code in the 2nd SAS session fails:
425 data spde.test;
426 x=1;
427 run;
ERROR: The file is already locked by another user.
ERROR: Requested lock is not available for data set.
NOTE: The SAS System stopped processing this step because of errors.
By contrast, using the BASE engine, the below code works:
Session #1:
data base.test;
x=1;
rc=sleep(30);
run;
Session #2:
%macro lock(member=,timeout=120,retry=2,action=LOCK,onfail=STOP);
%local starttime;
%let starttime = %sysfunc(datetime());
%if %UPCASE(&ACTION)=LOCK %then %do;
lock &member;
%if &SYSLCKRC > 0 %then %do %until(%sysevalf(%sysfunc(datetime()) > (&starttime + &timeout)));
%let SLEPT=%sysfunc(sleep(&retry));
%put NOTE: Trying to lock %UPCASE(&MEMBER)...;
lock &member;
%if &syslckrc LE 0 %then %do;
%return;
%end;
%end;
%if &syslckrc > 0 %then %do;
%if %upcase(&ONFAIL)=STOP %then %do;
%abort cancel;
%end;
%if %upcase(&ONFAIL)=SYSCC %then %do;
%let SYSCC=&SYSLCKRC;
%return;
%end;
%end;
%end;
%if %UPCASE(&ACTION)=CLEAR %then %do;
lock &member query;
%if &SYSLCKRC = %sysrc(_SWLKYOU) %then %do;
lock &member clear;
%end;
%end;
%mend lock;
%lock(member=base.test,action=lock);
data base.test;
do x=1 to 10;
output;
end;
run;
%lock(member=base.test,action=clear);
Anyone know if there is a way to duplicate this functionality using SPDE?
Thanks,
Scott
Scott,
I believe your testing method for the open function is flawed and the source of your continuing issues. The sleep for 60 miliseconds is not long enough to cause of failure.
14 data in.test;
15 call streaminit(12345);
16 do i=1 to 20;
17 nbr=rand('cauchy');
18 output;
19 end;
20 run;
NOTE: The dataset IN.TEST has 20 observations and 2 variables.
NOTE: DATAstatement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
21
22 signon task1sascmd='/usr/local/SAS/SASFoundation/9.2/sas' wait=no;
NOTE: Remotesignon to TASK1 complete.
23 rsubmit task1;
NOTE: Backgroundremote submit to TASK1 in progress.
24
25 libname in spde '/home/friedegg';
NOTE: Libref INwas successfully assigned as follows:
Engine: SPDE
Physical Name: /home/friedegg/
26
27 data _null_;
28 call sleep(5,1);
29 run;
NOTE: DATAstatement used (Total process time):
real time 5.00 seconds
cpu time 0.00 seconds
30
31 data _null_;
32 do until(d>0);
33 d=open('in.test');
34 m=sysmsg();
35 put d= m=;
36 call sleep(1,1);
37 end;
38 run;
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=0 m=ERROR:Requested lock is not available for data set.
d=1 m=
NOTE: DATAstatement used (Total process time):
real time 17.00 seconds
cpu time 0.01 seconds
39
40 signoff task1;
NOTE: Remotesubmit to TASK1 commencing.
1 libname in spde '/home/friedegg';
NOTE: Libref INwas successfully assigned as follows:
Engine: SPDE
Physical Name: /home/friedegg/
2
3 data in.test;
4 set in.test;
5 sleep=sleep(1,1);
6 run;
NOTE: There were20 observations read from the data set IN.TEST.
NOTE: The dataset IN.TEST has 20 observations and 3 variables.
NOTE: DATAstatement used (Total process time):
real time 20.07 seconds
cpu time 0.01 seconds
NOTE: Remotesubmit to TASK1 complete.
NOTE: Remotesignoff from TASK1 commencing.
NOTE: SASInstitute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SASSystem used:
real time 22.06 seconds
cpu time 0.06 seconds
NOTE: Remotesignoff from TASK1 complete.
41
42 proc datasets lib=in nolist nodetails;
43 delete test;
44 run;
NOTE: DeletingIN.TEST (memtype=DATA).
NOTE: PROCEDUREDATASETS used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Sometimes submitted code is hard to understand from the log when using multiple sessions, here is code is submitted.
data in.test;
call streaminit(12345);
do i=1 to 20;
nbr=rand('cauchy');
output;
end;
run;
signon task1 sascmd='/usr/local/SAS/SASFoundation/9.2/sas' wait=no;
rsubmit task1;
libname in spde '/home/friedegg';
data in.test;
set in.test;
sleep=sleep(1,1);
run;
endrsubmit;
libnamein spde '/home/friedegg';
data _null_;
call sleep(5,1);
run;
data _null_;
do until(d>0);
d=open('in.test');
m=sysmsg();
put d= m=;
call sleep(1,1);
end;
run;
signoff task1;
proc datasets lib=in nolist nodetails;
delete test;
run;
Hi Fried,
First of all, thanks a lot for your posts on this subject. Much appreciated.
Second, I'm running on Windows, which has a default increment of 1 second for the SLEEP() function, whereas Unix has 1 millisecond.
I posted this on SAS-L to get a wider audience, and have this working now. Rather than re-post that long message here, here is the link to my SAS-L posting: http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1110a&L=sas-l&D=0&X=746AD217485B47084E&P=26500
Regards,
Scott
As I understand it, there is not specific requirement to be able to call/update tables at the same time.
I consider it best practice to have your scheduling taking care of dependencies, it's much more obvious than have some hidden program logic which is doing locking. And there is a risk that there will be situations were your locking could get out of control...
Boring answer perhaps...
Linus
Hi Linus,
Are you saying you disagree with the points raised in http://www.lexjansen.com/pharmasug/2005/posters/po33.pdf?
Scott
Disagree is a strong word...
it seemed in your description that only processes that might lock table were other ETL-processes. In this case, a proper scheduling will be enough to avoid locking.
I can see that there can be other cases when on-line ad-hoc jobs are locking tables i.e. in Enterprise Guide. These kind of problems might need some other techniques, such as shutting down workspace servers, kill servers processes etc. These situations are probably easier handled if the data store resides in a data server environment, such as SPD Server, or other external RDBMS.
/Linus
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.