I'm having issues where if I try to run PROC SQL or PROC SORT involving a larger medical claims dataset (~15 GB). I work off a VPN and need to write my data to a remote file storage system. My code submits without error in the log, but will not update in my file libraries.
For example, the code I have been using is
proc sort data = lib.large_data;
by ID Admit_date;
run;
and
proc sql;
create table lib.merge as
select *
from lib.small_data x
left join lib.large_data y
on x.ID = y.ID;
quit;
a file titled lib.large_data.sas7bdat.lck will be created in my folder, but will not update. What I mean by this is that the file size will initialize at 0-168KB and will not grow any larger, no matter how long I let the code run. I understand that SAS creates .lck files to prevent overwrites, but I'm not sure why this is happening.
What is also strange to me is that the code I'm showing has worked previously. I started having this issue in the past week, but I have been using this exact code to do merges for a few months.
If I use the PROC SQL statement:
create table lib.merge as
select *
from lib.large data x
where x.ID in (select y.ID from lib.small_data y);
quit;
then that allows me at least select correct records from the large dataset, but it's not ideal and I am still not able to sort the resulting dataset (~300 MB). I can run other PROC statements on these datasets just fine. I.e, PROC CONTENTS, PROC MEANS, PROC FREQ, all work, but I can't sort or merge.
Does anyone have any idea why this is happening or how to resolve this issue? Any advice would be appreciated!
The lck file is created as soon as the PROC SORT starts, but then the procedure will read the source dataset and populate the utility file in WORK; during this, the lck file will stay unchanged. What you experience is a consequence of reading a dataset over a bad network connection.
Copy all datasets to a local library first, then do your work, and after that copy the final result back to the network storage.
Reading and writing large data files over VPN to remote network storage is a recipe for very poor performance. I assume SAS is running on your local PC? How long was it taking when it was working?
I suggest running a test:
proc sql;
create table work.merge as
select *
from lib.small_data x
left join lib.large_data y
on x.ID = y.ID;
quit;
This will test how much faster it is writing to your local hard drive. If it is a lot faster and successful then it will prove that the VPN data connection is the problem.
Also this test should be even faster:
data work.small_data;
set lib.small_data;
run;
proc sql;
create table work.merge as
select *
from work.small_data x
left join lib.large_data y
on x.ID = y.ID;
quit;
The lck file is created as soon as the PROC SORT starts, but then the procedure will read the source dataset and populate the utility file in WORK; during this, the lck file will stay unchanged. What you experience is a consequence of reading a dataset over a bad network connection.
Copy all datasets to a local library first, then do your work, and after that copy the final result back to the network storage.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.