Hi All, I'm getting the following error in the screenshot below. I have read that this is because I have low disk space on my desktop and the recommendations seem to be to go and delete everything in the "SAS Temporary Files" folder on the desktop to free up space. I've included screenshots below of what this looks like for me. I store everything I want on an external hard drive and not on my desktop.
Is it okay to go ahead and delete these files on my desktop or will this somehow corrupt my SAS account or link to files on the external hard drive and corrupt them somehow? Thanks!
Please post the whole log of this SQL step, including the code. Use the </> button to post the log by copy/pasting, do NOT post pictures.
You have a CPU time of just below 5 minutes, but a real time of close to 6 hours(!).
That points to either catastrophically bad usage of SQL, or a similar catastrophic setup of your WORK location.
Here is the log and then followed below by the code. I'm working with very large datasets (30GB each) and just trying to merge them based on unique identifiers so I figured that's why it takes a long time to run.
Log:
38 libname proclm 'J:\SLAP Demographics\Proc\Claim'; NOTE: Libref PROCLM was successfully assigned as follows: Engine: V9 Physical Name: J:\SLAP Demographics\Proc\Claim 39 *getting the claim data for the procedures so we have the dates; 40 %macro prmerge(lib1=, lib2=, data1=, data2=); 41 %do i = 15 %to 18; 42 43 proc sql; 44 create table &lib2..&data1._&i._&data2. as 45 select *, (proc_&i._&data2._ids.clm_trans_id = &data1._&i..clm_trans_id) 45 ! as indic, 46 (proc_&i._&data2._ids.clm_trans_id ~='') as FROMproc_ids, 47 (&data1._&i..clm_trans_id ~='') as FROM&data1. 48 from proc.proc_&i._&data2._ids full join &lib1._&i..&data1._&i. on 48 ! proc_&i._&data2._ids.clm_trans_id=&data1._&i..clm_trans_id; 49 quit; 50 51 data &lib2..&data1._&i._&data2.; 52 set &lib2..&data1._&i._&data2.; 53 if FROMproc_ids = 1 and FROM&data1. = 1; 54 run; 55 56 %end; 57 %mend prmerge; 58 %prmerge(lib1=diag, lib2=proclm, data1=claim, data2=out); WARNING: Variable clm_trans_id already exists on file PROCLM.CLAIM_15_OUT. NOTE: Table PROCLM.CLAIM_15_OUT created, with 20821258 rows and 105 columns. NOTE: PROCEDURE SQL used (Total process time): real time 2:08:40.25 cpu time 6:07.13 NOTE: There were 20821258 observations read from the data set PROCLM.CLAIM_15_OUT. NOTE: The data set PROCLM.CLAIM_15_OUT has 3551 observations and 105 variables. NOTE: DATA statement used (Total process time): real time 21:16.96 cpu time 40.84 seconds WARNING: Variable clm_trans_id already exists on file PROCLM.CLAIM_16_OUT. ERROR: Insufficient space in file WORK.'SASTMP-000000012'n.UTILITY. ERROR: File WORK.'SASTMP-000000012'n.UTILITY is damaged. I/O processing did not complete. NOTE: Error was encountered during utility-file processing. You may be able to execute the SQL statement successfully if you allocate more space to the WORK library. ERROR: There is not enough WORK disk space to store the results of an internal sorting phase. ERROR: An error has occurred. NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE SQL used (Total process time): real time 6:56:52.95 cpu time 4:40.64
Code:
libname proclm 'J:\SLAP Demographics\Proc\Claim';
*getting the claim data for the procedures so we have the dates;
%macro prmerge(lib1=, lib2=, data1=, data2=);
%do i = 15 %to 18;
proc sql;
create table &lib2..&data1._&i._&data2. as
select *, (proc_&i._&data2._ids.clm_trans_id = &data1._&i..clm_trans_id) as indic,
(proc_&i._&data2._ids.clm_trans_id ~='') as FROMproc_ids,
(&data1._&i..clm_trans_id ~='') as FROM&data1.
from proc.proc_&i._&data2._ids full join &lib1._&i..&data1._&i. on proc_&i._&data2._ids.clm_trans_id=&data1._&i..clm_trans_id;
quit;
data &lib2..&data1._&i._&data2.;
set &lib2..&data1._&i._&data2.;
if FROMproc_ids = 1 and FROM&data1. = 1;
run;
%end;
%mend prmerge;
%prmerge(lib1=diag, lib2=proclm, data1=claim, data2=out);
So it looks like you are doing a full join first, only to then select for what would be the result of an inner join.
What is the relationship of the observations in the tables with regards to the key variable?
one-to-one, one-to-many, many-to-one or many-to-many?
Your disk space problem is the result of building a monster utiliy file for the SQL join; depending on the relationship, you might be able to use a non-SQL method.
You are spot on, I'm merging and then only selecting the data that had id matches with the first dataset. It's a one-to-many merge.
The only ways I know how to merge are either proc sort -> data merge and the proc sql method (which seems to be faster and stronger than data merge since no sorting has to be done).
If there are no other solutions from a code perspective, should I clear out the work folder or is that not recommended?
My code seemed to be working fine for a few months for other projects (albeit slow like you pointed out, which is okay with me as long as it works) but this error has just recently popped up.
The fact that you don't see an explicit sort does not mean that SQL won't have to do it. In fact it has been my experience over the years that a separate sort and following data step MERGE outperforms SQL, some times by several orders of magnitude.
So try this:
proc sort
data=proc.proc_&i._&data2._ids
out=data2
;
by clm_trans_id;
run;
proc sort
data=&lib1._&i..&data1._&i.
out=data1
;
by clm_trans_id;
run;
data &lib2..&data1._&i._&data2.;
merge
data2 (in=in2)
data1 (in=in1)
;
by clm_trans_id;
if in1 and in2;
run;
Silly question:
You have this data step
data &lib2..&data1._&i._&data2.;
set &lib2..&data1._&i._&data2.;
if FROMproc_ids = 1 and FROM&data1. = 1;
run;
to subset data. Why don't you include this as a WHERE clause in selecting data in the Proc SQL?
Also, you want to be very careful of habitual use of the
Data have;
set have;
programming structure. Since the result completely replaces the data source you may find that you have issues if more complex program logic doesn't quite do what you want. With run times as long as this I suspect you may not want to rerun code to rebuild the input set very often.
Consider creating indexes on the source table variables used in the ON clause. This way SQL could optimize the join and not need as much utility storage to complete.
Hope this helps,
- Jan.
The SAS Users Group for Administrators (SUGA) is open to all SAS administrators and architects who install, update, manage or maintain a SAS deployment.
Learn how to install the SAS Viya CLI and a few commands you may find useful in this video by SAS’ Darrell Barton.
Find more tutorials on the SAS Users YouTube channel.