BookmarkSubscribeRSS Feed
PeterBr
Obsidian | Level 7

Hi All, I'm getting the following error in the screenshot below. I have read that this is because I have low disk space on my desktop and the recommendations seem to be to go and delete everything in the "SAS Temporary Files" folder on the desktop to free up space. I've included screenshots below of what this looks like for me. I store everything I want on an external hard drive and not on my desktop.

 

Is it okay to go ahead and delete these files on my desktop or will this somehow corrupt my SAS account or link to files on the external hard drive and corrupt them somehow? Thanks!

 

PeterBr_0-1600389618345.png

PeterBr_1-1600389726341.png

PeterBr_2-1600389740833.png

 

 

9 REPLIES 9
Kurt_Bremser
Super User

Please post the whole log of this SQL step, including the code. Use the </> button to post the log by copy/pasting, do NOT post pictures.

You have a CPU time of just below 5 minutes, but a real time of close to 6 hours(!).

That points to either catastrophically bad usage of SQL, or a similar catastrophic setup of your WORK location.

PeterBr
Obsidian | Level 7

Here is the log and then followed below by the code. I'm working with very large datasets (30GB each) and just trying to merge them based on unique identifiers so I figured that's why it takes a long time to run.

 

Log:

38   libname proclm 'J:\SLAP Demographics\Proc\Claim';
NOTE: Libref PROCLM was successfully assigned as follows:
      Engine:        V9
      Physical Name: J:\SLAP Demographics\Proc\Claim
39   *getting the claim data for the procedures so we have the dates;
40   %macro prmerge(lib1=, lib2=, data1=, data2=);
41   %do i = 15 %to 18;
42
43   proc sql;
44   create table &lib2..&data1._&i._&data2. as
45   select *, (proc_&i._&data2._ids.clm_trans_id = &data1._&i..clm_trans_id)
45 ! as indic,
46           (proc_&i._&data2._ids.clm_trans_id ~='') as FROMproc_ids,
47           (&data1._&i..clm_trans_id ~='') as FROM&data1.
48   from proc.proc_&i._&data2._ids full join &lib1._&i..&data1._&i. on
48 ! proc_&i._&data2._ids.clm_trans_id=&data1._&i..clm_trans_id;
49   quit;
50
51   data &lib2..&data1._&i._&data2.;
52   set &lib2..&data1._&i._&data2.;
53   if FROMproc_ids = 1 and FROM&data1. = 1;
54   run;
55
56   %end;
57   %mend prmerge;
58   %prmerge(lib1=diag, lib2=proclm, data1=claim, data2=out);
WARNING: Variable clm_trans_id already exists on file PROCLM.CLAIM_15_OUT.
NOTE: Table PROCLM.CLAIM_15_OUT created, with 20821258 rows and 105 columns.

NOTE: PROCEDURE SQL used (Total process time):
      real time           2:08:40.25
      cpu time            6:07.13



NOTE: There were 20821258 observations read from the data set
      PROCLM.CLAIM_15_OUT.
NOTE: The data set PROCLM.CLAIM_15_OUT has 3551 observations and 105
      variables.
NOTE: DATA statement used (Total process time):
      real time           21:16.96
      cpu time            40.84 seconds


WARNING: Variable clm_trans_id already exists on file PROCLM.CLAIM_16_OUT.
ERROR: Insufficient space in file WORK.'SASTMP-000000012'n.UTILITY.
ERROR: File WORK.'SASTMP-000000012'n.UTILITY is damaged. I/O processing did
       not complete.
NOTE: Error was encountered during utility-file processing. You may be able
      to execute the SQL statement successfully if you allocate more space to
      the WORK library.
ERROR: There is not enough WORK disk space to store the results of an
       internal sorting phase.
ERROR: An error has occurred.

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
      real time           6:56:52.95
      cpu time            4:40.64


Code:


libname proclm 'J:\SLAP Demographics\Proc\Claim';
*getting the claim data for the procedures so we have the dates;
%macro prmerge(lib1=, lib2=, data1=, data2=);
%do i = 15 %to 18;

proc sql;
create table &lib2..&data1._&i._&data2. as
select *, (proc_&i._&data2._ids.clm_trans_id = &data1._&i..clm_trans_id) as indic,
		(proc_&i._&data2._ids.clm_trans_id ~='') as FROMproc_ids,
		(&data1._&i..clm_trans_id ~='') as FROM&data1.
from proc.proc_&i._&data2._ids full join &lib1._&i..&data1._&i. on proc_&i._&data2._ids.clm_trans_id=&data1._&i..clm_trans_id;
quit;

data &lib2..&data1._&i._&data2.;
set &lib2..&data1._&i._&data2.;
if FROMproc_ids = 1 and FROM&data1. = 1;
run;

%end;
%mend prmerge;
%prmerge(lib1=diag, lib2=proclm, data1=claim, data2=out);

Kurt_Bremser
Super User

So it looks like you are doing a full join first, only to then select for what would be the result of an inner join.

What is the relationship of the observations in the tables with regards to the key variable?

one-to-one, one-to-many, many-to-one or many-to-many?

 

Your disk space problem is the result of building a monster utiliy file for the SQL join; depending on the relationship, you might be able to use a non-SQL method.

PeterBr
Obsidian | Level 7

You are spot on, I'm merging and then only selecting the data that had id matches with the first dataset. It's a one-to-many merge.

 

The only ways I know how to merge are either proc sort -> data merge and the proc sql method (which seems to be faster and stronger than data merge since no sorting has to be done). 

 

If there are no other solutions from a code perspective, should I clear out the work folder or is that not recommended?

 

My code seemed to be working fine for a few months for other projects (albeit slow like you pointed out, which is okay with me as long as it works) but this error has just recently popped up.

Kurt_Bremser
Super User

The fact that you don't see an explicit sort does not mean that SQL won't have to do it. In fact it has been my experience over the years that a separate sort and following data step MERGE outperforms SQL, some times by several orders of magnitude.

So try this:

proc sort
  data=proc.proc_&i._&data2._ids
  out=data2
;
by clm_trans_id;
run;

proc sort
  data=&lib1._&i..&data1._&i.
  out=data1
;
by clm_trans_id;
run;

data &lib2..&data1._&i._&data2.;
merge
  data2 (in=in2)
  data1 (in=in1)
;
by clm_trans_id;
if in1 and in2;
run;
ballardw
Super User

Silly question:

 

You have this data step

data &lib2..&data1._&i._&data2.;
set &lib2..&data1._&i._&data2.;
if FROMproc_ids = 1 and FROM&data1. = 1;
run;

to subset data. Why don't you include this as a WHERE clause in selecting data in the Proc SQL?

 

Also, you want to be very careful of habitual use of the

 

Data have;

   set have;

programming structure. Since the result completely replaces the data source you may find that you have issues if more complex program logic doesn't quite do what you want. With run times as long as this I suspect you may not want to rerun code to rebuild the input set very often.

PeterBr
Obsidian | Level 7
How would the code look if I added in a where statement? I'm not that familiar with proc sql.

I agree 100% with your second point.

I've been using data have; set have; to avoid having to store the massive file that the proc sql creates. It just saves space to store what I want out of the merge, but maybe you are implying this step could be removed by adding in a where clause into the proc sql?
jklaverstijn
Rhodochrosite | Level 12

Consider creating indexes on the source table variables used in the ON clause. This way SQL could optimize the join and not need as much utility storage to complete.

 

Hope this helps,

- Jan.

aser123
Calcite | Level 5
11

suga badge.PNGThe SAS Users Group for Administrators (SUGA) is open to all SAS administrators and architects who install, update, manage or maintain a SAS deployment. 

Join SUGA 

Get Started with SAS Information Catalog in SAS Viya

SAS technical trainer Erin Winters shows you how to explore assets, create new data discovery agents, schedule data discovery agents, and much more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1696 views
  • 1 like
  • 5 in conversation