I used this code to create an index for the variable (recip) in data file dataset1:
Data dataset1 (index=(recip));
set dataset1;
run;
The SAS log reported:
NOTE: There were 1,847,118,717 observations read from the data set Dataset1.
NOTE: The data set Dataset1 has 1,847,118,717 observations and 20 variables.
NOTE: Sort execution failure.
ERROR: insufficient space in file WORK. 'SASTMP-000000006'n.UTILITY.
NOTE: File WORK. 'SASTMP-000000006'n.UTILITY is damaged. I/O processing did not complete.
WARNING: Limited resources when loading index RECIP for file Dataset1.index.
A solution to this problem would be appreciated.
Thanks
Move the dataset out of WORK (to a library on different physical storage) and create the index with PROC DATASETS (INDEX CREATE) or CREATE INDEX in PROC SQL.
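As a minimal sketch of that suggestion (the library path below is a hypothetical placeholder, and the libref A1 is simply the one used later in this thread), the data set could be copied out of WORK and indexed with PROC SQL:

```sas
/* Hypothetical path: point this at storage other than the drive holding WORK */
libname A1 '/path/to/other/storage';

/* Copy the data set out of WORK once */
proc copy in=work out=A1 memtype=data;
  select Dataset1;
run;

/* Create a simple index; for a simple index the index name
   must match the name of the indexed column */
proc sql;
  create index recip on A1.Dataset1(recip);
quit;
```

Unlike the DATA step in the question, CREATE INDEX does not rewrite the 1.8 billion observations; it only builds the index file next to the existing data set.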
I would suggest using PROC DATASETS, but I am not sure that the error message can be avoided. How much space is free on the drive used for WORK? Maybe using PROC SORT with the TAGSORT option instead of an index could solve the problem.
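A rough sketch of both ideas, assuming the data set sits in an already assigned library A1 (the libref used later in this thread) and that Dataset1_sorted is a hypothetical output name:

```sas
/* Write the physical location of WORK to the log so that
   free space on that drive can be checked */
%put NOTE: WORK is located at %sysfunc(pathname(work));

/* TAGSORT keeps only the BY variables and observation numbers in the
   temporary files, which reduces utility space at the cost of run time */
proc sort data=A1.Dataset1 out=A1.Dataset1_sorted tagsort;
  by recip;
run;
```

Note that the sorted output still needs room for a full copy of the data in the target library.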
Is the actual goal here to sort the data set or to create an index?
You could help by providing more information. For example, is an index your first choice, or is it your choice of last resort because PROC SORT is failing (or perhaps because this is a permanent SAS data set that you are not allowed to sort)? What operating system are you using? Do you plan to use this index multiple times, or is it for one-time use? When you use the index, will you retrieve all the observations in sorted order (avoiding the need for PROC SORT), or will you retrieve a small subset of the observations each time?
Here's an old paper that applies mostly to the MVS operating system. It may or may not help, depending on how you answer the questions above: https://www.beoptimized.be/pdf/SUGI24_39.pdf
But be aware that your worst fear might be that you succeed in creating the index, but it takes so long to retrieve the observations that it is not practical to use. So any information you can provide about how you would use the index can only help.
You are dealing with some serious data volumes here: 1,847,118,717 observations and 20 variables.
To be successful in dealing with such data, you need some "advanced" understanding of how SAS works and how to implement performant code.
Just some tips "on the fly":
- Delete all tables in WORK that you don't need anymore
- Minimize passes through data
- To just create an index, don't use a DATA step that re-creates the table; use PROC DATASETS instead
- Reduce data volumes as early as possible (by dropping rows or aggregating the data)
- Investigate if using the SPDE engine could be beneficial
- Define variable lengths to the minimum required to store the data without truncation (numerical variables included).
- Avoid any logic that creates high-volume intermediate data (like a Cartesian join; find an alternative way to get to the desired outcome)
- etc.
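A few of these tips can be sketched as follows. This is only an illustration: temp1, temp2, var1, var2 and A1.subset are hypothetical names, and the libref A1 is assumed to be assigned already.

```sas
/* Delete WORK tables that are no longer needed
   (temp1 and temp2 are hypothetical table names) */
proc datasets lib=work nolist;
  delete temp1 temp2;
quit;

/* Reduce volume as early as possible: read only the needed columns
   and rows (var1, var2 and the filter are placeholders) */
data A1.subset;
  set A1.Dataset1(keep=recip var1 var2
                  where=(not missing(recip)));
run;
```

Applying KEEP= and WHERE= as data set options on the SET statement filters the data while it is read, so the unwanted columns and rows never reach the output table.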
Thanks, Kurt. I deleted a bunch of files from the location where my SAS WORK library is stored, which stopped the program from crashing. I then created the index with the following code, which ran successfully:
proc datasets lib=A1;
modify Dataset1;
index create recip;
run;
quit;
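To confirm that the index exists and is actually used, something like the following could be run afterwards. It is a sketch only: the value 12345 is a made-up example and assumes recip is numeric.

```sas
/* The PROC CONTENTS output includes a list of indexes on the data set */
proc contents data=A1.Dataset1;
run;

/* MSGLEVEL=I adds INFO lines to the log, including whether an
   index was selected for WHERE clause optimization */
options msglevel=i;

data subset;
  set A1.Dataset1;
  where recip = 12345;   /* hypothetical value; assumes recip is numeric */
run;
```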