HSpark
Calcite | Level 5

Hi.

I am using SAS University Edition on Windows 10 with VMware Workstation Player 12. I am running a logistic regression followed by propensity score matching. My data consist of about 1.5 million observations and 163 variables, and the data file is about 450 MB. The program is about 250 lines long and includes several macros. When I run it, it stops and the following messages appear.

 

ERROR: Insufficient space in file WORK.BIRTH.DATA.
ERROR: File WORK.BIRTH.DATA is damaged. I/O processing did not complete.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1579785 observations read from the data set WORK.BIRTH.
WARNING: The data set WORK.BIRTH may be incomplete. When this step was stopped there were 1579785 observations and 167 variables.
WARNING: Data set WORK.BIRTH was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 5.18 seconds
cpu time 2.08 seconds
 
Part of the code is shown below.


PROC IMPORT file='/folders/myfolders/epi271/natl2015_ver11_1_2.dta'
    DBMS=stata
    OUT=WORK.birth;
    
RUN;
data work.birth;
    set work.birth;
    
    newmager9=.;

    if mager9=1 then
        newmager9=2;

    if mager9 ne 1 then
        newmager9=mager9;
    id=_N_;
run;

proc logistic descending;
    class newmager9 morace newmar newmeduc para_cat overtdm chrhtn ptdhx cesar_cat;
    model inftr=newmager9 morace newmar newmeduc para_cat overtdm chrhtn ptdhx
        cesar_cat;
    output out=psmodel pred=propscore;
run;

/***************************************************************************************************************************************/
data caco1;
    set psmodel;
    propint=propscore;

   if inftr=1 then
        case=1;

    if inftr=0 then
        control=1;

        
run;

proc sort ;
    by id;
    *id can be modified as wanted;
run;

********* MACRO FOR MATCHING BASED ON PROPENSITY SCORE ******************************************;

%macro caco(s, x, m, l, gd);
    data t3a(keep=id1 propsco1) t3b(keep=id2 propsco2);
        set caco1;
------------------------------------------------------------------------
 
These messages appear even before the logistic regression starts, and I cannot understand why. The attached image shows the disk settings of the virtual machine.
cut.png
Is there any way to get around it? Please let me know.
Thank you.
 
 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

This may be too big for SAS UE. It was originally intended for learning purposes only, and there are restrictions built into the software. I would contact SAS Tech Support directly and ask whether there is a workaround. Given the limitations of the software, one way around this is to divide the data into smaller portions and process each one individually, if possible.
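
As a rough illustration of processing a smaller portion, the sketch below reads only a range of rows from WORK.BIRTH (the data set from the original post). The output name `birth_part1` and the 500,000-row boundary are hypothetical; whether splitting suits the analysis is a separate question, since matching done per portion can only pair cases and controls within that portion.

```sas
/* Sketch only: copy one range of rows from WORK.BIRTH.            */
/* birth_part1 and the row boundaries are illustrative assumptions. */
data work.birth_part1;
    /* FIRSTOBS= is the first row to read; OBS= is the LAST row to read */
    set work.birth(firstobs=1 obs=500000);
run;

/* The next portion would use firstobs=500001 obs=1000000, and so on. */
```

Note that OBS= is the number of the last row to process, not a row count, so the second portion needs `obs=1000000` rather than `obs=500000`.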


4 REPLIES 4
HSpark
Calcite | Level 5
Thank you for the kind explanation. I will try SAS 9.4 at the school lab. Thank you.
LinusH
Tourmaline | Level 20

I need to chime in with @Reeza: this data set is obviously larger than what self-paced learning requires. You should be able to use a subset of your input data and still complete your training.

 

One thing you could do is delete each WORK data set as soon as it is no longer needed in your process.
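
A minimal sketch of that cleanup, using PROC DATASETS. It assumes that PSMODEL (the PROC LOGISTIC output in the original post) is no longer needed once CACO1 has been created from it; check that against the rest of the program before deleting anything.

```sas
/* Sketch only: free WORK space by deleting a data set that has served its purpose. */
/* Assumption: PSMODEL is not read again after CACO1 has been built from it.        */
proc datasets library=work nolist;
    delete psmodel;
quit;
```

The same step can list several names after DELETE if more than one intermediate data set can go.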

Data never sleeps
Kurt_Bremser
Super User

Although the data file size is "only" 450 MB, this does not mean that a SAS dataset won't be considerably larger.

Let's make a quick calculation:

Assume your variables were all numeric (8 bytes), then a single observation would consume

163 * 8 = 1,304 bytes

Multiply by 1.5 million

1,304 * 1,500,000 = 1,956,000,000 bytes

equates to 1,910,156.25 Kbytes

equates to 1,865.387 Mbytes

equates to 1.8217 GB

 

Depending on the structure (char variables might be _considerably_ longer), the dataset might be much bigger than that.
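
The estimate above can be reproduced in a short DATA step, shown here as a sketch under the same all-numeric assumption. PROC CONTENTS then reports the actual observation length of the imported data set, which is the figure to trust when character variables are present.

```sas
/* Sketch only: repeat the back-of-the-envelope estimate (all variables 8-byte numeric). */
data _null_;
    n_vars = 163;
    n_obs  = 1500000;
    bytes  = n_vars * 8 * n_obs;      /* bytes per observation times observations */
    kb     = bytes / 1024;
    mb     = kb / 1024;
    gb     = mb / 1024;
    put bytes= comma20. kb= comma20.2 mb= comma12.3 gb= 8.4;
run;

/* Actual figures for the imported data set, including "Observation Length". */
proc contents data=work.birth;
run;
```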

 

UE is simply not designed for data of this size.

 

Inspect your data on the Stata side and reduce the number of observations.

Use the compress=yes option to minimize disk storage consumption.
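
A sketch of the two usual ways to request compression, reusing the PROC IMPORT step from the original post. The name `birth_recode` is hypothetical. How much space this saves depends on the data, and the SAS log reports the size change for each compressed data set.

```sas
/* Sketch only. System option: data sets created after this line are compressed. */
options compress=yes;

proc import file='/folders/myfolders/epi271/natl2015_ver11_1_2.dta'
    dbms=stata
    out=work.birth;
run;

/* Data set option: compress one specific output data set.    */
/* birth_recode is an illustrative name, not from the thread. */
data work.birth_recode(compress=yes);
    set work.birth;
    id = _n_;
run;
```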
