data step with a huge dataset-message 'lack of resources'

Contributor
Posts: 50

I need to run a data step on a dataset that may contain over 1,000 million observations. It seems to be too large for SAS to handle: it gave a 'lack of resources' message and stopped processing. Is there any other way to handle this in SAS?

This is the data I need to create:

data A;
  set B(keep=n1 m1 x1 y1);  *** about 5000 obs in B;
  do n = n1+1 to &total-(m1-1);
    do m = m1+1 to &total-(n1+1) while (n+m <= &total);  *** &total could be 90-100;
      do y = y1 to n while (y1<=y<n);
        do x = x1 to m while (x1<=x<m);
        end;
      end;
    end;
  end;
run;

PROC Star
Posts: 7,473

Re: data step with a huge dataset-message 'lack of resources'

What are you trying to do?  Your data step appears to just go through a data-controlled number of iterations without actually accomplishing anything.

Contributor
Posts: 50

I need to calculate some probabilities in the data step (see the following code). I removed the calculations from the message above to keep the code clear.

data A;
  set B(keep=n1 m1 x1 y1);  *** about 5000 obs in B;
  do n = n1+1 to &total-(m1-1);
    do m = m1+1 to &total-(n1+1) while (n+m <= &total);  *** &total could be 90-100;
      do y = y1 to n while (y1<=y<n);
        do x = x1 to m while (x1<=x<m);
          term2_p0 = pdf('BINOMIAL', x1, &p01, m1) * pdf('BINOMIAL', y1, &p02, n1)
                   * pdf('BINOMIAL', (x-x1), &p01, (m-m1)) * pdf('BINOMIAL', (y-y1), &p02, (n-n1));
          term2_p1 = pdf('BINOMIAL', x1, &p11, m1) * pdf('BINOMIAL', y1, &p12, n1)
                   * pdf('BINOMIAL', (x-x1), &p11, (m-m1)) * pdf('BINOMIAL', (y-y1), &p12, (n-n1));
        end;
      end;
    end;
  end;
run;

PROC Star
Posts: 7,473

It would help if you could provide one record that causes the problem, along with the value of the macro variable you used.  My initial guess is that you have created an infinite loop which, by definition, will simply eat up all of your resources.

And, since you don't have an output statement in the loop, if the loop isn't infinite, it would simply go through all of the iterations and result in only one record per record processed.

Contributor
Posts: 50

This is the code:

%let userN = 40;
%let p01 = 0.05;
%let p02 = 0.05;
%let p11 = 0.25;
%let p12 = 0.25;

data s1_A;
  do n1 = 2 to &userN-2;
    do m1 = 2 to &userN-2 while ((m1 + n1) < &userN-2);
      do y1 = 0 to n1-1;
        do x1 = 0 to m1-1;
          term1_p1   = pdf('BINOMIAL', x1, &p11, m1) * pdf('BINOMIAL', y1, &p12, n1);
          term1_p0   = pdf('BINOMIAL', x1, &p01, m1) * pdf('BINOMIAL', y1, &p02, n1);
          term1_p0p1 = pdf('BINOMIAL', x1, &p01, m1) * pdf('BINOMIAL', y1, &p12, n1);
          output;
        end;
      end;
    end;
  end;
run;

data s2_B;
  set s1_A;
  do n = n1+1 to &userN-(m1+1);
    do m = m1+1 to &userN-(n1+1) while (n+m <= &userN);
      do y = y1 to n while (y1<=y<n);
        do x = x1 to m while (x1<=x<m);
          term2_p0   = pdf('BINOMIAL', x1, &p01, m1) * pdf('BINOMIAL', y1, &p02, n1)
                     * pdf('BINOMIAL', (x-x1), &p01, (m-m1)) * pdf('BINOMIAL', (y-y1), &p02, (n-n1));
          term2_p1   = pdf('BINOMIAL', x1, &p11, m1) * pdf('BINOMIAL', y1, &p12, n1)
                     * pdf('BINOMIAL', (x-x1), &p11, (m-m1)) * pdf('BINOMIAL', (y-y1), &p12, (n-n1));
          term2_p0p1 = pdf('BINOMIAL', x1, &p01, m1) * pdf('BINOMIAL', y1, &p12, n1)
                     * pdf('BINOMIAL', (x-x1), &p01, (m-m1)) * pdf('BINOMIAL', (y-y1), &p12, (n-n1));
          output;
        end;
      end;
    end;
  end;
run;

PROC Star
Posts: 7,473

How big is your hard drive?  Your first data step creates a file with 107,415 records.  Including just the first 2 records from that data step in your second data step created a file that was over 22 MB in size.  As such, if all of the other records produce approximately the same number of iterations, your resulting file would be approximately 2,246,014 MB in size.

I can't even test that because I don't have that much free space available on my machine.
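One way to check an estimate like this without filling the disk is to count the rows instead of writing them. The following is only a sketch: it assumes s1_A from the first data step already exists and that &userN is set as above, and total_rows is a made-up variable name. The two innermost loops run y from y1 to n-1 and x from x1 to m-1, so each (n, m) pair contributes (n - y1) * (m - x1) rows.

```sas
/* Count the rows the second step would write, without writing any of them. */
data _null_;
  set s1_A end=last;
  do n = n1+1 to &userN-(m1+1);
    do m = m1+1 to &userN-(n1+1) while (n+m <= &userN);
      /* Inner loops: y = y1..n-1 and x = x1..m-1 (hypothetical counter name) */
      total_rows + (n - y1) * (m - x1);
    end;
  end;
  if last then put 'Rows the second step would write: ' total_rows comma20.;
run;
```

Multiplying the count by the observation length reported by PROC CONTENTS for a small test run gives a rough size for the full dataset.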

Contributor
Posts: 50

I also need to do other calculations based on this data. I tried splitting the whole dataset into several subsets, but that isn't working either; SAS still can't handle it.

PROC Star
Posts: 1,167

1. I ran the earlier code you posted, with some made-up values for the input data, and didn't have any problem. I don't see any heavy use of resources that should cause any problems; it's just a very long process because it does a lot.

2. I agree completely with Art; this job is producing a huge amount of output. I find it hard to imagine that you'll be able to do anything useful with it.

3. Exactly what diagnostic are you receiving? Is it possible to post a piece of the log that contains the message? Do you have any indication of which step is failing, and how many records had been processed when it failed?

4. If not, try to find out where the problem is. As Art was able to run the first step, I assume it's the second step that's the problem. If you insert the following line immediately after your set statement, the log should contain a line for every thousand records read.

if mod(_n_, 1000) = 0 then put _n_ =;

Tom

PROC Star
Posts: 7,473

Tom, I'm still going to bet it is the hard drive.  At least 1.25 terabytes would be needed.

Contributor
Posts: 50

My hard drive is about 200 GB. This is the error in the log file with userN = 50; userN needs to go up to 80.

ERROR: Write to WORK.S2_A.DATA failed. File is full and may be damaged.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 18914 observations read from the data set WORK.S1_B.
WARNING: The data set WORK.S2_A may be incomplete. When this step was stopped there were
         656636456 observations and 28 variables.
WARNING: Data set WORK.S2_A was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time 3:29:41.39
      cpu time 36:00.33

Super User
Posts: 7,043

There is no way to use all of the numbers you are generating.  I assume that you will want to summarize them in some way.  You could include logic to do that in the data step so that only the summarized data is output.  You should be able to dramatically reduce the size of the dataset that you actually need to store.
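As a rough illustration of summarizing inside the data step, the sketch below keeps running totals per input record instead of writing every (n, m, y, x) combination. It assumes s1_A and the macro variables from the code posted earlier in the thread; the names s2_sum, sum_p0 and sum_p1 are made up, and a plain sum stands in for whatever summary is really needed.

```sas
/* Sketch: one output row per s1_A row instead of one row per (n, m, y, x). */
data s2_sum;
  set s1_A;
  sum_p0 = 0;   /* hypothetical running totals, reset for each input row */
  sum_p1 = 0;
  do n = n1+1 to &userN-(m1+1);
    do m = m1+1 to &userN-(n1+1) while (n+m <= &userN);
      do y = y1 to n-1;
        do x = x1 to m-1;
          /* term1_p0 and term1_p1 already hold the first two PDF factors */
          sum_p0 = sum_p0 + term1_p0
                   * pdf('BINOMIAL', (x-x1), &p01, (m-m1))
                   * pdf('BINOMIAL', (y-y1), &p02, (n-n1));
          sum_p1 = sum_p1 + term1_p1
                   * pdf('BINOMIAL', (x-x1), &p11, (m-m1))
                   * pdf('BINOMIAL', (y-y1), &p12, (n-n1));
        end;
      end;
    end;
  end;
  keep n1 m1 x1 y1 sum_p0 sum_p1;
  /* No OUTPUT inside the loops, so SAS writes one row at the end of each iteration. */
run;
```

With this shape the output has the same number of rows as s1_A, so the disk requirement no longer grows with the loop counts.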

If you want to summarize using a SAS proc like MEANS/SUMMARY then you could code your data step as a view. That should prevent SAS from creating the giant dataset.

data v1 / view=v1 ;
   .....
run;

proc summary data=v1 ... .;
run;

Contributor
Posts: 50

Thank you, Tom. I won't use PROC SUMMARY, but I need to calculate a cumulative sum for each observation within each group (same n, m, n1, m1, x, y, x1, y1) and keep only those observations whose cumulative sum is less than a prespecified cutoff.
