Help using Base SAS procedures

Big data vs. limited computer memory

Frequent Contributor
Posts: 89

Big data vs. limited computer memory

Hello, All

I have been asked a question: if a raw data file is 100 GB but the available memory is only 16 GB, can SAS read such big data?

I believe the answer should be "Yes," but I am not sure how to do it in practice. Can we read such big data the same way as we do with regular, small data?


Accepted Solutions
Solution
‎02-15-2012 01:25 PM
Respected Advisor
Posts: 3,156

Big data vs. limited computer memory

Posted in reply to littlestone

The beauty of the SAS DATA step is that it reads data record by record instead of swallowing the whole thing at once. You can certainly read it, as long as your destination hard drive is large enough.
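As a minimal sketch of that record-by-record pattern (the file path and variable names here are made up for illustration), a plain DATA step with INFILE streams the raw file without ever holding it all in memory:

```sas
/* Hypothetical example: stream a very large delimited file record by     */
/* record. Only a buffer-full of data is in memory at any one time; the   */
/* output dataset is written to disk, so disk space is the real limit.    */
data work.bigdata;
    infile '/data/raw/bigfile.txt' dlm=',' dsd truncover firstobs=2;
    input id customer_id :$20. amount txn_date :yymmdd10.;
    format txn_date yymmdd10.;
run;
```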

On a side note, you probably will not be able to read it into a hash object, though, since the hash object resides entirely in memory.

I am not sure about PROC SQL, though.

Regards,

Haikuo



All Replies

Frequent Contributor
Posts: 89

Big data vs. limited computer memory

Thank you.

I understand that SAS reads data record by record; does that mean that, at any given time, only ONE record resides in memory?

Respected Advisor
Posts: 3,156

Big data vs. limited computer memory

Posted in reply to littlestone

Well, I don't really know the definitive answer to your question. I tend to believe there is a buffer that holds more than just ONE record; otherwise it would be hard to explain functions like lag() and dif(). I am sure someone on the forum will help us out.

Haikuo

Respected Advisor
Posts: 3,156

Big data vs. limited computer memory

OK, I did some quick research. Here is something from the SAS help:

Memory Limit for the LAG Function

When the LAG function is compiled, SAS allocates memory in a queue to hold the values of the variable that is listed in the LAG function. For example, if the variable in function LAG100(x) is numeric with a length of 8 bytes, then the memory that is needed is 8 times 100, or 800 bytes. Therefore, the memory limit for the LAG function is based on the memory that SAS allocates, which varies with different operating environments.

It seems to me that if the compiler encounters lag() or dif(), it allocates extra memory for the execution; if not, maybe just ONE record at a time.
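As a small illustration of that queue (dataset and variable names are hypothetical), LAG100 makes SAS allocate room for 100 prior values at compile time, whether or not the step ever runs that far:

```sas
/* LAG100(x) allocates a 100-element queue when the step is compiled --   */
/* for a numeric x of length 8, that is 8 * 100 = 800 bytes, exactly as   */
/* the SAS help excerpt above describes.                                  */
data lagged;
    set mydata;
    x_lag100 = lag100(x);  /* value of x from 100 observations earlier */
run;
```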

Regards,

Haikuo

Super Contributor
Posts: 394

Big data vs. limited computer memory

Posted in reply to littlestone

The number of observations that SAS holds in memory depends on many things, including (but not limited to) the size of an observation and the amount of memory available. See the BUFSIZE (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202861.htm) and BUFNO (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202857.htm) options for a start.
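To make that concrete, here is a hedged sketch of how those two options are typically set (library and dataset names are made up; suitable values depend on your record size and operating environment):

```sas
/* BUFNO controls how many pages SAS holds in memory at once and can be   */
/* set per session or per step; BUFSIZE sets the page size and is fixed   */
/* when the output dataset is created.                                    */
options bufno=10;                 /* keep 10 pages in memory at a time */

data work.big (bufsize=64k);      /* 64 KB pages for the new dataset   */
    set source.big;
run;
```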

Trusted Advisor
Posts: 1,301

Re: Big data vs. limited computer memory

Posted in reply to littlestone

I often read multi-TB raw data files into SAS with nothing special in terms of coding, using a few hundred MB of RAM at most. There is a lot more in memory than just a single record. As Hai.kuo pointed out, some functions use data stored in 'queues' to retrieve a limited set of previously seen values; those queues, however, are likely only provisioned if the functions are actually used.

The amount of data from a raw file that is held in memory is mainly a result of OS file buffering: data is read in blocks from disk into cache, taken in by SAS, processed through your data step, flushed, and repeated. This is controlled by several things, such as the size of a line of data in the raw file and the settings of the SAS options BUFNO and BUFSIZE, as Tim@SAS says.

A 100 GB file should not cause any issues on a machine with 16 GB of RAM unless you forcefully try to put the file into memory using something like the MEMLIB option, SASFILE, or hash objects. Even then, these options would only cause issues depending on the size of the resulting SAS dataset, not the size of the original raw data file.
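For contrast, here is a sketch of the kind of statement that *does* pull an entire dataset into RAM (dataset name is hypothetical); this is the pattern that would fail for a table larger than available memory:

```sas
/* SASFILE loads the whole dataset into memory before any step uses it.  */
/* Fine for small tables that are read repeatedly; a dataset larger than  */
/* available RAM simply will not fit.                                     */
sasfile work.small load;

proc means data=work.small;
    var amount;
run;

sasfile work.small close;   /* release the memory when done */
```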

Frequent Contributor
Posts: 89

Big data vs. limited computer memory

Thank you all for help.

