Hello, All
I am being asked a question: if a raw data file is 100 GB but the available memory is only 16 GB, can SAS read in such big data?
I believe the answer should be "Yes," but I am not sure how to do it in practice. Can we read in such big data the same way we do with regular, small data?
The beauty of the SAS data step is that it reads data record by record instead of swallowing the whole thing at one time. You sure can read it, as long as your destination hard drive is large enough.
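To illustrate the record-by-record approach, here is a minimal sketch of a data step that streams a large raw file; the file path and variable layout are hypothetical, so adjust them to your data:

```sas
/* Minimal sketch: the data step streams the raw file one record
   at a time, so only a buffer's worth of data is in memory at once.
   File path and input layout below are hypothetical examples. */
data work.bigdata;
    infile '/path/to/rawdata.txt' dlm=',' firstobs=2 truncover;
    input id : 8. name : $32. amount : 12.2;
run;
```

Nothing about this code changes for a 100 GB file versus a 100 KB file; only the destination disk space and run time differ.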
On a side note, you probably will not be able to read it into a hash object, though, as the hash object resides entirely in memory.
Not sure about proc sql, though.
Regards,
Haikuo
Thank you.
I understand that SAS reads data record by record; does that mean that, at any given time, only ONE record resides in memory?
Well, I don't really know the definite answer to your question. I tend to believe there is a buffer that holds more than just ONE record; otherwise it would be hard to explain functions like lag() and dif(). I am sure someone on the forum will help us out.
Haikuo
OK, I did some quick research; here is something from the SAS help:
When the LAG function is compiled, SAS allocates memory in a queue to hold the values of the variable that is listed in the LAG function. For example, if the variable in function LAG100(x) is numeric with a length of 8 bytes, then the memory that is needed is 8 times 100, or 800 bytes. Therefore, the memory limit for the LAG function is based on the memory that SAS allocates, which varies with different operating environments.
It seems to me that if the compiler encounters lag() or dif(), it allocates extra memory for the execution; if not, maybe just ONE record at a time.
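As a small worked example of the quoted help text: lag(x) on an 8-byte numeric variable needs a queue of just 8 bytes, and lag100(x) needs 8 * 100 = 800 bytes, independent of how many records the file has. A sketch with made-up datalines:

```sas
/* Sketch of the fixed-size queues described above.
   lag(x) keeps a 1-element (8-byte) queue; dif(x) is x - lag(x).
   The queue size depends on the LAG depth, not on the file size. */
data demo;
    input x;
    prev = lag(x);   /* previous value of x */
    change = dif(x); /* x minus the previous x */
datalines;
10
15
23
;
run;
```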
Regards,
Haikuo
The number of observations that SAS holds in memory depends on many things, including (but not limited to) the size of an observation and the amount of memory available. See the BUFSIZE (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202861.htm) and BUFNO (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202857.htm) options for a start.
I often read multi-TB raw data files into SAS with nothing special in terms of coding, using a few hundred MB of RAM at most. There is a lot more in memory than just a single record. As Hai.kuo pointed out, some functions use data stored in 'queues' to retrieve a limited set of previously seen records; those queues, however, are likely only provisioned if the functions are used.
The amount of data from a raw file that is held in memory is mainly a result of OS file buffering: the data is read in blocks from disk into cache, taken in by SAS, processed through your data step, flushed, and repeated. This is controlled by several things, such as the length of a line of data in the raw file and the settings of the SAS options BUFNO and BUFSIZE, as Tim@SAS says.
A 100 GB file should not cause any issues on a machine with 16 GB of RAM unless you are forcefully trying to put the file into memory using something like the MEMLIB option, SASFILE, or hash objects. And even those would only cause issues depending on the size of the resulting SAS dataset, rather than the size of the original raw data file.
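For completeness, BUFSIZE= and BUFNO= can be set as data set options on the output dataset. A hedged sketch, with illustrative values rather than recommendations, and a hypothetical file path and layout:

```sas
/* Sketch: tune the page size (BUFSIZE=) and the number of buffers
   (BUFNO=) used when writing the output dataset. Values here are
   illustrative only; the path and input layout are hypothetical. */
data work.big (bufsize=65536 bufno=10);
    infile '/data/raw_100gb.txt' truncover;
    input id 8. value 12.;
run;
```

Larger buffers and more of them can reduce I/O calls at the cost of a modest amount of memory, which is still nowhere near holding the whole file in RAM.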
Thank you all for help.