littlestone
Fluorite | Level 6

Hello, All

I have been asked a question: if a raw data file is 100 GB but the available memory is only 16 GB, can SAS read in such a large file?

I believe the answer should be "Yes", but I am not sure how to do it in practice. Can we read in such big data the same way we read regular, small data?

1 ACCEPTED SOLUTION

Haikuo
Onyx | Level 15

The beauty of the SAS DATA step is that it reads data record by record instead of swallowing the whole thing at once. You can certainly read the file, as long as your destination hard drive is large enough.
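A minimal sketch of such a record-by-record read. The file path, the library path, and the three fields (id, amount, label) are all hypothetical; nothing in the thread describes the actual file layout.

```sas
/* Hypothetical output library on a disk with enough free space */
libname big '/data/sasdata';

data big.raw_copy;
    /* Hypothetical delimited raw file; path and layout are assumptions */
    infile '/data/big_raw.csv' dsd dlm=',' firstobs=2 truncover lrecl=32767;
    /* Illustrative variables only; each INPUT execution reads one record */
    input id amount label :$40.;
run;
```

The code is the same as it would be for a small file. What has to scale with the input is the disk space behind the output library, not the RAM.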

On a side note, you probably will not be able to read it into a hash object, though, because a hash object resides in memory.

Not sure about PROC SQL, though.

Regards,

Haikuo


7 REPLIES
littlestone
Fluorite | Level 6

Thank you.

I understand that SAS reads data record by record; does that mean that, at any given time, only ONE record resides in memory?

Haikuo
Onyx | Level 15

Well, I don't really know the definitive answer to your question. I tend to believe there is a buffer that holds more than just ONE record; otherwise it is hard to explain functions like lag() and dif(). I am sure someone on the forum will help us out.

Haikuo

Haikuo
Onyx | Level 15

OK, I did some quick research; here is something from the SAS help:

Memory Limit for the LAG Function

When the LAG function is compiled, SAS allocates memory in a queue to hold the values of the variable that is listed in the LAG function. For example, if the variable in function LAG100(x) is numeric with a length of 8 bytes, then the memory that is needed is 8 times 100, or 800 bytes. Therefore, the memory limit for the LAG function is based on the memory that SAS allocates, which varies with different operating environments.

It seems to me that if the compiler encounters lag() or dif(), it allocates extra memory for execution; if not, maybe just ONE record at a time.
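To make the quoted arithmetic concrete, here is a small sketch. The data set work.big and the numeric variable x are hypothetical names chosen for illustration.

```sas
/* work.big and x are hypothetical names */
data work.with_lag;
    set work.big;
    /* LAG100 keeps a queue of the last 100 values of x:
       100 values * 8 bytes = 800 bytes, however many rows work.big has */
    x_100_back = lag100(x);
run;
```

The queue size depends on the lag depth and the variable length, not on the number of observations read, so it stays tiny even for a very large input.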

Regards,

Haikuo

Tim_SAS
Barite | Level 11

The number of observations that SAS holds in memory depends on many things, including (but not limited to) the size of an observation and the amount of memory available. See the BUFSIZE (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202861.htm) and BUFNO (http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202857.htm) options for a start.
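As a rough sketch of where these options can be set, the values below are illustrative rather than recommendations, and work.big and work.copy are hypothetical data set names. BUFSIZE and BUFNO govern the page buffers for SAS data sets, so they affect the output side of a raw-file read.

```sas
/* Session-wide setting (value is illustrative) */
options bufno=10;

/* Per-data-set override when creating an output data set */
data work.copy (bufsize=64k bufno=10);
    set work.big;
run;

/* Write the current settings to the log */
proc options option=bufsize;
run;
proc options option=bufno;
run;
```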

FriedEgg
SAS Employee

I often read multi-TB raw data files into SAS with nothing special in terms of coding, using a few hundred MB of RAM at most. There is a lot more in memory than just a single record. As Hai.kuo pointed out, some functions use data stored in 'queues' to retrieve a limited set of previously seen data. The queues, however, are likely only provisioned if those functions are used. The amount of data from a raw file that is held in memory is mainly the result of OS file buffering: the data is read in blocks from disk into cache, taken in by SAS and processed through your DATA step, then flushed, and the cycle repeats. This is controlled by several things, such as the length of a line of data in the raw file and the settings of the SAS options BUFNO and BUFSIZE, as Tim@SAS says.

A 100 GB file should not cause any issues on a machine with 16 GB of RAM unless you are deliberately trying to put the file into memory using something like the MEMLIB option, SASFILE, or hash objects. And those would only cause issues depending on the size of the resulting SAS data set, rather than the size of the original raw data file.
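One way to check the memory claim on your own system is to turn on FULLSTIMER and read a file without keeping anything. This is only a sketch: the file path is hypothetical, and the statistics that appear in the log vary by operating environment.

```sas
/* Adds memory and I/O statistics to the step notes in the log */
options fullstimer;

data _null_;
    /* Hypothetical path to the large raw file */
    infile '/data/big_raw.csv' lrecl=32767;
    /* Null INPUT: pull each record into the input buffer, create no variables */
    input;
run;
```

The memory figure reported for the step should stay small and roughly constant as the file grows, since only the input buffer is reused for each record.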

littlestone
Fluorite | Level 6

Thank you all for help.

