Hello,
In the SAS(R) LASR(TM) Analytic Server 2.7: Reference Guide, there is a section on Memory Management.
The internal LASR table _T_TABLEMEMORY contains information on all loaded tables in LASR. I am puzzled by two of the columns in this table:
(1) InMemorySize - Amount of memory that is needed to store the table in memory.
(2) TableAllocatedMemory - Amount of memory that is used for table storage.
I am wondering what the difference is between these two columns. My assumption is that (1) is the maximum amount of memory needed to load the table fully into memory, and (2) is the amount of memory currently in use for the table (perhaps the user has selected a subset of the data, so not all records have been read from the HDFS store). This would appear to make sense, since for tables loaded from HDFS the TableAllocatedMemory value is always a small percentage of the InMemorySize, while for user-imported Excel sheets the TableAllocatedMemory exceeds the InMemorySize. I understand that for user-imported data there is an overhead penalty.
Anyway, does anyone have more knowledge of this table? Any help appreciated.
regards,
Richard
Hi Richard,
You are correct on the meaning of the two columns:
As you mention, tables loaded from Excel or SAS data sets via direct load are loaded to memory, and the overhead for the table structure makes the TableAllocatedMemory value greater than the InMemorySize value.
In the case of data loaded from HDFS, the value in the TableAllocatedMemory column is often less than the value in the InMemorySize column. This is because the memory is used only when the table is actually accessed. To quote the LASR Administration Guide:
"When a distributed server loads a table from HDFS to memory …, the server defers reading the rows of data into physical memory. You can direct the server to perform an aggressive memory allocation scheme at load time with the READAHEAD option for the PROC LASR statement."
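As a minimal sketch of how that option is used, a load with READAHEAD might look like the following. Note that the libref, HDFS path, port, and table name here are placeholders for illustration, not values confirmed in this thread:

```sas
/* Hypothetical sketch: libref, HDFS path, port, and host are placeholders. */
libname hdat sashdat path="/vapublic";          /* SASHDAT files stored in HDFS */

proc lasr add data=hdat.mytable readahead       /* READAHEAD: allocate and read  */
          port=10010;                           /* rows into memory at load time */
   performance host="sasva.infra.local";
run;
```

With READAHEAD specified, the deferred (lazy) row allocation described in the quote is replaced by an aggressive allocation at load time, so TableAllocatedMemory should approach InMemorySize immediately after the load.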
Hope that helps.
Gerry
Hi Gerry,
This helps a lot, many thanks. If I wanted to determine the current amount of data physically loaded into memory, I would just sum the TableAllocatedMemory column, right?
It appears that only a small percentage of the rows of my SASHDAT tables is being loaded. What kinds of actions on a table (from VA) would force more data into LASR? I have tried all kinds of visualizations, filters, etc., but the amount of data loaded remains stable. I will investigate the READAHEAD option.
regards,
Richard
Glad to help, Richard.
Yes, summing that column will give you total memory used by tables loaded in-memory.
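As a sketch, that sum could be computed by reading _T_TABLEMEMORY through a LASR engine libname. The host, port, and tag values below are placeholders for your environment, not values from this thread:

```sas
/* Hypothetical sketch: host, port, and tag are placeholders for your site. */
libname lasr sasiola host="sasva.infra.local" port=10010 tag="vapublic";

proc sql;
   select sum(TableAllocatedMemory) as TotalTableMemory format=comma20.
      from lasr._T_TABLEMEMORY;
quit;
```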
On your second question, I am not sure what specific actions in VA would load more data into memory. When using HDFS, I believe the LASR server swaps data from HDFS into and out of memory. See the last paragraph of this blog post:
http://blogs.sas.com/content/sgf/2015/10/30/how-to-shrink-lasr-tables/
Thank you Gerry. These aren't significant contributions to the question, but I'll include them for anyone else who finds this post:
The TableAllocatedMemory value also includes some data structures that the server uses to keep track of the table; that is why it is a little larger than the InMemorySize for non-SASHDAT tables. For tables loaded from SASHDAT files, that number is the overhead for the data structures.
The TableAllocatedMemory value should change to reflect changes such as appends or additional permanent computed columns: anything that changes the structure of the table or the number of rows in it.
Hi,
I want to follow up on this thread. I have loaded a table from HDFS into LASR. The table is 50 GB on HDFS, and I can see from the _T_TABLEMEMORY table that indeed only 13 MB (yes, MB) has actually been allocated in memory. This is as expected, according to the information provided above. The issue I have is that this mapping of 13 MB of data takes almost 3 minutes to complete. Can someone provide some insight into what actually happens between clicking the load button and the data being available in LASR? Any help much appreciated.
regards,
Richard
The LASR Procedure

Performance Information
  Host Node                  sasva.infra.local
  Execution Mode             Distributed
  Number of Compute Nodes    3

Data Access Information
  Data                     Engine    Role     Path
  VAPUBLIC.JENNIEVATEST2   SASHDAT   Input    Parallel, Symmetric
Hi Richard,
Sorry for the late reply. Ultimately, in order to analyze the data, all 50 GB does need to pass through memory. That can take a bit of time, and if physical memory is in use (and specifically not in use for memory-mapped SASHDAT files), then you might be waiting for the OS to swap out other items.
The key takeaways are that performance is best:
Hope that helps.