RichardPaterson
Obsidian | Level 7

 

Hello, 

 

In the SAS(R) LASR(TM) Analytic Server 2.7: Reference Guide, there is a section on Memory Management. 

 

The internal LASR table _T_TABLEMEMORY contains information on all tables loaded in LASR. I am puzzled about two of the columns in this table:

 

(1) InMemorySize - Amount of memory that is needed to store the table in memory.

(2) TableAllocatedMemory - Amount of memory that is used for table storage.

 

I am wondering what the difference is between these two columns. My assumption is that (1) is the maximum amount of memory needed to load the table fully into memory, and (2) is the amount of memory currently in use for the table (perhaps the user has selected a subset of the data, so not all records have been read from the HDFS store). This would appear to make sense, since for tables loaded from HDFS the TableAllocatedMemory value is always a small percentage of the InMemorySize, while for user-imported Excel sheets the TableAllocatedMemory exceeds the InMemorySize. I understand that user-imported data carries an overhead penalty.
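For reference, this is how I am looking at the table: a minimal sketch using PROC IMSTAT, where the host name, port, and tag are placeholders for your own LASR server.

libname lasr sasiola host="grid001.example.com" port=10010 tag="hps";  /* placeholders */

proc imstat;
   table lasr._t_tablememory;   /* server-side memory-usage table */
   fetch / to=20;               /* display the first 20 rows */
quit;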

 

Anyway, does anyone have more knowledge of this table? Any help appreciated.

 

regards, 

 

Richard

 


6 REPLIES
GerryNelson
SAS Super FREQ

Hi Richard,

 

You are correct on the meaning of the two columns:

 

  • InMemorySize is the amount of memory that would be needed if the table were completely loaded in memory.
  • TableAllocatedMemory is the amount of memory currently being used by the table.

As you mention, tables loaded from Excel or SAS data sets via direct load are loaded fully to memory, and the overhead for the table structure makes the TableAllocatedMemory value greater than the InMemorySize value.

 

In the case of data loaded from HDFS, the value of the TableAllocatedMemory column is often less than the InMemorySize column. This is because memory is used only when the table is actually accessed. To quote the LASR Administration Guide:

 

"When a distributed server loads a table from HDFS to memory ..,  the server defers reading the rows of data into physical memory. You can direct the server to perform an aggressive memory allocation scheme at load time with the READAHEAD option for the PROC LASR statement"

 

Hope that helps.

Gerry

 

 

 

 

 

 

RichardPaterson
Obsidian | Level 7

Hi Gerry, 

 

This helps a lot, many thanks. If I wanted to determine the current amount of data physically loaded into memory, I would just sum the TableAllocatedMemory column, right?

 

It appears that only a small percentage of the rows of my SASHDAT tables are being loaded. What kinds of actions on a table (from VA) would force more data into LASR? I have tried all kinds of visualizations, filters, and so on, but the amount of data loaded remains stable. I will investigate the READAHEAD option.

 

regards, 

Richard

GerryNelson
SAS Super FREQ

Glad to help, Richard.

 

Yes, summing that column will give you the total memory used by the tables loaded in memory.
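As a sketch, assuming the SASIOLA libref from earlier can read _T_TABLEMEMORY directly (if not, FETCH the table with PROC IMSTAT and total the column from the displayed output instead):

proc sql;
   select sum(tableAllocatedMemory) as totalBytes format=comma20.
      from lasr._t_tablememory;   /* "lasr" is a placeholder libref */
quit;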

 

On your second question, I am not sure which specific actions in VA would load more data into memory. When using HDFS, I believe the LASR server swaps data from HDFS into and out of memory. See the last paragraph in this blog:

 

http://blogs.sas.com/content/sgf/2015/10/30/how-to-shrink-lasr-tables/

 

 

MikeMcKiernan
SAS Employee

Thank you, Gerry. These aren't significant contributions to the question, but I'll include them for anyone else who finds this post:

 

The TableAllocatedMemory also includes some data structures that the server uses to keep track of the table; that is why it is a little larger than the InMemorySize for non-SASHDAT tables. For tables loaded from SASHDAT files, that number is just the overhead for those data structures.

 

The TableAllocatedMemory should change to reflect operations such as appends or additional permanent computed columns: anything that changes the structure of the table or the number of rows in it.
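For example, here is a sketch of an append through the SASIOLA engine (the libref, table name, and source data set are placeholders), followed by a re-check of the memory table:

data lasr.mytable(append=yes);   /* APPEND=YES is a SASIOLA data set option */
   set work.extra_rows;          /* hypothetical new rows */
run;

proc imstat;
   table lasr._t_tablememory;
   fetch / to=20;                /* TableAllocatedMemory should now be larger */
quit;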

RichardPaterson
Obsidian | Level 7

Hi, 

 

I want to follow up on this thread. I have loaded a table from HDFS to LASR. The table is 50 GB on HDFS, yet I can see from the _T_TABLEMEMORY table that only 13 MB (yes, MB) has actually been allocated in memory. This is as expected, according to the information provided above. The issue is that this mapping of 13 MB of data takes almost 3 minutes to complete. Can someone provide some insight into what actually happens between clicking the load button and the data becoming available in LASR? Any help much appreciated.

 

regards, 

Richard

 

The LASR Procedure

Performance Information
  Host Node                  sasva.infra.local
  Execution Mode             Distributed
  Number of Compute Nodes    3

Data Access Information
  Data                     Engine    Role    Path
  VAPUBLIC.JENNIEVATEST2   SASHDAT   Input   Parallel, Symmetric

 

 

MikeMcKiernan
SAS Employee

Hi Richard,

 

Sorry for the late reply. Ultimately, in order to analyze the data, all 50 GB does need to pass through memory. That can take a bit of time, and if physical memory is already in use (specifically, in use for something other than memory-mapped SASHDAT files), then you might also be waiting for the OS to swap out other items.

 

The key takeaways:

  1. Performance is best when the entire table fits in physical RAM.
  2. SASHDAT files are memory efficient; there is no performance disadvantage to using them.
  3. SASHDAT files give you a performance advantage when memory demand exceeds physical RAM, because you avoid the penalty of swapping the table out of memory.

Hope that helps.
