MattSan
Calcite | Level 5

Hello All,

 

I am currently working on a large data migration project that requires me to process very large datasets. I am looking for a way to optimise I/O efficiency because the data processing is time sensitive: the client needs their systems to be offline for as short a time as possible. The SAS server that I am using has the following base specifications: 6 CPUs with 128GB of RAM. On average the server's CPUs are around 90% idle and it uses about 25% of its RAM during daily use, meaning that approximately 75% of the RAM is underutilised.

 

I have started experimenting with the SASFILE statement and the COMPRESS option; however, I would still like to reduce I/O between datasteps.

 

Does SAS automatically keep processed datasets in memory within a SASProg if the following datastep requires the same data for processing? i.e. will SAS only write the final dataset to disk after the final datastep and if not, how could I possibly implement this?

 

E.g. 

 

data test1;
   set mydata.census;
run;

data test2;
   set mydata.census;
run;

proc summary data=mydata.census print;
run;

data mydata.census;
   modify mydata.census;
   /* ... statements to modify data ... */
run;

 Any advice or tips would be greatly appreciated.

Accepted Solutions
boemskats
Lapis Lazuli | Level 10

Hi Matt,

 

It is indeed possible on a UNIX system, but it depends on your flavour of UNIX, and possibly your relationship with your administrator.

 

For Linux you would need to use tmpfs and ask your admin to create a tmpfs mount point for you. If you're on Linux you might have /dev/shm already mounted, but by default it will be limited in size to half of the RAM available on your physical machine.
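
As a rough sketch of what this looks like from the SAS side: once a tmpfs directory is available, it can be assigned like any other library. The directory name `sasram`, the libref `ramlib` and the output name `census_out` are placeholders chosen for illustration, and the sketch assumes the directory already exists and is writable by the SAS user.

```sas
/* Assumption: /dev/shm is a tmpfs mount and /dev/shm/sasram (hypothetical) exists. */
libname ramlib "/dev/shm/sasram";

/* Copy the source table into RAM once, then read it from there. */
data ramlib.census;
   set mydata.census;
run;

proc summary data=ramlib.census print;
run;

/* tmpfs contents do not survive a reboot: write results back to disk. */
data mydata.census_out;   /* hypothetical output name */
   set ramlib.census;
run;

/* Free the memory when finished. */
proc datasets lib=ramlib kill nolist;
quit;
libname ramlib clear;
```

Anything left in the tmpfs directory keeps occupying RAM until it is deleted, so the cleanup step matters.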

 

I wrote a paper on this for SAS Global Forum last year. You'll find the 'Repurposing Memory' section on page 6 most interesting, as it discusses the different approaches to storing data directly in memory on *nix.

 

Hope that helps.

 

Nik

17 REPLIES
LinusH
Tourmaline | Level 20

Very interesting subject.

The more memory you assign to the SAS session (MEMSIZE), the more likely it is that SAS will use it as internal swap space. That said, I'm not convinced that it is used for data set output processing. I think that output data is written directly to disk and not kept in RAM. But that might have to be verified by a SAS Institute software developer...
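
To see what the current session is actually allowed to use, a quick check could look like the sketch below. MEMSIZE is set at SAS invocation or in the configuration file rather than mid-session, and the 64G figure in the comment is only an example value, not a recommendation.

```sas
/* Show the current MEMSIZE value and where it was set. */
proc options option=memsize value;
run;

/* Raising it happens at startup, for example (hypothetical value): */
/*    sas -memsize 64G myprogram.sas                                */
```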

 

SASFILE will actually load the specified data set into RAM. But in your example it would be awkward and would not give you any advantage. SASFILE is best suited to data sets that are referenced several times.
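
For reference, the basic pattern for a data set that is read several times looks roughly like this. It is only a sketch that reuses the data set name from the question, and whether it helps depends on the whole data set fitting in the memory available to the session.

```sas
/* Load the data set into memory once; it stays there until closed. */
sasfile mydata.census load;

data test1;
   set mydata.census;
run;

data test2;
   set mydata.census;
run;

proc summary data=mydata.census print;
run;

/* Release the memory buffers when the repeated reads are done. */
sasfile mydata.census close;
```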

 

COMPRESS only works on disk. So yes, in some cases it would reduce I/O, but you pay with CPU.

 

The one thing that you could do is work with views whenever you can. There is then quite a high probability that those are evaluated in memory (if there is enough). So if this is crucial, you may need to change from PROC SUMMARY to PROC SQL with GROUP BY, etc.
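
A minimal sketch of the view idea, assuming the intermediate step only transforms rows. The view name `census_v` and the derived column `row_flag` are made up for illustration.

```sas
/* Define the intermediate step as a DATA step view: no rows are written here. */
data work.census_v / view=work.census_v;
   set mydata.census;
   row_flag = 1;   /* hypothetical derived column */
run;

/* The view runs when it is read; rows stream straight into the procedure. */
proc summary data=work.census_v print;
run;
```

The trade-off is that the view is re-executed every time it is read, so it suits intermediate results that are consumed once.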

Data never sleeps
MattSan
Calcite | Level 5

Thank you for your insights regarding MEMSIZE (will investigate this further), would be great to have some input from a SAS Institute SW developer on this topic.

RW9
Diamond | Level 26

If it is a massive project, and the data is critical, then you should be running a mirrored system, i.e. two of the same, so that if one goes down the other effectively carries on. This secondary system should be located away from the main one to provide disaster recovery.

If you have that, then it shouldn't be a problem, you take the backup offline to upgrade, then set that as main, and set the other to mirror the updates.  Of course that is once you have taken at least one full system restore point (probably to tape or something) and moved that offsite.

Kurt_Bremser
Super User

When you write data, the data is first written into the non-persistent file cache that the operating system maintains, and is subsequently flushed out to disk. The flush happens either periodically (to reduce probable data loss in case of system crash or power loss) or when the system runs out of file cache space.

If the system has enough RAM for file cache, a read of a dataset that was just written will occur mainly from the cache and will be quite fast.

 

Mind that this requires proper operating system configuration. What platform are you using for SAS?

MattSan
Calcite | Level 5

The cache that you mention, is it held in RAM or on disk?

 

Platform information:

 

PROC PRODUCT_STATUS;
RUN;

For Base SAS Software ...
Custom version information: 9.3_M1
Image version information: 9.03.01M1P110211

 

Do you perhaps have any insight on reducing I/O requests?

Patrick
Opal | Level 21

I'd be careful with the following but instead of using the SASFILE command, you can also assign a library which points to memory as documented here (link for Windows OS but similar info also available for UNIX/LINUX):

http://support.sas.com/documentation/cdl/en/hostwin/69955/HTML/default/viewer.htm#p041tbb02reefnn1jm...

 

 

AhmedAl_Attar
Rhodochrosite | Level 12

Hi,

 

You may want to investigate the Piping Feature of SAS/CONNECT (if you have that module licensed). The benefits of piping include:

  • overlapped execution of proc and/or data step
  • eliminate intermediate write to disk
  • improved performance
  • reduced disk space requirements
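
A rough sketch of how piping is typically set up with MP CONNECT and the SASESOCK engine is shown below. It assumes SAS/CONNECT is licensed and that local sessions can be spawned on this host. The session names `prod` and `cons`, port 5020 and the output name `census_out` are hypothetical choices for illustration.

```sas
/* Assumption: TCP port 5020 (hypothetical) is free on this host. */
options sascmd="!sascmd" autosignon=yes;

/* Producer: writes to the pipe instead of to disk. */
rsubmit prod wait=no inheritlib=(mydata);
   libname outlib sasesock ":5020";
   data outlib.stage1;
      set mydata.census;
   run;
endrsubmit;

/* Consumer: reads from the same pipe while the producer is still running. */
rsubmit cons wait=no inheritlib=(mydata);
   libname inlib sasesock ":5020";
   data mydata.census_out;   /* hypothetical output name */
      set inlib.stage1;
   run;
endrsubmit;

/* Wait for both sessions to finish, then shut them down. */
waitfor _all_ prod cons;
signoff _all_;
```

Data sent through the pipe can only be read once and sequentially, so this fits steps that simply hand rows on to the next step.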

 

Hope this helps,

Ahmed

rogerjdeangelis
Barite | Level 11
SAS Forum How do you persist data in memory (RAM)
          between datasteps i.e. reduce I/O between datasteps?

inspired by
https://goo.gl/bVcEGd
https://communities.sas.com/t5/General-SAS-Programming/How-do-you-persist-data-in-memory-RAM-between-datasteps-i-e/m-p/326987

I do not know exactly how SAS implements DOSUBL, but I suspect
a DOSUBL runs in one virtual address space.

Here I am sharing a storage location among datasteps.
I suspect they both live in the same address space
so storage is less likely to be released or paged?

A lot depends on how SAS compiles DOSUBL.

Seems to me this has potential for sharing HASHes?

HAVE
====

  A common block of memory that I want to share with multiple datasteps

  data parent;
    %commonc(cartype $8,ACTION=INIT);  /* same virtual address */
    cartype='PARENT';
    put cartype=;
  ...
  data child1;
    %commonc(cartype $8,ACTION=PUT);   /* same virtual address */
    put cartype=;

WANT  cartype=PARENT in both datasteps (even though not defined in child)
========================================================================

   Cartype=PARENT

   and again

   Cartype=PARENT

   Note you can change the cartype in the
   child and it will appear in the parent

SOLUTION  (Here I change cartype in the child and the change shows up in the parent)
====================================================================================

data _null_;
 %commonc(cartype $8,action=INIT);
 set sashelp.cars(obs=1);
 cartype=make;
 put cartype=; /* CARTYPE=Acura */
 rc=dosubl('
     data test1;
        set class;
     run;
     data test2;
        set class(obs=1);
        cartype="HONDA";
        %commonc(cartype $8,ACTION=PUT);
     run;
     proc means data=class;
     run;
');
 put cartype=; /* CARTYPE=HONDA */
run;quit;

rogerjdeangelis
Barite | Level 11
Here is the commonc macro

%macro commonc(var,action=INIT);
 /* dosubl sets sysindex to 1 */
 /* we are in dosubl if sysindex=1 */
 /* increment sysindex so it is not 1 next time macro called */
 %local varcut varlen;
 %let varcut=%scan(&var,1);  /* variable name, e.g. cartype */
 %let varlen=%scan(&var,2);  /* variable length, e.g. 8 */
 %if %upcase(&action) = INIT %then %do;
    /* parent: create the variable and publish its address */
    length &var;
    retain &varcut " ";
    call symputx("varadr",put(addrlong(&varcut.),$hex16.),"G");
    put "***PARENT &var &varcut &varlen &SYSDATASTEPPHASE &sysindex";
 %end;
 %if "%upcase(&action)" = "PUT" %then %do;
    /* child: copy its value into the parent's storage */
    length &var;
    retain &varcut;
    call pokelong(&varcut.,"&varadr."x, &varlen.);
 %end;
 %else %if "%upcase(&action)" = "GET" %then %do;
    /* child: read the value from the parent's storage */
    length &var;
    retain &varcut " ";
    &varcut = peekclong("&varadr."x,&varlen.);
 %end;
 put "***CHILD &var &varcut &varlen &SYSDATASTEPPHASE &sysindex";
%mend commonc;

ChrisNZ
Tourmaline | Level 20

Have you looked at the MEMLIB option (Windows only) for libnames ?

 

A SAS option called MEMCACHE also exists for using and managing in-memory data, but I do not recommend using it as it is still immature.

 

I attach below a few excerpts from

https://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490

where the topic of memory-based data in SAS is covered.

 

MEMLIB

Set As: System Option At Startup, Libname Option (Windows only)

...

If we use this option within the LIBNAME statement, we can create a library in memory with the characteristics mentioned above: high speed with an associated risk of filling up available memory. The syntax is slightly odd because we will have to provide an existing physical path for the library, which will never be used.

 

libname RAMLIB "c:\" memlib;

...

- If you want to keep the data created in your RAM libraries, don’t forget to copy it to a permanent library before ending your SAS session.

 

- When you no longer need your library, make sure to free up the memory by deleting all the files, otherwise the data will stay in memory. This can be done from within SAS by running this program:

 

proc datasets lib=RAMLIB kill nolist;
quit;
libname RAMLIB clear;

 

 

MEMMAXSZ

Set As: System Option At Startup (Windows only)

This option specifies the maximum amount of memory to allocate for memory-based libraries. The memory allocated by MEMMAXSZ is outside of the REALMEMSIZE allocation.

 

MEMBLKSZ

Set As: System Option At Startup (Windows only)

This option sets the memory block size for RAM-based libraries. The value of the MEMBLKSZ system option defines the amount of memory that is initially allocated.

Additional memory can be allocated as needed in multiples of MEMBLKSZ up to the amount of memory that is specified by the MEMMAXSZ option.

...


MattSan
Calcite | Level 5

Thank you for your detailed response. Do you perhaps know if this is possible in a Unix based SAS environment?
