🔒 This topic is solved and locked.
Afshin
Calcite | Level 5

I am processing a large data set, and the processing creates many intermediate data sets that are needed for the calculations but not for the results.

Because of I/O speed (and storage) limitations, I have broken the data into smaller chunks. To improve speed, I have defined an in-memory library using MEMLIB.

I use a data step with CALL EXECUTE to run a macro on each chunk of data. The data sets required for the calculations take roughly 100MB and are replaced in each iteration.

The problem is that SAS crashes when I run the data step. I am sure it is not because of the data set storage.

I tested by limiting the observations in the data step (i.e. the iterations): with obs=25, SAS uses 140MB of memory on average. With obs=200 it uses 1.2GB, and above 200 it crashes.

I ran the code 25 steps at a time per data step, up to 300, and it uses just 140MB. So it is not the data sets in memory. But when I put obs=300, memory usage goes above 1.2GB at an early stage, within a few seconds, and SAS crashes and closes.

I suspect SAS does not know how much memory the macro uses, so it buffers the data based on the data set, and when the macro starts it crashes.

I couldn't solve the problem with BUFSIZE or BUFNO.

data _null_  ; set ric_date(obs=200);

call symput('taq_day', ric_date); 

***;
AXJO_date =  catt( 'Mkt_' ,substr(ric_date, index( ric_date, '_') +1));

call symput('Mkt_day', AXJO_date);


****;

if substr(ric_date, index( ric_date, '_') +1) = substr(AXJO_date, index( ric_date, '_') +1) then

  call execute('%HFEQ_ALL');

/*%HFEQ_ALL*/
run;

 

1 ACCEPTED SOLUTION
Astounding
PROC Star

It seems pretty clear that the memory problems result from stacking up too much code.  Some of the proposed solutions contain pitfalls and without getting into the pitfalls, here is a way to run all the macro calls accurately without having to break up the data into chunks:

 

%macro run_all;
   %local i nobs taq_day mkt_day run_flag;
   data _null_;   
      set ric_date nobs=_nobs_;
      call symputx('nobs', _nobs_);
      stop;
   run;

   %do i=1 %to &nobs;   
      %let run_flag = N;
     data _null_  ;
          i = &i;
         set ric_date point=i;
        call symputx('taq_day', ric_date); 
***;
        AXJO_date =  catt( 'Mkt_' ,substr(ric_date, index( ric_date, '_') +1));
        call symputx('Mkt_day', AXJO_date);
****;
        if substr(ric_date, index( ric_date, '_') +1) = 
        substr(AXJO_date, index( ric_date, '_') +1) then
        call symputx('run_flag', 'Y');
       stop;
   run;

      %if &run_flag=Y %then %HFEQ_ALL;  
   %end;
%mend;

%run_all

 


13 REPLIES
Astounding
PROC Star
Your sample program calls the macro 200 times (not just once for a chunk of 200 observations). If you want help restructuring the program, you will have to supply details of what the macro is supposed to accomplish.
Afshin
Calcite | Level 5

Thanks. The data step is just for passing the names of the data sets to the macro. The macro is 800 lines of code with several data steps and procs.

I have put part of it below as an example. The puzzle is that if I run the data step with 100 obs at a time (running the macro 100 times), it does not fill the memory, but when I put obs=max it fills the memory and crashes.

The mem library is in memory.
Each step of the macro does not need more than 100MB for the data sets.

At the end of the macro I append them to data sets on disk, and they are replaced in the next macro call.

 

%macro HFEQ_ALL;


	Data mem.taq; set split.&taq_day; run;
	data mem.indices; set split.&Mkt_day; run;

	data mem.markers ; set mem.taq;
		if first.date; 
								
		%do i=((10*60*60)+(15)) %to ((16*60*60)); 
			miliseconds=&i*1000; output; 
		%end;
	run;

...
...
...
...

data HFAT.HFT; set HFAT.HFT mem.HFT; run;
data HFAT.spreads; set HFAT.spreads mem.spreads; run;
data HFAT.inefficiencyMetrics; set HFAT.inefficiencyMetrics mem.inefficiencyMetrics; run;



%mend HFEQ_ALL;
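
A side observation on the excerpt above: the %DO loop in the mem.markers step runs from 10:00:15 to 16:00:00 in seconds, so each macro call generates roughly 21,600 pairs of assignment and OUTPUT statements as SAS code. A minimal sketch of the same step with a data step DO loop, which produces the same rows from a fixed handful of statements. It assumes mem.taq is sorted by date and adds a BY statement so that first.date is defined; the loop variable sec is a made-up name.

```sas
/* Sketch only, not the original macro code.                          */
/* Assumption: mem.taq is sorted by date, so BY/FIRST. processing     */
/* is valid. The variable sec is hypothetical and is dropped.         */
data mem.markers;
   set mem.taq;
   by date;
   if first.date;
   /* one marker row per second from 10:00:15 to 16:00:00 */
   do sec = (10*60*60 + 15) to (16*60*60);
      miliseconds = sec * 1000;
      output;
   end;
   drop sec;
run;
```

With this form the amount of generated code per macro call no longer depends on the length of the time window.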

 

Patrick
Opal | Level 21

@Afshin 

From what you describe memory doesn't get released as you expect.

One way to go: Assign and clear your memlib within the macro.

%macro HFEQ_ALL;
  libname mem '<your path>' memlib;
  .....
  libname mem clear;
%mend HFEQ_ALL;

 

And just as a side note:

When calling a macro without brackets for parameter passing (even if empty) then I'd always end the call with a semicolon like done below.

call execute('%HFEQ_ALL;');

 

Afshin
Calcite | Level 5

The memory isn't released, but it is reused by the new data set. Just to make sure, I added the "libname mem clear" and it didn't work.

I don't think the data sets in memory are the problem. The code works fine when I run 200 observations at a time, for example 1-200, 200-400, 400-600 ..., and the memory usage by SAS stays around 1.2GB.

When I put obs=max, which is millions of observations, it crashes within a couple of seconds, not after many steps.

To make it clear: 
Observations                 SAS memory usage in Task Manager
firstobs=1   obs=25    --->  250 MB  (constant during the process)
firstobs=26  obs=50    --->  250 MB
firstobs=51  obs=75    --->  250 MB
firstobs=76  obs=100   --->  250 MB
firstobs=101 obs=125   --->  250 MB
firstobs=126 obs=150   --->  250 MB
firstobs=151 obs=175   --->  250 MB
firstobs=176 obs=200   --->  250 MB

firstobs=1   obs=200   --->  1.2 GB  (constant during the process)

firstobs=1   obs=max   --->  goes above 1.2 GB and crashes within a couple of seconds, in the first data step

 

 

Patrick
Opal | Level 21

@Afshin wrote:

......

firstobs=1 obs=200  --->                1.2GB        (Constant during the process)

firstobs=1 obs=max  --->              goes above  1.2GB   and crashes in couple of seconds- crash in first data step 


 

Ah, OK, then you simply don't have enough memory available for what you're trying to do. If a limit of just 200 obs already takes you to 1.2GB of memory consumption, then I don't believe there is any other way than to either keep your tables on disk or fully redesign your code so that it consumes much less memory and loads into memory only the data you need for processing.

It's also not the case that loading data into memory always speeds up processing. For example, your first data steps, which load the data from disk into memory, also take time; time you could already be using for data step processing in the same data step.

 

To see what's available to you you can execute:

proc options group=memory;run;
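
As a complement to PROC OPTIONS, individual settings can be written to the log with the GETOPTION function. A small sketch: MEMSIZE and REALMEMSIZE are standard system options, but which limits actually govern a MEMLIB library on a given installation is an assumption to check against the host documentation.

```sas
/* Sketch: print two memory-related system options to the log */
%put NOTE: MEMSIZE=%sysfunc(getoption(MEMSIZE));
%put NOTE: REALMEMSIZE=%sysfunc(getoption(REALMEMSIZE));
```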

Tom
Super User

How many lines of SAS code does that macro generate every time you call it?

Why are you pushing the resulting code into the program stack over 200 times instead of just pushing the macro call?

I am not sure where SAS stores that stack of code that is waiting to run after the data step with the CALL EXECUTE() function calls finishes, but perhaps it is in memory and so perhaps that is what is causing the crash.

call execute('%nrstr(%HFEQ_ALL);');

Also, just as a side issue, why not define the macro to take TAQ_DAY and MKT_DAY as parameters instead of having it assume that those macro variables already exist when the macro starts?

 

You can also just write the lines of code to a file and %INCLUDE the file instead of using CALL EXECUTE().  Then you can take advantage of the PUT statement to help generate the code you want to run.

%macro HFEQ_ALL(taq_day,Mkt_day);
...
%mend;

filename code temp;
data _null_  ; 
  set ric_date ;
  file code;   /* send PUT output to the temp file instead of the log */
  taq_day=ric_date;
  /* named Mkt_day so that the PUT below writes the macro's parameter name */
  Mkt_day =  catt( 'Mkt_' ,substr(taq_day, index( taq_day, '_') +1));
  if substr(taq_day, index( taq_day, '_') +1)
   = substr(Mkt_day, index( taq_day, '_') +1) then do;
    put '%HFEQ_ALL(' taq_day= ',' Mkt_day= ')' ;
  end;
run;
%include code / source2;

PS The test in your IF condition does not make sense to me. Didn't you just build the end of AXJO_DATE from the end of TAQ_DAY (ric_date)? How could they come out different? You don't seem to have defined a length for the string AXJO_DATE; are you testing whether the value got truncated?
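
One case where the two sides of that IF can differ: both SUBSTR calls start at the underscore position found in ric_date, but AXJO_date always has the 4-character prefix 'Mkt_'. So the comparison appears to hold only when the underscore in ric_date is also the fourth character. A small sketch to check this, using two made-up values (the names 'BHP_20190701' and 'AB_20190701' are purely illustrative):

```sas
/* Sketch with hypothetical values: a 3-character prefix and a 2-character prefix */
data _null_;
   length ric_date $32;
   do ric_date = 'BHP_20190701', 'AB_20190701';
      AXJO_date = catt('Mkt_', substr(ric_date, index(ric_date, '_') + 1));
      /* same comparison as in the original IF, stored as a 0/1 flag */
      same = ( substr(ric_date, index(ric_date, '_') + 1)
             = substr(AXJO_date, index(ric_date, '_') + 1) );
      put ric_date= AXJO_date= same=;
   end;
run;
```

If this reading is right, the log should show same=1 for the first value and same=0 for the second, so the test acts as a filter on the prefix length rather than on truncation.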

Astounding
PROC Star

It seems pretty clear that the memory problems result from stacking up too much code.  Some of the proposed solutions contain pitfalls and without getting into the pitfalls, here is a way to run all the macro calls accurately without having to break up the data into chunks:

 

%macro run_all;
   %local i nobs taq_day mkt_day run_flag;
   data _null_;   
      set ric_date nobs=_nobs_;
      call symputx('nobs', _nobs_);
      stop;
   run;

   %do i=1 %to &nobs;   
      %let run_flag = N;
     data _null_  ;
          i = &i;
         set ric_date point=i;
        call symputx('taq_day', ric_date); 
***;
        AXJO_date =  catt( 'Mkt_' ,substr(ric_date, index( ric_date, '_') +1));
        call symputx('Mkt_day', AXJO_date);
****;
        if substr(ric_date, index( ric_date, '_') +1) = 
        substr(AXJO_date, index( ric_date, '_') +1) then
        call symputx('run_flag', 'Y');
       stop;
   run;

      %if &run_flag=Y %then %HFEQ_ALL;  
   %end;
%mend;

%run_all

 

Tom
Super User

Seems like a lot of effort when you can just use %NRSTR() to prevent SAS from generating too many lines of code.

Because the macro doesn't use parameters, you also need to protect the setting of the macro variables so that they execute in the right order.

call execute(cats('%nrstr(%let) taq_day=',ric_date,';'));
call execute(cats('%nrstr(%let) Mkt_day=',AXJO_date,';'));
call execute('%nrstr(%HFEQ_ALL);');
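
Put together with the original data step, those three calls might look like the sketch below. It assumes ric_date is a character variable in the data set ric_date, as in the original post, and it leaves out the IF test from the original step; add it back if it is actually needed. The explicit LENGTH for AXJO_date is an addition for clarity.

```sas
/* Sketch: only %LET statements and the macro call are pushed onto the stack; */
/* %HFEQ_ALL is expanded later, one call at a time, after this step ends.     */
data _null_;
   set ric_date;
   length AXJO_date $200;
   AXJO_date = catt('Mkt_', substr(ric_date, index(ric_date, '_') + 1));
   call execute(cats('%nrstr(%let) taq_day=', ric_date, ';'));
   call execute(cats('%nrstr(%let) Mkt_day=', AXJO_date, ';'));
   call execute('%nrstr(%HFEQ_ALL);');
run;
```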

Or just write the code to a file instead of using CALL EXECUTE.

Astounding
PROC Star

Tom,

 

I like this version a lot.  Do you want to explain to the rest of the board what the issues are, why this is necessary, and show the entire (slightly longer) program?  If not, I can do it later today.

Tom
Super User

So here is a little demo of the timing issues around CALL EXECUTE and CALL SYMPUTX().

First let's define a little macro that references some macro variable that is NOT one of its parameters.

%macro mymac;
data _null_;
* Here is the code the macro generates ;
  mvarvalue=symget('mymvar');
  put mvarvalue=;
run;
%mend mymac;

Now let's try a little data step that sets values to the macro variable using CALL SYMPUT() and calls the macro using CALL EXECUTE().

data _null_;
  call symput('mymvar','First Value');
  call execute('%mymac;');
  call symput('mymvar','Second Value');
  call execute('%mymac;');
run;

Notice how the code that the macro generates is echoed to the log with the + sign in front of it. Also notice how both runs of the macro used the second value of MYMVAR.

9     data _null_;
10      call symput('mymvar','First Value');
11      call execute('%mymac;');
12      call symput('mymvar','Second Value');
13      call execute('%mymac;');
14    run;

NOTE: DATA statement used (Total process time):
      real time           0.25 seconds
      cpu time            0.01 seconds


NOTE: CALL EXECUTE generated line.
1    + data _null_; * Here is the code the macro generates ;   mvarvalue=symget('mymvar');   put
mvarvalue=; run;

mvarvalue=Second Value
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


1    +
;
2    + data _null_; * Here is the code the macro generates ;   mvarvalue=symget('mymvar');   put
mvarvalue=; run;

mvarvalue=Second Value
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


2    +
;

15

Now let's use %NRSTR() to prevent the macro call from being expanded while the code is being pushed onto the stack to run.

data _null_;
  call symput('mymvar','First Value');
  call execute('%nrstr(%mymac);');
  call symput('mymvar','Second Value');
  call execute('%nrstr(%mymac);');
run;

Notice how now the lines with + in front just show the call to the macro.  But the result is still just showing the second value assigned to MYMVAR.

16    data _null_;
17      call symput('mymvar','First Value');
18      call execute('%nrstr(%mymac);');
19      call symput('mymvar','Second Value');
20      call execute('%nrstr(%mymac);');
21    run;

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


NOTE: CALL EXECUTE generated line.
1    + %mymac;

mvarvalue=Second Value
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


2    + %mymac;

mvarvalue=Second Value
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds



Now let's use %NRSTR() around %LET to generate a %LET statement.

data _null_;
  call execute(cats('%nrstr(%let) mymvar=','First Value',';'));
  call execute('%nrstr(%mymac);');
  call execute(cats('%nrstr(%let) mymvar=','Second Value',';'));
  call execute('%nrstr(%mymac);');
run;

We see that the macro variable MYMVAR is updated AFTER the first run of the macro.

22    data _null_;
23      call execute(cats('%nrstr(%let) mymvar=','First Value',';'));
24      call execute('%nrstr(%mymac);');
25      call execute(cats('%nrstr(%let) mymvar=','Second Value',';'));
26      call execute('%nrstr(%mymac);');
27    run;

NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


NOTE: CALL EXECUTE generated line.
1    + %let mymvar=First Value;
2    + %mymac;

mvarvalue=First Value
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


3    + %let mymvar=Second Value;
4    + %mymac;

mvarvalue=Second Value
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

Finally we can just skip CALL EXECUTE() and use PUT to write the code and %INCLUDE to run it.

filename code temp;
data _null_;
  file code;
  put '%let mymvar=First Value;';
  put '%mymac;';
  put '%let mymvar=Second Value;';
  put '%mymac;';
run;
%include code / source2;

Now we have the full power of the PUT statement to make generating the code easier.  We can stop after the data step and examine the generated code and make sure our code generation logic is correct.  We can generate code that is formatted the way we like so that the log is easier to read. etc. etc.

37    filename code temp;
38    data _null_;
39      file code;
40      put '%let mymvar=First Value;';
41      put '%mymac;';
42      put '%let mymvar=Second Value;';
43      put '%mymac;';
44    run;

NOTE: The file CODE is:

      Filename=C:\Users\ABERNA~1\AppData\Local\Temp\1\SAS Temporary
      Files\_TD20432_AMRL20L6F1E4992_\#LN00058,
      RECFM=V,LRECL=32767,File Size (bytes)=0,
      Last Modified=01Jul2019:09:52:45,
      Create Time=01Jul2019:09:52:45

NOTE: 4 records were written to the file CODE.
      The minimum record length was 7.
      The maximum record length was 25.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


45    %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file C:\Users\ABERNA~1\AppData\Local\Temp\1\SAS Temporary
      Files\_TD20432_AMRL20L6F1E4992_\#LN00058.
46   +%let mymvar=First Value;
47   +%mymac;

mvarvalue=First Value
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


48   +%let mymvar=Second Value;
49   +%mymac;

mvarvalue=Second Value
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


NOTE: %INCLUDE (level 1) ending.

PS If you define your macro to take its inputs as parameters instead of referencing "magic" macro variables that are assumed to have been created in advance then one aspect of the timing problem is eliminated.

data _null_;
  call execute('%nrstr(%mymac)(mymvar=First Value)');
  call execute('%nrstr(%mymac)(mymvar=Second Value)');
run;
Afshin
Calcite | Level 5
	Data RDWork.TAQ1; 
		pnt=&pt;
		set RDWork.TAQ  point=pnt ;

	if month(date)=12 then stop;

	run;

I am trying to change my code to use the POINT= option that you used, but the data step doesn't stop and runs to the end.

Astounding
PROC Star

Try posting the full question as a new question, instead of adding on to an existing question.  Most posters will just skip over questions that have already been answered.

 

It's probably something simple, but without the details (where does &PT come from, what is in DATE, what is the program supposed to do), it's impossible to answer.

 
