There seems to be a bug, introduced with SAS 9.4M7, that drastically increases memory consumption when a hash object is loaded from a dataset with a WHERE= dataset option.

See this code:

options fullstimer;

data TEST_BIGDS (drop=i);
    length CHAR1 $5. CHAR2 $10. KEY $16. DATE 5.;
    format KEY $HEX32. DATE DDMMYYP10.;
    do i=1 to 16195392;
        if i le 7825 then CHAR1="FGHIJ";
        else CHAR1="ABCDE";
        CHAR2="KLMNO";
        KEY=put(i,8.);
        DATE=today();
        output;
    end;
run;

data TEST_SMALLDS;
    length CHAR3 $5. KEY $16.;
    format KEY $HEX32.;
    CHAR3="PQRST";
    do i=1 to 20000;
        KEY=put(i,8.);
        output;
    end;
run;

data HASH_LOOKUP (drop=rc);
    length CHAR2 $10. DATE 5.;
    format DATE DDMMYYP10.;
    set TEST_SMALLDS;
    if _n_ eq 1 then do;
        declare hash H (dataset: "TEST_BIGDS (where=(CHAR1 eq 'FGHIJ'))");
        H.defineKey("KEY");
        H.defineData("CHAR2","DATE");
        H.defineDone();
        call missing(CHAR2,DATE);
    end;
    rc=H.find();
run;

Code like this suddenly crashed after upgrading from M6 to M7 (on AIX).

Since we had a backup server still on M6, we could make a comparative test.

This is the log from M6:

26   data HASH_LOOKUP (drop=rc);
27       length CHAR2 $10. DATE 5.;
28       format DATE DDMMYYP10.;
29       set TEST_SMALLDS;
30       if _n_ eq 1 then do;
31           declare hash H (dataset: "TEST_BIGDS (where=(CHAR1 eq 'FGHIJ'))");
32           H.defineKey("KEY");
33           H.defineData("CHAR2","DATE");
34           H.defineDone();
35           call missing(CHAR2,DATE);
36       end;
37       rc=H.find();
38   run;

NOTE: There were 7825 observations read from the data set WORK.TEST_BIGDS.
      WHERE CHAR1='FGHIJ';
NOTE: There were 20000 observations read from the data set WORK.TEST_SMALLDS.
NOTE: The data set WORK.HASH_LOOKUP has 20000 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           4.50 seconds
      user cpu time       0.28 seconds
      system cpu time     0.31 seconds
      memory              1834.62k
      OS Memory           12276.00k
      Timestamp           04.03.2021 05:38:31 PM
      Step Count                        3  Switch Count  46
      Page Faults                       1064
      Page Reclaims                     1192
      Page Swaps                        0
      Voluntary Context Switches        125
      Involuntary Context Switches      340
      Block Input Operations            0
      Block Output Operations           0

and this from M7:

254        data HASH_LOOKUP (drop=rc);
255            length CHAR2 $10. DATE 5.;
256            format DATE DDMMYYP10.;
257            set TEST_SMALLDS;
258            if _n_ eq 1 then do;
259                declare hash H (dataset: "TEST_BIGDS (where=(CHAR1 eq 'FGHIJ'))");
260                H.defineKey("KEY");
261                H.defineData("CHAR2","DATE");
262                H.defineDone();
263                call missing(CHAR2,DATE);
264            end;
265            rc=H.find();
266        run;

NOTE: There were 7825 observations read from the data set WORK.TEST_BIGDS.
      WHERE CHAR1='FGHIJ';
NOTE: There were 20000 observations read from the data set WORK.TEST_SMALLDS.
NOTE: The data set WORK.HASH_LOOKUP has 20000 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           1.82 seconds
      user cpu time       0.67 seconds
      system cpu time     0.11 seconds
      memory              539046.65k
      OS Memory           553580.00k
      Timestamp           03/04/2021 05:34:02 PM
      Step Count                        4  Switch Count  0
      Page Faults                       0
      Page Reclaims                     131388
      Page Swaps                        0
      Voluntary Context Switches        2
      Involuntary Context Switches      73
      Block Input Operations            0
      Block Output Operations           0

This can cause (batch) jobs to fail when MEMSIZE is too small to absorb this sudden increase: in the logs above, step memory rose from 1834.62k under M6 to 539046.65k under M7, nearly 300 times as much.
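To see how much headroom a job has before it runs into this limit, the MEMSIZE value in effect can be checked from within the session. A minimal sketch using PROC OPTIONS and the GETOPTION function; how MEMSIZE is raised for your jobs depends on your site's configuration files or start scripts:

```sas
/* Display the MEMSIZE value in effect for the current SAS session.     */
/* MEMSIZE is a startup option: it can be set only when SAS is invoked  */
/* (command line or configuration file), not with an OPTIONS statement. */
proc options option=memsize value;
run;

/* Alternative: write the current value to the log with GETOPTION. */
%put NOTE: MEMSIZE is %sysfunc(getoption(memsize));
```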

Creating an intermediate dataset with the WHERE condition is a suitable workaround for the moment.
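A sketch of that intermediate-dataset workaround, assuming TEST_BIGDS and TEST_SMALLDS from the program above already exist in WORK. TEST_BIGDS_SUBSET is a hypothetical name for the filtered copy; the hash object is then loaded from a dataset that needs no WHERE= option:

```sas
/* Step 1: apply the WHERE condition in a separate step.                 */
/* TEST_BIGDS_SUBSET is a hypothetical name for the intermediate dataset. */
data TEST_BIGDS_SUBSET;
    set TEST_BIGDS (where=(CHAR1 eq 'FGHIJ'));
    keep KEY CHAR2 DATE;   /* only the key and data variables of the hash */
run;

/* Step 2: the original lookup, loading the hash from the filtered data. */
data HASH_LOOKUP (drop=rc);
    length CHAR2 $10. DATE 5.;
    format DATE DDMMYYP10.;
    set TEST_SMALLDS;
    if _n_ eq 1 then do;
        declare hash H (dataset: "TEST_BIGDS_SUBSET");
        H.defineKey("KEY");
        H.defineData("CHAR2","DATE");
        H.defineDone();
        call missing(CHAR2,DATE);
    end;
    rc=H.find();
run;
```

The extra step costs one pass through TEST_BIGDS and some disk space in WORK, so it is meant as a stopgap rather than a permanent fix.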

 

Thanks to @ccaero, who found this and created the test.

Accepted Solution
ChrisHemedinger
Community Manager

SAS confirmed the problem and issued SAS Note 67620, "A hash object in SAS® 9.4M7 (TS1M7) might consume significantly more memory than it did in previous releases."

From the note:

This problem occurs when the HASHEXP method is not specified when defining the hash object. The amount of memory that is allocated can vary, depending on the data within the defined hash.

The workaround is to define the hash object with the HASHEXP method, as illustrated by the syntax fragment below:

data HASH_LOOKUP;
   if _n_ eq 1 then do;
   declare hash H (dataset: "TEST_DSET (where=(CHAR1 eq 'ABCDE'))",HASHEXP:8);
   ...more code...

If the HASHEXP method is not specified in the declaration, a default value of 8 is used. However, specifying HASHEXP:8 (the default) in the DECLARE statement will dramatically reduce the step memory footprint compared with not coding the method. The value for the HASHEXP method depends on usage. It is recommended that you test different values to find the optimal value for each case.
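Applied to the test program from the question, the note's workaround means adding HASHEXP: to the DECLARE statement. The sketch below wraps the lookup step in a hypothetical macro, %try_hashexp, so that several HASHEXP values can be compared through the FULLSTIMER memory lines in the log, as the note recommends. It assumes the two test datasets from the question exist in WORK; the candidate values 6, 8, 10 and 12 are arbitrary choices for illustration:

```sas
options fullstimer;   /* prints the memory / OS Memory lines per step */

/* Hypothetical helper: runs the question's lookup once per HASHEXP value. */
%macro try_hashexp(values=6 8 10 12);
    %local k exp;
    %do k = 1 %to %sysfunc(countw(&values));
        %let exp = %scan(&values, &k);
        %put NOTE: Running lookup with HASHEXP=&exp;
        data HASH_LOOKUP_&exp (drop=rc);
            length CHAR2 $10. DATE 5.;
            format DATE DDMMYYP10.;
            set TEST_SMALLDS;
            if _n_ eq 1 then do;
                /* HASHEXP is given explicitly, per SAS Note 67620 */
                declare hash H (dataset: "TEST_BIGDS (where=(CHAR1 eq 'FGHIJ'))", hashexp: &exp);
                H.defineKey("KEY");
                H.defineData("CHAR2","DATE");
                H.defineDone();
                call missing(CHAR2,DATE);
            end;
            rc=H.find();
        run;
    %end;
%mend try_hashexp;

%try_hashexp()
```

HASHEXP:n sets the size of the hash object's internal table to 2**n, so larger values trade memory for fewer entries per table slot. Only the explicit HASHEXP:8 is what the note describes as the workaround; the other values are just for comparison.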

Check out SAS Innovate on-demand content! Watch the main stage sessions, keynotes, and over 20 technical breakout sessions!


Replies
ChrisHemedinger
Community Manager

Hi @Kurt_Bremser, has this been reported to SAS Tech Support? If not, I'll be happy to open a track for it.

ccaero
Fluorite | Level 6

Hi @ChrisHemedinger ,
I opened a support track on this topic today; the ticket number is 7613295889.

raks2301
Calcite | Level 5
What was the outcome of the track?
raks2301
Calcite | Level 5

What's the outcome of the track?
