left
Obsidian | Level 7

I'm trying to set up several hash objects and iterate through them with a "hash of hashes" object. I want to load the same dataset each time, but with different subsets (WHERE clause empty or with some restriction), so I expect hash objects of different sizes.

For some reason the loaded WHERE clause is not in effect (see the log output below). I can't find what I'm missing.

data a;
    do id=1 to 50 by 1;
        value=19;
        output;
    end;
run;

data _null_;
    if 0 then set a;
    if _n_=1 then do;
        declare hash hoh(multidata: "yes");
        hoh.defineKey("table", "where");
        hoh.defineData("h", "table", "where");
        hoh.defineDone();
        declare hiter i_hoh("hoh");

        declare hash h();

        * For comparisons with input dataset --> "h" = rp19;
        table="a";
        where=" ";
        h = _new_ hash(dataset: catx(" ", table, where),
                       multidata: "no");
        h.defineKey("id");
        h.defineData(all: "yes");
        h.defineDone();
        hoh.add();

        table="a";
        where="(where=(id<20))";
        h = _new_ hash(dataset: catx(" ", table, where),
                       multidata: "no");
        h.defineKey("id");
        h.defineData(all: "yes");
        h.defineDone();
        hoh.add();

        table="a (where=(id>40))";
        where=" ";
        h = _new_ hash(dataset: catx(" ", table, where),
                       multidata: "no");
        h.defineKey("id");
        h.defineData(all: "yes");
        h.defineDone();
        hoh.add();
    end;
    * Iterate over "hash of hashes" object to reference the hash objects (rp19, rp23, rp49, rp79) like in an array;
    do while (i_hoh.next() = 0);
        rows = h.num_items;
        put (table rows) (=);
    end;
    stop;
run;

The log window gives me the following output:

[screenshot of log output: image.png]

1 ACCEPTED SOLUTION

Accepted Solutions
RichardDeVen
Barite | Level 11

You create a hash instance that leaks memory (albeit a very small leak):

declare hash h();   * create an uninitialized instance while adding h to the PDV;

In a HoH, the hash data element h only needs to be declared. Of course, the _new_ instance must exist before .add() or .replace() time. So simply declare the host reference h that will interact with your anonymous hashes:

declare hash h;    * add h, a hash reference, to the PDV;

The table variable should be $41 to accommodate the longest plain <libref>.<dataset> name, that being $8 + $1 + $32.  If table is also to accommodate data set options, you might want the length to be more like $200.  However, since the where clause is part of the key, you should exhibit discipline: assign only plain data set names to table and use the separate where variable, say $200, for loading subsets.
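
To make the length advice concrete, here is a small sketch (my illustration, not part of the original reply) showing how the two variables combine into the dataset specification passed to the hash constructor (the variable spec is hypothetical, used only for display):

```
data _null_;
    length table $41 where $200 spec $241;
    table = "work.have";                 * plain <libref>.<dataset>, max 8+1+32 = $41;
    where = "(where=(id<20))";           * data set options kept separate;
    spec  = catx(" ", table, where);     * the string handed to the hash constructor;
    put spec=;                           * spec=work.have (where=(id<20));
run;
```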

 

As stated elsewhere, the 'master' hash hoh does not need to be multidata in your situation.  Also, you do not need to repeatedly code table="<whatever>"; the value is not altered by a prior hoh.add().

 

Example

data have;
    do id=1 to 50 by 1;
        value=19;
        output;
    end;
run;

data _null_;
    if 0 then set have;

    length table $41;
    length where $200;

    if _n_=1 then do;
        declare hash hoh();
        hoh.defineKey("table", "where");
        hoh.defineData("h", "table", "where");
        hoh.defineDone();

        declare hiter i_hoh("hoh");

        declare hash h;

        table="have";
        where=" ";
        h = _new_ hash(dataset: catx(" ", table, where), multidata: "no");
        h.defineKey("id");
        h.defineData(all: "yes");
        h.defineDone();
        hoh.add();

        where="(where=(id<20))";
        h = _new_ hash(dataset: catx(" ", table, where), multidata: "no");
        h.defineKey("id");
        h.defineData(all: "yes");
        h.defineDone();
        hoh.add();

        where="(where=(id>40))"; 
        h = _new_ hash(dataset: catx(" ", table, where), multidata: "no");
        h.defineKey("id");
        h.defineData(all: "yes");
        h.defineDone();
        hoh.add();
    end;

    do while (i_hoh.next() = 0);
        rows = h.num_items;
        put (table where rows) (=);
    end;

    stop;
run;

Log

NOTE: There were 50 observations read from the data set WORK.HAVE.
NOTE: There were 19 observations read from the data set WORK.HAVE.
      WHERE id<20;
NOTE: There were 10 observations read from the data set WORK.HAVE.
      WHERE id>40;
table=have where=(where=(id<20)) rows=19
table=have where=(where=(id>40)) rows=10
table=have where=  rows=50

 


5 REPLIES
left
Obsidian | Level 7
The comment lines (and parts of comment lines) can be ignored; they are irrelevant and possibly misleading:
--> * For comparisons with input dataset --> "h" = rp19;
--> (rp19, rp23, rp49, rp79)
FreelanceReinh
Jade | Level 19

Hi @left,

 

You just forgot to specify sufficient lengths for the character variables table and where. Because their first assignments (table="a"; where=" ";) are single characters, SAS gives both variables a length of $1, and the longer values are truncated to one character.

For example:

length table where $20;

(but I'm sure you actually know that).
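
A minimal sketch (an editorial illustration, not from the reply) of the truncation mechanism: without a LENGTH statement, SAS derives a character variable's length from its first assignment:

```
data _null_;
    table = "a";                    * first assignment defines the length -> $1;
    table = "a (where=(id<20))";    * later, longer value is truncated to $1;
    put table=;                     * prints table=a;
run;
```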

mkeintz
PROC Star

Also, I don't think you need multidata:"yes" for hoh once you have corrected the lengths of table and where.

left
Obsidian | Level 7

Hi @mkeintz,

 

In my first attempt I built the "hoh" in the following manner: 

 

declare hash hoh(multidata: "no");
hoh.defineKey("table");
hoh.defineData("h", "table");
hoh.defineDone();

That resulted in an error in the log:

NOTE: There were 50 observations read from the data set WORK.A.
NOTE: There were 19 observations read from the data set WORK.A.
      WHERE id<20;
ERROR: Duplicate key.
NOTE: There were 10 observations read from the data set WORK.A.
      WHERE id>40;
ERROR: Duplicate key.
table=a rows=50
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.04 seconds

Therefore I changed the multidata option from "no" to "yes":

declare hash hoh(multidata: "yes");
hoh.defineKey("table");
hoh.defineData("h", "table");
hoh.defineDone();

This results in the following log output:

NOTE: There were 50 observations read from the data set WORK.A.
NOTE: There were 19 observations read from the data set WORK.A.
      WHERE id<20;
NOTE: There were 10 observations read from the data set WORK.A.
      WHERE id>40;
table=a rows=50
table=a rows=19
table=a rows=10
NOTE: DATA statement used (Total process time):
      real time           0.06 seconds
      cpu time            0.04 seconds

 

 

However, splitting the key into two parts ("table" + "where") removes the need for multidata: "yes", as you highlighted:

declare hash hoh(multidata: "no");
hoh.defineKey("table", "where");
hoh.defineData("h", "table", "where");
hoh.defineDone();

 

Therefore the multidata option can be omitted (its default is "no"):

declare hash hoh();
hoh.defineKey("table", "where");
hoh.defineData("h", "table", "where");
hoh.defineDone();

 
