I am trying to set up several hash objects and iterate through them with a "hash of hashes" (HoH) object. I want to load the same dataset several times, but with different subsets (a WHERE clause that is empty or has some restriction), so I expect the hashes to have different sizes.
For some reason the loaded WHERE clause has no effect (see output screenshot below), and I can't find what I'm missing.
data a;
  do id=1 to 50 by 1;
    value=19;
    output;
  end;
run;

data _null_;
  if 0 then set a;

  if _n_=1 then do;
    declare hash hoh(multidata: "yes");
    hoh.defineKey("table", "where");
    hoh.defineData("h", "table", "where");
    hoh.defineDone();
    declare hiter i_hoh("hoh");

    declare hash h();

    * For comparisons with input dataset --> "h" = rp19;
    table="a";
    where=" ";
    h = _new_ hash(dataset: catx(" ", table, where), multidata: "no");
    h.defineKey("id");
    h.defineData(all: "yes");
    h.defineDone();
    hoh.add();

    table="a";
    where="(where=(id<20))";
    h = _new_ hash(dataset: catx(" ", table, where), multidata: "no");
    h.defineKey("id");
    h.defineData(all: "yes");
    h.defineDone();
    hoh.add();

    table="a (where=(id>40))";
    where=" ";
    h = _new_ hash(dataset: catx(" ", table, where), multidata: "no");
    h.defineKey("id");
    h.defineData(all: "yes");
    h.defineDone();
    hoh.add();
  end;

  * Iterate over "hash of hashes" object to reference the hash objects (rp19, rp23, rp49, rp79) like in an array;
  do while (i_hoh.next() = 0);
    rows = h.num_items;
    put (table rows) (=);
  end;

  stop;
run;
The log window gives me the following output:

[log output screenshot]
You create a hash instance that memory leaks (albeit a very small leak):

declare hash h(); * create uninitialized instance whilst adding to pdv;

In a HoH the hash data element h only needs to be declared. Of course, the _new_ instance must exist prior to .add() or .replace() time. So simply declare the host reference h that will interact with your anonymous hashes:

declare hash h; * add h, a hash reference, to PDV;
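To make the leak concrete, here is a minimal runnable sketch (using the question's WORK.A data set; the commented line is the leaky variant):

data _null_;
  if 0 then set a;                    /* host variables ID, VALUE for defineData */
  /* declare hash h();  <- would instantiate an anonymous hash that is
     orphaned the moment _new_ reassigns h below (the small leak)      */
  declare hash h;                     /* reference only, no instance yet */
  h = _new_ hash(dataset: "a");
  h.defineKey("id");
  h.defineData(all: "yes");
  h.defineDone();
  n = h.num_items;
  put n=;                             /* n=50 */
  stop;
run;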
The table variable should be $41 to accommodate the longest plain <libref>.<dataset>, that being $8 + $1 + $32. If the table value is to further accommodate data set options, you might want the length to be more like $200. However, since the where clause is part of the key, you should exhibit discipline and assign only plain data set names to table and use the separate where variable, say $200, for loading subsets.
As stated elsewhere, the 'master' hash hoh does not need to be multidata in your situation. Also, you do not need to repeatedly code table="<whatever>"; the value will not be altered by a prior hoh.add().
Example
data have;
  do id=1 to 50 by 1;
    value=19;
    output;
  end;
run;

data _null_;
  if 0 then set have;

  length table $41;
  length where $200;

  if _n_=1 then do;
    declare hash hoh();
    hoh.defineKey("table", "where");
    hoh.defineData("h", "table", "where");
    hoh.defineDone();
    declare hiter i_hoh("hoh");

    declare hash h;

    table="have";
    where=" ";
    h = _new_ hash(dataset: catx(" ", table, where), multidata: "no");
    h.defineKey("id");
    h.defineData(all: "yes");
    h.defineDone();
    hoh.add();

    where="(where=(id<20))";
    h = _new_ hash(dataset: catx(" ", table, where), multidata: "no");
    h.defineKey("id");
    h.defineData(all: "yes");
    h.defineDone();
    hoh.add();

    where="(where=(id>40))";
    h = _new_ hash(dataset: catx(" ", table, where), multidata: "no");
    h.defineKey("id");
    h.defineData(all: "yes");
    h.defineDone();
    hoh.add();
  end;

  do while (i_hoh.next() = 0);
    rows = h.num_items;
    put (table where rows) (=);
  end;

  stop;
run;
Log
NOTE: There were 50 observations read from the data set WORK.HAVE.
NOTE: There were 19 observations read from the data set WORK.HAVE.
WHERE id<20;
NOTE: There were 10 observations read from the data set WORK.HAVE.
WHERE id>40;
table=have where=(where=(id<20)) rows=19
table=have where=(where=(id>40)) rows=10
table=have where= rows=50
Hi @left,
You just forgot to specify sufficient lengths for the character variables table and where, so the intended values are truncated to one character.
For example:
length table where $20;
(but I'm sure you actually know that).
Also, I don't think you have any need for multidata:"yes" for hoh, once you have corrected the lengths of table and where.
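The truncation is easy to reproduce: without a LENGTH statement, the compiler fixes each character variable's length at its first appearance, so later, longer values are cut off at run time. A minimal sketch:

data _null_;
  table = "a";                  /* compiler fixes TABLE at $1      */
  where = " ";                  /* compiler fixes WHERE at $1      */
  table = "a (where=(id>40))";  /* truncated to "a" at run time    */
  where = "(where=(id<20))";    /* truncated to "(" at run time    */
  put table= where=;            /* prints: table=a where=(         */
run;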
Hi @mkeintz,
In my first attempt I built the "hoh" in the following manner:
declare hash hoh(multidata: "no");
hoh.defineKey("table");
hoh.defineData("h", "table");
hoh.defineDone();
That resulted in an error in the log file:
NOTE: There were 50 observations read from the data set WORK.A.
NOTE: There were 19 observations read from the data set WORK.A.
WHERE id<20;
ERROR: Duplicate key.
NOTE: There were 10 observations read from the data set WORK.A.
WHERE id>40;
ERROR: Duplicate key.
table=a rows=50
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.04 seconds
Therefore I changed the multidata option from "no" to "yes":
declare hash hoh(multidata: "yes");
hoh.defineKey("table");
hoh.defineData("h", "table");
hoh.defineDone();
This results in the following log output:
NOTE: There were 50 observations read from the data set WORK.A.
NOTE: There were 19 observations read from the data set WORK.A.
WHERE id<20;
NOTE: There were 10 observations read from the data set WORK.A.
WHERE id>40;
table=a rows=50
table=a rows=19
table=a rows=10
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.04 seconds
However, splitting the key "table" into two parts ("table" + "where") removes the need for multidata: "yes", as you highlighted:
declare hash hoh(multidata: "no");
hoh.defineKey("table", "where");
hoh.defineData("h", "table", "where");
hoh.defineDone();
Therefore the multidata option can be omitted:
declare hash hoh();
hoh.defineKey("table", "where");
hoh.defineData("h", "table", "where");
hoh.defineDone();
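With table and where forming the key, a specific subset hash can also be retrieved directly instead of iterating. A minimal sketch, assuming the hoh above and the hashes loaded from WORK.A (find() copies the stored data portion, including the hash reference h, back into the PDV):

table = "a";
where = "(where=(id<20))";
if hoh.find() = 0 then do;
  rows = h.num_items;
  put "subset " where "has " rows "items";  /* rows=19 */
end;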