BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
left
Obsidian | Level 7

Being still novel to hashing I wonder if anyone can make my attempt for a left-join more elegant and/ or efficient.

 

Here is what I have (first generating sample data as preparation called [have_source], then calculate needed structure in dataset [have]) :

 

* Generate sample data;
data have_source (drop=i);
    do id=1 to 1000;
        do i=65 to 65+6;                            * Generate letters A-G;
            do arefpce=1 to 9;                      * [arefpce] = episode counter (episode can be x days);
                paramcd=byte(i);
                aval=rand("integer", 0, 15);        * Random number of events within one episode of type "paramcd";
                output;
            end;
        end;
    end;
run;

 

 

Create needed structure using data [have_source]:

 

data have (drop= aval_:);
    do until (last.paramcd);
        set have_source;
        by id paramcd arefpce;
        * Summarize number of events in episodes 2-3;
        if (arefpce >= 2 AND arefpce <= 3) then do;
            aval_23=sum(aval_23, aval);
        end;
        * Summarize number of events in episodes 4-9;
        if (arefpce >= 4 AND arefpce <= 9) then do;
            aval_49=sum(aval_49, aval);
        end;
        * Summarize number of events in episodes 7-9;
        if (arefpce >= 7 AND arefpce <= 9) then do;
            aval_79=sum(aval_79, aval);
        end;
    end;
    * Print summarized results for each time frame ([arefpce]=2-3, 4-9, 7-9) separately to output;
    arefpce=203;
    aval=aval_23;
    output;
    arefpce=409;
    aval=aval_49;
    output;
    arefpce=709;
    aval=aval_79;
    output;
    * Run through dataset again to output "detail" data where calculations were based on;
    do until (last.paramcd);
        set have;
        by id paramcd arefpce;
        output;
    end;
run;

 

 

Sort it:

 

* Sort dataset to have the detail data before the summarized data;
proc sort data=have;
    by id paramcd arefpce;
run;

 

The first few lines look like this:

Dataset "have" (first few lines)Dataset "have" (first few lines)

 

 

------------------

Now the "want" part: A left join using subsets of the [have] dataset. 

 

First the SQL solution (serving as reference):

 

proc sql;
    create table want_sql as
           select l.id
                , l.paramcd
                , l.aval    as aval23
                , r49.aval  as aval49
                , r79.aval  as aval79
           from           have (where=(arefpce eq 203)) as l
                left join have (where=(arefpce eq 409)) as r49 on l.id eq r49.id and l.paramcd eq r49.paramcd
                left join have (where=(arefpce eq 709)) as r79 on l.id eq r79.id and l.paramcd eq r79.paramcd;
quit;

 

 

The same using base SAS:

data want_base (drop=arefpce:);
    merge have (where=(arefpce_23 = 203) rename=(aval=aval_23 arefpce=arefpce_23) in=h_23)
          have (where=(arefpce_49 = 409) rename=(aval=aval_49 arefpce=arefpce_49) in=h_49)
          have (where=(arefpce_79 = 709) rename=(aval=aval_79 arefpce=arefpce_79) in=h_79);
    by id paramcd;
    if h_23;
run;

The output looks like this:

Dataset [want] (first few lines)Dataset [want] (first few lines)

 

--------------------------------

My attempt on hashing works, but is lacking clarity/ efficiency and also not so concise than I imagine it could be:

* Want --> Left join using hashing (not SQL) in an elegant and/ or efficient manner;
data _null_;
    if 0 then set have (keep=id paramcd /*aval*/)
                  have (keep=id aval rename=(aval=aval23))
                  have (keep=id aval rename=(aval=aval49))
                  have (keep=id aval rename=(aval=aval79));
    length table  $41
           _where $200;
    if _n_=1 then do;
        declare hash h_out(ordered: "yes");
        h_out.defineKey("id", "paramcd");
        h_out.defineData("id", "paramcd", "aval23", "aval49", "aval79");
        h_out.defineDone();

        table="have";
        _where="(where=(arefpce=203) rename=(aval=aval23))";
        declare hash h23(dataset: cats(table, _where));
        h23.defineKey("id", "paramcd");
        h23.defineData("aval23");
        h23.defineDone();

        _where="(where=(arefpce=409) rename=(aval=aval49))";
        declare hash h49(dataset: cats(table, _where));
        h49.defineKey("id", "paramcd");
        h49.defineData("aval49");
        h49.defineDone();

        _where="(where=(arefpce=709) rename=(aval=aval79))";
        declare hash h79(dataset: cats(table, _where));
        h79.defineKey("id", "paramcd");
        h79.defineData("aval79");
        h79.defineDone();

        call missing(aval23, aval49, aval79);
    end;
    do until (done);
        set have (where=(arefpce=203)) end=done;
        rc=h23.find();
        if (rc=0) then h_out.replace();
        if (rc=0 AND h49.find()=0) then h_out.replace();
        else do;
            call missing(aval49);
            h_out.replace();
        end;
        if (rc=0 AND h79.find()=0) then h_out.replace();
        else do;
            call missing(aval79);
            h_out.replace();
        end;
    end;
    h_out.output(dataset: "want");
run;

Obviously solutions can emphasize several aspects (brevity, readability, flexibility, elegance, etc.) and therefore be rather cryptic, verbatim, more complex (hash of hashes), etc. So I'm happy to learn some more options on this example. 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Here is a simple hash "left join"

 

data want_hash (drop=rc);
    if _N_ = 1 then do;
        dcl hash h49 (dataset : "have(where=(arefpce=409) rename=(aval=aval49))");
        h49.defineKey("id", "paramcd");
        h49.defineData("aval49");
        h49.defineDone();
        dcl hash h79 (dataset : "have(where=(arefpce=709) rename=(aval=aval79))");
        h79.defineKey("id", "paramcd");
        h79.defineData("aval79");
        h79.defineDone();
    end;

    set have(where=(arefpce=203) rename=(aval=aval23));
    call missing (aval49, aval79);

    rc = h49.find();
    rc = h79.find();
run;

View solution in original post

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

Here is a simple hash "left join"

 

data want_hash (drop=rc);
    if _N_ = 1 then do;
        dcl hash h49 (dataset : "have(where=(arefpce=409) rename=(aval=aval49))");
        h49.defineKey("id", "paramcd");
        h49.defineData("aval49");
        h49.defineDone();
        dcl hash h79 (dataset : "have(where=(arefpce=709) rename=(aval=aval79))");
        h79.defineKey("id", "paramcd");
        h79.defineData("aval79");
        h79.defineDone();
    end;

    set have(where=(arefpce=203) rename=(aval=aval23));
    call missing (aval49, aval79);

    rc = h49.find();
    rc = h79.find();
run;
PeterClemmensen
Tourmaline | Level 20

Also, you could do:

 

data want_hash (drop=aval rc);

    dcl hash h49 ();
    h49.defineKey("id", "paramcd");
    h49.defineData("aval");
    h49.defineDone();
    dcl hash h79 ();
    h79.defineKey("id", "paramcd");
    h79.defineData("aval");
    h79.defineDone();

    do until (lr1);
        set have (where=(arefpce in (409, 709))) end=lr1 ;
        if   arefpce=409 then h49.add();
        else                  h79.add();
    end;

    do until (lr2);
        set have (where=(arefpce = 203)) end=lr2;
        aval23 = aval;
        rc = h49.find(); aval49 = aval;
        rc = h79.find(); aval79 = aval;
        output;
    end;

run;
s_lassen
Meteorite | Level 14

You create a hash for the AVAL23 subset, but you also read the data directly. Why? And there is no reason to put the output in a hash. That is mostly used when you want to create a number of tables dynamically.

 

I do not think hashes are called for in this case, but if you must do it, just use them to get at the two additional tables in the left join:

data want;
  set have (where=(arefpce=203) rename=(aval=aval23));
  if _N_=1 then do;
    declare hash h49(dataset: 'have(where=(arefpce=409)');
    h49.defineKey("id", "paramcd");
    h49.defineData("aval");
    h49.defineDone();
    
    declare hash h79(dataset: 'have(where=(arefpce=709)');
    h79.defineKey("id", "paramcd");
    h79.defineData("aval");
    h79.defineDone();
    end;
  if h49.find()=0 then
    aval49=aval;
  if h79.find()=0 then
    aval79=aval;
  keep id paramcd aval23 aval49 aval79;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 591 views
  • 2 likes
  • 3 in conversation