<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Aggregation using hashing: Repeat summarized results for each observation in by-group in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Aggregation-using-hashing-Repeat-summarized-results-for-each/m-p/633265#M187854</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/315410"&gt;@left&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your "have" and "want" pictures are currently a bit inconsistent: some statistics from "have" (e.g. &lt;FONT face="courier new,courier"&gt;_unc&lt;/FONT&gt;) do not occur in "want", some from "want" (e.g. &lt;FONT face="courier new,courier"&gt;sum_wt_norm&lt;/FONT&gt;) cannot be based on &lt;FONT face="courier new,courier"&gt;_heart_src&lt;/FONT&gt; and some variable names differ (&lt;FONT face="courier new,courier"&gt;_sum&lt;/FONT&gt; vs. &lt;FONT face="courier new,courier"&gt;sum_sys&lt;/FONT&gt;). Most importantly, your "updated code" contains a mistake:&amp;nbsp;&lt;FONT face="courier new,courier"&gt;_sum_alv&lt;/FONT&gt; is not reset to missing in the CALL MISSING routine, resulting in incorrect values for this summary statistic. (Also the STOP statement is missing.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That said, here's one way to combine detail rows and summary statistics using hash object (and hash iterator) methods:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Yes, you can define a multidata hash object "detail", but I wouldn't want to create much redundancy by copying all summary statistics to each of its entries. Instead, I would restrict the list of data items to the detail variables needed:&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;dcl hash detail(ordered: "a", multidata: "y");
detail.defineKey("smoking_status", "bp_status", "sex");
detail.defineData("systolic", "status");
detail.defineDone();&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;Hash object "detail" can be populated using the ADD method after the SET statement:&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;detail.add();&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;A hash iterator object could be used to traverse the hash object containing the summary statistics.&lt;BR /&gt;Declaration:&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;dcl hiter iout('out');&lt;/CODE&gt;&lt;/PRE&gt;
Usage (after the DoW loop, replacing the call of the OUTPUT &lt;EM&gt;method&lt;/EM&gt;)&lt;FONT face="helvetica"&gt;:&lt;/FONT&gt;&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;do while(iout.next()=0);
  do while(detail.do_over()=0);
    output;
  end;
end;&lt;/CODE&gt;&lt;/PRE&gt;
Of course, the OUTPUT &lt;EM&gt;statement&lt;/EM&gt; requires a change to the DATA statement:&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Thu, 19 Mar 2020 13:14:13 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2020-03-19T13:14:13Z</dc:date>
    <item>
      <title>Aggregation using hashing: Repeat summarized results for each observation in by-group</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Aggregation-using-hashing-Repeat-summarized-results-for-each/m-p/633181#M187816</link>
      <description>&lt;P&gt;This is a follow-up question on&amp;nbsp;&lt;A href="https://communities.sas.com/t5/SAS-Programming/Aggregation-using-hashing-counting-distinct-occurrences-based-on/m-p/632745#M187612" target="_blank" rel="noopener"&gt;Aggregation using hashing: counting distinct occurrences based on a string&lt;/A&gt;&amp;nbsp;(solved and helpful comments thanks to &lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31304" target="_blank" rel="noopener"&gt;@draycut&lt;/A&gt;&amp;nbsp;,&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733" target="_blank" rel="noopener"&gt;@FreelanceReinhard&lt;/A&gt;&amp;nbsp;and&amp;nbsp;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/21262" target="_blank" rel="noopener"&gt;@hashman&lt;/A&gt;!).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data for this scenario:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="sas"&gt;data _heart_src;
    set sashelp.heart (keep=smoking_status bp_status sex systolic status);
    label   sex="Sex"
            status="Survival status"
            systolic="Systolic";
run;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've updated the code for the hashing based solution (see &lt;A href="https://communities.sas.com/t5/SAS-Programming/Aggregation-using-hashing-counting-distinct-occurrences-based-on/m-p/632745#M187612" target="_blank" rel="noopener"&gt;Aggregation using hashing: counting distinct occurrences based on a string&lt;/A&gt;), but would like to give one more enhancement to it:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Instead of having only the summarized data (n=36 by-groups) I want to create a dataset with the full detail information from the input dataset together with the summarized data&lt;/LI&gt;&lt;LI&gt;Solution shall be done using hashing&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Can it be done without passing through the data once again (no additional DoW-Loop), e.g. by only creating further hash tables?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I think how it should work (but cannot code it):&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Create another hash object [detail] and add entries from first DoW-Loop&lt;/LI&gt;&lt;LI&gt;Aggregated data are populated in hash table [out] (already implemented).&lt;/LI&gt;&lt;LI&gt;Update hash table [detail] using [multidata: "yes"] with matching keys from [out] with [detail.ref()].&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="sas"&gt;    dcl hash detail (ordered: "a", multidata: "y");
    detail.defineKey("smoking_status", "bp_status", "sex");
    detail.defineData("smoking_status", "bp_status", "sex", "systolic", "status", 
                      "_sum", "_mean", "_usys", "_n", "_sum_alv");
    detail.defineDone();&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Output of hash table [detail]&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So using pictures - this is what I have:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Have" style="width: 631px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/37027iF0EBD7A030488A9D/image-size/large?v=v2&amp;amp;px=999" role="button" title="hashing_aggregated_result_revisited.png" alt="Have" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Have&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is what I want:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Want" style="width: 763px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/37028i2DBC89CCE05163C1/image-size/large?v=v2&amp;amp;px=999" role="button" title="hashing_heart_data_detailed.png" alt="Want" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Want&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline"&gt;&lt;SPAN class="lia-inline-image-caption"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Updated code (what I have so far)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* 01 */ data _null_;
/* 02 */     declare hash out(ordered: "a");
/* 03 */     out.defineKey ("smoking_status", "bp_status", "sex");
/* 04 */     out.defineData("smoking_status", "bp_status", "sex", 
/* 05 */                    "_sum", "_unq", "_sum_alv", "_n", "_mean");
/* 06 */     out.defineDone();
/* 07 */ 
/* 08 */     declare hash unq();
/* 09 */     unq.defineKey("smoking_status", "bp_status", "sex", "systolic");
/* 10 */     unq.defineDone();
/* 11 */ 
/* 12 */     do until (end);
/* 13 */         set _heart_src end=end;
/* 14 */         if (out.find() ne 0) then call missing(_sum, _unq, _n);
/* 15 */         _sum+systolic;             * _sum=sum(_sum, systolic);
/* 16 */         _n+1;                      * _n=sum(_n, 1);
/* 17 */         if (unq.check() ne 0) then do;
/* 18 */             _unq+1;                * _unq=sum(_unq, 1);
/* 19 */             unq.add();
/* 20 */         end;

/* 20a */        _sum_alv=sum(_sum_alv, status =: "A");     * _sum_alv+(status ="Alive");

/* 21 */         _mean=_sum/_n;
/* 22 */         out.replace();
/* 23 */     end;
/* 24 */     out.output(DATASET: "_hash_agg");
/* 25 */     format _mean 8.1;
/* 26 */     label  _mean="Mean (systolic)"
/* 27 */            _sum="Sum (systolic)"
/* 28 */            _unq="Unique values (systolic)"
/* 29 */            _n="N (# of subjects)"

/* 29a */           _sum_alv="Patients alive";

/* 30 */ run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;So my questions:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;How can this be done with hashing?&lt;/LI&gt;&lt;LI&gt;If yes, can it be done in a single run (or is a further DoW-Loop required)?&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Thu, 19 Mar 2020 08:44:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Aggregation-using-hashing-Repeat-summarized-results-for-each/m-p/633181#M187816</guid>
      <dc:creator>left</dc:creator>
      <dc:date>2020-03-19T08:44:42Z</dc:date>
    </item>
    <item>
      <title>Re: Aggregation using hashing: Repeat summarized results for each observation in by-group</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Aggregation-using-hashing-Repeat-summarized-results-for-each/m-p/633265#M187854</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/315410"&gt;@left&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your "have" and "want" pictures are currently a bit inconsistent: some statistics from "have" (e.g. &lt;FONT face="courier new,courier"&gt;_unc&lt;/FONT&gt;) do not occur in "want", some from "want" (e.g. &lt;FONT face="courier new,courier"&gt;sum_wt_norm&lt;/FONT&gt;) cannot be based on &lt;FONT face="courier new,courier"&gt;_heart_src&lt;/FONT&gt; and some variable names differ (&lt;FONT face="courier new,courier"&gt;_sum&lt;/FONT&gt; vs. &lt;FONT face="courier new,courier"&gt;sum_sys&lt;/FONT&gt;). Most importantly, your "updated code" contains a mistake:&amp;nbsp;&lt;FONT face="courier new,courier"&gt;_sum_alv&lt;/FONT&gt; is not reset to missing in the CALL MISSING routine, resulting in incorrect values for this summary statistic. (Also the STOP statement is missing.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That said, here's one way to combine detail rows and summary statistics using hash object (and hash iterator) methods:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Yes, you can define a multidata hash object "detail", but I wouldn't want to create much redundancy by copying all summary statistics to each of its entries. Instead, I would restrict the list of data items to the detail variables needed:&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;dcl hash detail(ordered: "a", multidata: "y");
detail.defineKey("smoking_status", "bp_status", "sex");
detail.defineData("systolic", "status");
detail.defineDone();&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;Hash object "detail" can be populated using the ADD method after the SET statement:&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;detail.add();&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;LI&gt;A hash iterator object could be used to traverse the hash object containing the summary statistics.&lt;BR /&gt;Declaration:&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;dcl hiter iout('out');&lt;/CODE&gt;&lt;/PRE&gt;
Usage (after the DoW loop, replacing the call of the OUTPUT &lt;EM&gt;method&lt;/EM&gt;)&lt;FONT face="helvetica"&gt;:&lt;/FONT&gt;&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;do while(iout.next()=0);
  do while(detail.do_over()=0);
    output;
  end;
end;&lt;/CODE&gt;&lt;/PRE&gt;
Of course, the OUTPUT &lt;EM&gt;statement&lt;/EM&gt; requires a change to the DATA statement:&lt;BR /&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Thu, 19 Mar 2020 13:14:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Aggregation-using-hashing-Repeat-summarized-results-for-each/m-p/633265#M187854</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2020-03-19T13:14:13Z</dc:date>
    </item>
    <item>
      <title>Re: Aggregation using hashing: Repeat summarized results for each observation in by-group</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Aggregation-using-hashing-Repeat-summarized-results-for-each/m-p/633551#M187937</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh&lt;/a&gt;&amp;nbsp;for the corrections in my code and for providing a solution! My apologize for the inconsistencies in the above code and the screenshots. The second screenshot was from a "double DoW-Loop" solution with slightly different naming for the variables.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have updated the code, but used a slight variation of your solution due to the fact that I run SAS v9.2 which does not support the "do_over()" method. I intended to use another hash table for output instead of the dataset [want] just for the sake of finding my way.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* 01 */ data _null_;
/* 01a */    * Hash table [out] for aggregated display (no duplicate keys);
/* 02 */     declare hash out(ordered: "a");
/* 03 */     out.defineKey ("smoking_status", "bp_status", "sex");
/* 04 */     out.defineData("smoking_status", "bp_status", "sex", 
/* 05 */                    "_unq", "_sum", "_mean", "_sum_alv", "_n");
/* 06 */     out.defineDone();
/* 07 */ 
/* 07a */    * Hash table [unq] used for counting unique occurrences of value [systolic] within the key of hash table [h];
/* 08 */     declare hash unq();
/* 09 */     unq.defineKey("smoking_status", "bp_status", "sex", "systolic");
/* 10 */     unq.defineDone();

/* 10a */    * Hash table [detail] for detailed data on patients status (alive/ dead) and systolic blood pressure;
/* 10b */    dcl hash detail (ordered: "a", multidata: "y");
/* 10c */    detail.defineKey("smoking_status", "bp_status", "sex");
/* 10d */    detail.defineData("smoking_status", "bp_status", "sex", "systolic", "status");
/* 10e */    dcl hiter h_detail ("detail");     * Iterator to be able to walk through detail hash table [detail];
/* 10f */    detail.defineDone();

/* 10g */    * Hash table [det_out] for output of detailed data (updated with aggregated values from hash table [h];
/* 10h */    dcl hash det_out(ordered: "a", multidata: "y");
/* 10i */    det_out.defineKey("smoking_status", "bp_status", "sex");
/* 10j */    det_out.defineData("smoking_status", "bp_status", "sex", "systolic", "_unq", "_sum", "_mean", "status", "_sum_alv", "_n");
/* 10k */    det_out.defineDone();
/* 11 */ 
/* 12 */     do until (end);
/* 13 */         set _heart_src end=end;
/* 13a */        detail.add();  * fill hash table [detail] for each observation;
/* 14 */         if (out.find() ne 0) then call missing(_sum, _unq, _n, _sum_alv);
/* 15 */         _sum+systolic;             * _sum=sum(_sum, systolic);
/* 16 */         _n+1;                      * _n=sum(_n, 1);
/* 16a */        _sum_alv=sum(_sum_alv, status =: "A");      * Count only patients alive;
/* 16b */        * Check if value is unique within its "by-group"/ key;
/* 17 */         if (unq.check() ne 0) then do;
/* 18 */             _unq+1;                * _unq=sum(_unq, 1);
/* 19 */             unq.add();
/* 20 */         end;
/* 21 */         _mean=_sum/_n;
/* 21a */        * Update hash table [h] (aggregated values only);
/* 22 */         out.replace();
/* 23 */     end;

/* 23a */    rc=h_detail.first();    * Place "pointer" to first item in hash table [detail];
/* 23b */    do while (rc=0);
/* 23c */        * Write data portion of hash table [h] into PDV with "h.find()" (for each matching key);
/* 23d */        if out.find()=0 then det_out.add();
/* 23e */        rc=h_detail.next();
/* 23f */    end;

/* 23g */    * Output of final hash tables;
/* 24 */     out.output    (dataset: "_hash_agg");      * [h]   aggravated dataset (1 entry for each key);
/* 24a */    det_out.output(dataset: "_hash_agg_all");  * [all] aggravated values for each observation (duplicate keys);

/* 25 */     format _mean 8.1;
/* 26 */     label  _mean="Mean (systolic)"
/* 27 */            _sum="Sum (systolic)"
/* 28 */            _unq="Unique values (systolic)"
/* 29 */            _n="N (# of subjects)"
/* 29a */           _sum_alv="Patients alive";
/* 29b */    stop;
/* 30 */ run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The final output comes to this (to have the post complete):&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Want (Aggregated data repeated for each observation)" style="width: 864px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/37081i07A0EE6947801DC0/image-size/large?v=v2&amp;amp;px=999" role="button" title="hashing_heart_data_detailed_final.png" alt="Want (Aggregated data repeated for each observation)" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Want (Aggregated data repeated for each observation)&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Mar 2020 12:05:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Aggregation-using-hashing-Repeat-summarized-results-for-each/m-p/633551#M187937</guid>
      <dc:creator>left</dc:creator>
      <dc:date>2020-03-20T12:05:53Z</dc:date>
    </item>
  </channel>
</rss>

