This is a follow-up question on Aggregation using hashing: counting distinct occurrences based on a string (solved and helpful comments thanks to @draycut ,@FreelanceReinhard and @hashman!).
Data for this scenario:
data _heart_src;
set sashelp.heart (keep=smoking_status bp_status sex systolic status);
label sex="Sex"
status="Survival status"
systolic="Systolic";
run;
I've updated the code for the hashing based solution (see Aggregation using hashing: counting distinct occurrences based on a string), but would like to give one more enhancement to it:
Can it be done without passing through the data once again (no additional DoW-Loop), e.g. by only creating further hash tables?
What I think how it should work (but cannot code it):
dcl hash detail (ordered: "a", multidata: "y");
detail.defineKey("smoking_status", "bp_status", "sex");
detail.defineData("smoking_status", "bp_status", "sex", "systolic", "status",
"_sum", "_mean", "_usys", "_n", "_sum_alv");
detail.defineDone();
So using pictures - this is what I have:
This is what I want:
Updated code (what I have so far)
/* 01 */ data _null_;
/* 02 */ declare hash out(ordered: "a");
/* 03 */ out.defineKey ("smoking_status", "bp_status", "sex");
/* 04 */ out.defineData("smoking_status", "bp_status", "sex",
/* 05 */ "_sum", "_unq", "_sum_alv", "_n", "_mean");
/* 06 */ out.defineDone();
/* 07 */
/* 08 */ declare hash unq();
/* 09 */ unq.defineKey("smoking_status", "bp_status", "sex", "systolic");
/* 10 */ unq.defineDone();
/* 11 */
/* 12 */ do until (end);
/* 13 */ set _heart_src end=end;
/* 14 */ if (out.find() ne 0) then call missing(_sum, _unq, _n);
/* 15 */ _sum+systolic; * _sum=sum(_sum, systolic);
/* 16 */ _n+1; * _n=sum(_n, 1);
/* 17 */ if (unq.check() ne 0) then do;
/* 18 */ _unq+1; * _unq=sum(_unq, 1);
/* 19 */ unq.add();
/* 20 */ end;
/* 20a */ _sum_alv=sum(_sum_alv, status =: "A"); * _sum_alv+(status ="Alive");
/* 21 */ _mean=_sum/_n;
/* 22 */ out.replace();
/* 23 */ end;
/* 24 */ out.output(DATASET: "_hash_agg");
/* 25 */ format _mean 8.1;
/* 26 */ label _mean="Mean (systolic)"
/* 27 */ _sum="Sum (systolic)"
/* 28 */ _unq="Unique values (systolic)"
/* 29 */ _n="N (# of subjects)"
/* 29a */ _sum_alv="Patients alive";
/* 30 */ run;
So my questions:
Hi @left,
Your "have" and "want" pictures are currently a bit inconsistent: some statistics from "have" (e.g. _unc) do not occur in "want", some from "want" (e.g. sum_wt_norm) cannot be based on _heart_src and some variable names differ (_sum vs. sum_sys). Most importantly, your "updated code" contains a mistake: _sum_alv is not reset to missing in the CALL MISSING routine, resulting in incorrect values for this summary statistic. (Also the STOP statement is missing.)
That said, here's one way to combine detail rows and summary statistics using hash object (and hash iterator) methods:
dcl hash detail(ordered: "a", multidata: "y");
detail.defineKey("smoking_status", "bp_status", "sex");
detail.defineData("systolic", "status");
detail.defineDone();
detail.add();
dcl hiter iout('out');
Usage (after the DoW loop, replacing the call of the OUTPUT method):do while(iout.next()=0);
do while(detail.do_over()=0);
output;
end;
end;
Of course, the OUTPUT statement requires a change to the DATA statement:data want;
Hi @left,
Your "have" and "want" pictures are currently a bit inconsistent: some statistics from "have" (e.g. _unc) do not occur in "want", some from "want" (e.g. sum_wt_norm) cannot be based on _heart_src and some variable names differ (_sum vs. sum_sys). Most importantly, your "updated code" contains a mistake: _sum_alv is not reset to missing in the CALL MISSING routine, resulting in incorrect values for this summary statistic. (Also the STOP statement is missing.)
That said, here's one way to combine detail rows and summary statistics using hash object (and hash iterator) methods:
dcl hash detail(ordered: "a", multidata: "y");
detail.defineKey("smoking_status", "bp_status", "sex");
detail.defineData("systolic", "status");
detail.defineDone();
detail.add();
dcl hiter iout('out');
Usage (after the DoW loop, replacing the call of the OUTPUT method):do while(iout.next()=0);
do while(detail.do_over()=0);
output;
end;
end;
Of course, the OUTPUT statement requires a change to the DATA statement:data want;
Thank you @FreelanceReinh for the corrections in my code and for providing a solution! My apologize for the inconsistencies in the above code and the screenshots. The second screenshot was from a "double DoW-Loop" solution with slightly different naming for the variables.
I have updated the code, but used a slight variation of your solution due to the fact that I run SAS v9.2 which does not support the "do_over()" method. I intended to use another hash table for output instead of the dataset [want] just for the sake of finding my way.
/* 01 */ data _null_;
/* 01a */ * Hash table [out] for aggregated display (no duplicate keys);
/* 02 */ declare hash out(ordered: "a");
/* 03 */ out.defineKey ("smoking_status", "bp_status", "sex");
/* 04 */ out.defineData("smoking_status", "bp_status", "sex",
/* 05 */ "_unq", "_sum", "_mean", "_sum_alv", "_n");
/* 06 */ out.defineDone();
/* 07 */
/* 07a */ * Hash table [unq] used for counting unique occurrences of value [systolic] within the key of hash table [h];
/* 08 */ declare hash unq();
/* 09 */ unq.defineKey("smoking_status", "bp_status", "sex", "systolic");
/* 10 */ unq.defineDone();
/* 10a */ * Hash table [detail] for detailed data on patients status (alive/ dead) and systolic blood pressure;
/* 10b */ dcl hash detail (ordered: "a", multidata: "y");
/* 10c */ detail.defineKey("smoking_status", "bp_status", "sex");
/* 10d */ detail.defineData("smoking_status", "bp_status", "sex", "systolic", "status");
/* 10e */ dcl hiter h_detail ("detail"); * Iterator to be able to walk through detail hash table [detail];
/* 10f */ detail.defineDone();
/* 10g */ * Hash table [det_out] for output of detailed data (updated with aggregated values from hash table [h];
/* 10h */ dcl hash det_out(ordered: "a", multidata: "y");
/* 10i */ det_out.defineKey("smoking_status", "bp_status", "sex");
/* 10j */ det_out.defineData("smoking_status", "bp_status", "sex", "systolic", "_unq", "_sum", "_mean", "status", "_sum_alv", "_n");
/* 10k */ det_out.defineDone();
/* 11 */
/* 12 */ do until (end);
/* 13 */ set _heart_src end=end;
/* 13a */ detail.add(); * fill hash table [detail] for each observation;
/* 14 */ if (out.find() ne 0) then call missing(_sum, _unq, _n, _sum_alv);
/* 15 */ _sum+systolic; * _sum=sum(_sum, systolic);
/* 16 */ _n+1; * _n=sum(_n, 1);
/* 16a */ _sum_alv=sum(_sum_alv, status =: "A"); * Count only patients alive;
/* 16b */ * Check if value is unique within its "by-group"/ key;
/* 17 */ if (unq.check() ne 0) then do;
/* 18 */ _unq+1; * _unq=sum(_unq, 1);
/* 19 */ unq.add();
/* 20 */ end;
/* 21 */ _mean=_sum/_n;
/* 21a */ * Update hash table [h] (aggregated values only);
/* 22 */ out.replace();
/* 23 */ end;
/* 23a */ rc=h_detail.first(); * Place "pointer" to first item in hash table [detail];
/* 23b */ do while (rc=0);
/* 23c */ * Write data portion of hash table [h] into PDV with "h.find()" (for each matching key);
/* 23d */ if out.find()=0 then det_out.add();
/* 23e */ rc=h_detail.next();
/* 23f */ end;
/* 23g */ * Output of final hash tables;
/* 24 */ out.output (dataset: "_hash_agg"); * [h] aggravated dataset (1 entry for each key);
/* 24a */ det_out.output(dataset: "_hash_agg_all"); * [all] aggravated values for each observation (duplicate keys);
/* 25 */ format _mean 8.1;
/* 26 */ label _mean="Mean (systolic)"
/* 27 */ _sum="Sum (systolic)"
/* 28 */ _unq="Unique values (systolic)"
/* 29 */ _n="N (# of subjects)"
/* 29a */ _sum_alv="Patients alive";
/* 29b */ stop;
/* 30 */ run;
The final output comes to this (to have the post complete):
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.