This is follow-up question of Aggregation using hashing: Repeat summarized results for each observation in by-group.
Would it be possible to implement a solution using hashing without relying on a hash iterator (or an additional DoW-Loop), but by a further hash table lookup?
I wonder if the smart combination of aggravated summary result in hash table [out] and the detailed data in hash table [detail] can be used for a look-up to achieve this?
Here is the underlying code where a possible solution could be done or explained why it cannot be done without a hash iterator or a further DoW-Loop.
/* 01 */ data _null_;
/* 01a */ * Hash table [out] for aggregated display (no duplicate keys);
/* 02 */ declare hash out(ordered: "a");
/* 03 */ out.defineKey ("smoking_status", "bp_status", "sex");
/* 04 */ out.defineData("smoking_status", "bp_status", "sex",
/* 05 */ "_unq", "_sum", "_mean", "_sum_alv", "_n");
/* 06 */ out.defineDone();
/* 07 */
/* 07a */ * Hash table [unq] used for counting unique occurrences of value [systolic] within the key of hash table [h];
/* 08 */ declare hash unq();
/* 09 */ unq.defineKey("smoking_status", "bp_status", "sex", "systolic");
/* 10 */ unq.defineDone();
/* 10a */ * Hash table [detail] for detailed data on patients status (alive/ dead) and systolic blood pressure;
/* 10b */ dcl hash detail (ordered: "a", multidata: "y");
/* 10c */ detail.defineKey("smoking_status", "bp_status", "sex");
/* 10d */ detail.defineData("smoking_status", "bp_status", "sex", "systolic", "status");
/* 10e */ dcl hiter h_detail ("detail"); * Iterator to be able to walk through detail hash table [detail];
/* 10f */ detail.defineDone();
/* 10g */ * Hash table [det_out] for output of detailed data (updated with aggregated values from hash table [h];
/* 10h */ dcl hash det_out(ordered: "a", multidata: "y");
/* 10i */ det_out.defineKey("smoking_status", "bp_status", "sex");
/* 10j */ det_out.defineData("smoking_status", "bp_status", "sex", "systolic", "_unq", "_sum", "_mean", "status", "_sum_alv", "_n");
/* 10k */ det_out.defineDone();
/* 11 */
/* 12 */ do until (end);
/* 13 */ set _heart_src end=end;
/* 13a */ detail.add(); * fill hash table [detail] for each observation;
/* 14 */ if (out.find() ne 0) then call missing(_sum, _unq, _n, _sum_alv);
/* 15 */ _sum+systolic; * _sum=sum(_sum, systolic);
/* 16 */ _n+1; * _n=sum(_n, 1);
/* 16a */ _sum_alv=sum(_sum_alv, status =: "A"); * Count only patients alive;
/* 16b */ * Check if value is unique within its "by-group"/ key;
/* 17 */ if (unq.check() ne 0) then do;
/* 18 */ _unq+1; * _unq=sum(_unq, 1);
/* 19 */ unq.add();
/* 20 */ end;
/* 21 */ _mean=_sum/_n;
/* 21a */ * Update hash table [h] (aggregated values only);
/* 22 */ out.replace();
/* 23 */ end;
/* 23a */ rc=h_detail.first(); * Place "pointer" to first item in hash table [detail];
/* 23b */ do while (rc=0);
/* 23c */ * Write data portion of hash table [h] into PDV with "h.find()" (for each matching key);
/* 23d */ if out.find()=0 then det_out.add();
/* 23e */ rc=h_detail.next();
/* 23f */ end;
/* 23g */ * Output of final hash tables;
/* 24 */ out.output (dataset: "_hash_agg"); * [h] aggravated dataset (1 entry for each key);
/* 24a */ det_out.output(dataset: "_hash_agg_all"); * [all] aggravated values for each observation (duplicate keys);
/* 25 */ format _mean 8.1;
/* 26 */ label _mean="Mean (systolic)"
/* 27 */ _sum="Sum (systolic)"
/* 28 */ _unq="Unique values (systolic)"
/* 29 */ _n="N (# of subjects)"
/* 29a */ _sum_alv="Patients alive";
/* 29b */ stop;
/* 30 */ run;
Hash tables can only access one entry at at time.
In order to calculate sum, average. etc, you need to read all these values one by one. So you need to iterate somehow.
What's bothering you?
Note that this code can be made much clearer. For example:
proc sql ;
create table WANT as
select SMOKING_STATUS, BP_STATUS, SEX, SYSTOLIC, STATUS
, count(unique SYSTOLIC) as _UNQ
, sum(SYSTOLIC) as _SUM
, mean(SYSTOLIC) as _MEAN
, sum(STATUS='Alive') as _SUM_ALV
, count(*) as _N
from HAVE
group by SMOKING_STATUS, BP_STATUS, SEX
order by SMOKING_STATUS, BP_STATUS, SEX;
quit;
You can use the options _method and magic to monitor and (hope to) influence the SQL parser
Thank you for your reply, @RichardDeVen. The sample data is an excerpt from [sashelp.heart]:
data _heart_src;
set sashelp.heart (keep=smoking_status bp_status sex systolic status);
label sex="Sex"
status="Survival status"
systolic="Systolic";
run;
The data set [_heart_src] has one row per subject as shown in this screenshot:
My mentioned code gives me the following result:
So this is what I want, but without using the hash iterator. I look for a solution with a further hash table facilitating a clever table-lookup instead of using the hash iterator or another DoW-Loop.If this is not possible I am glad to hear why not.
Hash tables can only access one entry at at time.
In order to calculate sum, average. etc, you need to read all these values one by one. So you need to iterate somehow.
What's bothering you?
Note that this code can be made much clearer. For example:
proc sql ;
create table WANT as
select SMOKING_STATUS, BP_STATUS, SEX, SYSTOLIC, STATUS
, count(unique SYSTOLIC) as _UNQ
, sum(SYSTOLIC) as _SUM
, mean(SYSTOLIC) as _MEAN
, sum(STATUS='Alive') as _SUM_ALV
, count(*) as _N
from HAVE
group by SMOKING_STATUS, BP_STATUS, SEX
order by SMOKING_STATUS, BP_STATUS, SEX;
quit;
You can use the options _method and magic to monitor and (hope to) influence the SQL parser
There is no 'clever lookup'. In order to output the aggregates, associated with the details, you need to loop over the details.
A better long term solution might be to use an SQL query that is more clearly understood by more people than those that can decipher the hashing code.
Example of SQL solving the same problem. This query relies on the SAS Proc SQL feature that automatically re-merges summary statistics back with the original data.
data have;
call streaminit(123);
do response = 1 to 1e3;
category1 = ceil(rand('uniform', 10));
category2 = ceil(rand('uniform', 5));
category3 = ceil(rand('uniform', 2));
continuous_measurement = 100 + floor(rand('uniform', 100));
binary_categorical_response = rand('uniform') < 0.09;
output;
end;
run;
proc sql;
create table want as
select
category1, category2, category3, continuous_measurement, binary_categorical_response
, count(distinct continuous_measurement) as _unq
, sum(continuous_measurement) as _sum
, mean(continuous_measurement) as _mean format = 5.1
, sum(binary_categorical_response) as _count_binary_assert_1
, count(*) as _n
from
have
group by
category1, category2, category3
order by
category1, category2, category3, continuous_measurement desc ;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.