BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
val_nikolajevs
Obsidian | Level 7

Hello I am learning hash tables and and one of the task i would liko to get variable value and prorate based on month.

Fro example

ID 1 from jun15 to jul10 was 25 units or 1 unit per day therefore 15 units will be in June bucket and 10 in July for this period

from jul10 to Aug10 it was 30 units or 30/31 per day threfore 21*(30/31) would go to July bucket and 10*(30/31) would go to the august bucket and so on.

So the aggregated value for the ID one for July would 10+21*(30/31);

 

Inside the step I am breaking down each ID with time range to a separate prorated rows--it works results are correct;

The aggregation function gives me trouble; Tried different options and placing sum at different locations and got ether missing results and or repeats or some unrelated one.


data _null_;
retain sum 0;
dcl hash hkw(ordered:"A", multidata:"Y",SUMINC:"prorated");
hkw.defineKey("ID","month");
hkw.defineData("ID","month","sum","prorated");
hkw.defineDone();

format prorated sum COMMA15.12;
format month Date9.;
Do until (LR=1);
set p1 end=LR;
rc=hkw.find();
do i=0 to intck('month',time_start,time_stop);
month = intnx('month',time_start,i,'b');
days = min(bill_to-1,intnx('month',month,0,'e'))-max(time_start,month)+1;
prorated=(var/(time_stop-time_start))*days;
hkw.ADD();
hkw.sum(key:ID,key:month, sum:sum);

end;

end;


end;


hkw.output(dataset:"wh1");
stop;
run;

For some reason I can not make aggregation function to work-- Could you please let me know what I am missing? Need to aggregate by ID and month so each ID has sum by month.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @val_nikolajevs,

 

You've got great responses from Patrick and mkeintz. Let me just add a few words about the SUM method, as this was the primary subject of your question. This method is really a bit tricky, but still relatively simple compared to the SUMDUP method.

 

Basically, by adding the argument tag suminc:"prorated" to the declaration of hash object HKW you instruct SAS to set up a sum of values of variable PRORATED for each item of HKW. (These sums are initialized to zero automatically, they are never missing, so there's no need to initialize them manually.) If you also specify multidata:"Y", this means, in general, multiple sums per key value combination. But your key items are ID and MONTH and you also want to aggregate on that level. So, omit the multidata:"Y" argument tag to get only one sum per ID-MONTH combination.

 

It seems unusual to me that your sum variable (named SUM) is also a data item of HKW. Typically, the final sum values are only "harvested" at the end, but if SUM is a data item, it will be involved in all intermediate operations of the hash object. Therefore, I would rather keep SUM out of the data items and write the output dataset with a traditional OUTPUT statement (instead of using the OUTPUT method of HKW, requiring SUM to be a data item).

 

You can use the REF method to trigger the incrementation of the sums when a new PRORATED value is available. A hash iterator object can drive the retrieval of the accumulated sum values at the end of the DATA step. All this is shown in the code below. (With the code structure using the DOW loop and the STOP statement, copied from your original code, there's no need to wrap the hash object declaration into "if _n_=1 then do; ... end;" because it's executed only once anyway: _n_ is constantly 1 throughout this DATA step.)

 

data want(keep=id month sum);
dcl hash hkw(ordered:'a', suminc:'prorated');
hkw.defineKey('ID','month');
hkw.defineDone();
dcl hiter hi('hkw');

do until (LR=1);
  set pk1 end=LR;
  do i=0 to intck('month',time_start,time_stop);
    month = intnx('month',time_start,i,'b');
    days = min(time_stop-1,intnx('month',month,0,'e'))-max(time_start,month)+1;
    prorated=(var/(time_stop-time_start))*days;
    hkw.ref();
  end;
end;

do while(hi.next()=0);
  hkw.sum(sum:sum);
  output;
end;

stop;
format month monyy7. sum comma18.8;
run;

 

View solution in original post

8 REPLIES 8
Patrick
Opal | Level 21

To help us help you it's for questions like yours normally really useful to get sample data provided via a  fully working data step creating such data (here for your work.p1 table). 

Then show us the desired result based on this sample data and explain us the logic how to get from Have to Want.

 

To post SAS code use the running man icon.
Patrick_0-1660960999168.png

If you have existing sample data you can share (if volume "reasonable") then either attach your SAS table to your question or alternatively generate SAS datastep code that creates the sample data https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/  

 

val_nikolajevs
Obsidian | Level 7
*define data sample;
Data pk1;
format time_start time_stop date9.;
ID=1;
time_start='15JUN2021'd;
time_stop='10JUL2021'd;
var=100;
output;
ID=1;
time_start='10JUL2021'd;
time_stop='15AUG2021'd;
var=120;
output;
ID=1;
time_start='15AUG2021'd;
time_stop='10SEP2021'd;
var=20;
output;
ID=2;
time_start='05JUN2021'd;
time_stop='03JUL2021'd;
var=400;
output;
ID=2;
time_start='03JUL2021'd;
time_stop='12AUG2021'd;
var=343;
output;
ID=2;
time_start='12AUG2021'd;
time_stop='13SEP2021'd;
var=543;
output;
run;
*prorate based on number of days in a month;
data pk2 (drop=time_start time_stop var i days);
format time_start time_stop date9.;
format month monyy7.;
set pk1;
  do i=0 to intck('month',time_start,Time_Stop);
    month = intnx('month',time_start,i,'b');
    days = min(time_stop-1,intnx('month',month,0,'e'))-max(time_start,month)+1;
	prorated=(var/(time_stop-time_start))*days;
    output;
  end;

run;
*aggregate using proc sql;
proc sql;
create table pk3 as
select distinct ID,month, sum(prorated) as sum_var
from pk2
group by ID,month
order by ID,month;
quit; 


*transpose to create a single row for each ID with prorated values for each month;

proc transpose data=pk3 out=p3k4(DROP= _NAME_ _LABEL_);
by ID ;
id month ;
var sum_var;
run;

The output of the above program sample is what I am trying to acomplish using hash tables. The above works and gets exactly what I need.

I am looking how to use hash table to combine data step with proration, aggregation and ordering output to feed output to transpose step.

Thank you.

mkeintz
PROC Star

You don't need the proc transpose.  You can do what you in a data step, with a two-dimensional array (rows for year, and columns for months - assuming you need more than one year.

 

Say you know your earliest data is in MAR2020 and the latest data is in SEP2021:

 

data want;
  set pk1;
  by id;
  array ym {2020:2021,1:12}
   _dummy  _dummy  MAR2020 APR2020 MAY2020 JUN2020 JUL2020 AUG2020 SEP2020 OCT2020 NOV2020 DEC2020
   JAN2021 FEB2021 MAR2021 APR2021 MAY2021 JUN2021 JUL2021 AUG2021 SEP2021 _dummy  _dummy  _dummy;

  if first.id then call missing (of ym{*});

  _ndays=time_stop - time_start;
  do dateval=time_start by 0 until (dateval>time_stop);
    nxt_dateval=intnx('month',dateval,1,'beg');
    _ndays_this_month=min(nxt_dateval,time_stop)-dateval;
    ym{year(dateval),month(dateval)} + var * _ndays_this_month/_ndays;
    dateval=nxt_dateval;
  end;
  keep ID  mar2020 -- sep2021;
  if last.id;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
val_nikolajevs
Obsidian | Level 7
Thank you very much for the reply. Great techniques.
Patrick
Opal | Level 21

I might have used suminc not more than once or twice. I normally just find it too hard and not intuitive enough to use.

Based on the code you share that's how I would do it if using a hash. 

data _null_;

  if _n_=1 then
    do;
      dcl hash hkw(ordered:"A", multidata:"N");
      hkw.defineKey("ID","month");
      hkw.defineData("ID","month","prorated");
      hkw.defineDone();
    end;

  format time_start time_stop date9.;
  format month monyy7.;
  set pk1 end=_last;

  do i=0 to intck('month',time_start,Time_Stop);
    month = intnx('month',time_start,i,'b');
    days = min(time_stop-1,intnx('month',month,0,'e'))-max(time_start,month)+1;
    _prorated=(var/(time_stop-time_start))*days;
    if hkw.find()=0 then prorated=sum(prorated,_prorated);
    else prorated=_prorated;
    _rc=hkw.replace();
  end;
  if _last then hkw.output(dataset:'pk3');
run;

proc print data=pk3;
run;

The most relevant thing you missed is to execute the hash definition only once in the data step and not for every single iteration (if _n_=1 then....).

val_nikolajevs
Obsidian | Level 7
Hi Patrick--thank you very much that what i was looking for.
FreelanceReinh
Jade | Level 19

Hello @val_nikolajevs,

 

You've got great responses from Patrick and mkeintz. Let me just add a few words about the SUM method, as this was the primary subject of your question. This method is really a bit tricky, but still relatively simple compared to the SUMDUP method.

 

Basically, by adding the argument tag suminc:"prorated" to the declaration of hash object HKW you instruct SAS to set up a sum of values of variable PRORATED for each item of HKW. (These sums are initialized to zero automatically, they are never missing, so there's no need to initialize them manually.) If you also specify multidata:"Y", this means, in general, multiple sums per key value combination. But your key items are ID and MONTH and you also want to aggregate on that level. So, omit the multidata:"Y" argument tag to get only one sum per ID-MONTH combination.

 

It seems unusual to me that your sum variable (named SUM) is also a data item of HKW. Typically, the final sum values are only "harvested" at the end, but if SUM is a data item, it will be involved in all intermediate operations of the hash object. Therefore, I would rather keep SUM out of the data items and write the output dataset with a traditional OUTPUT statement (instead of using the OUTPUT method of HKW, requiring SUM to be a data item).

 

You can use the REF method to trigger the incrementation of the sums when a new PRORATED value is available. A hash iterator object can drive the retrieval of the accumulated sum values at the end of the DATA step. All this is shown in the code below. (With the code structure using the DOW loop and the STOP statement, copied from your original code, there's no need to wrap the hash object declaration into "if _n_=1 then do; ... end;" because it's executed only once anyway: _n_ is constantly 1 throughout this DATA step.)

 

data want(keep=id month sum);
dcl hash hkw(ordered:'a', suminc:'prorated');
hkw.defineKey('ID','month');
hkw.defineDone();
dcl hiter hi('hkw');

do until (LR=1);
  set pk1 end=LR;
  do i=0 to intck('month',time_start,time_stop);
    month = intnx('month',time_start,i,'b');
    days = min(time_stop-1,intnx('month',month,0,'e'))-max(time_start,month)+1;
    prorated=(var/(time_stop-time_start))*days;
    hkw.ref();
  end;
end;

do while(hi.next()=0);
  hkw.sum(sum:sum);
  output;
end;

stop;
format month monyy7. sum comma18.8;
run;

 

val_nikolajevs
Obsidian | Level 7
Thank you very much for detailed explanation--that really helps to grasp how this works.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1440 views
  • 4 likes
  • 4 in conversation