Hash object to sum a by group conditionally

Accepted Solution Solved
Reply
Frequent Contributor
Frequent Contributor
Posts: 133
Accepted Solution

Hash object to sum a by group conditionally

[ Edited ]

I have to convert a complicated Excel formula to SAS. The formula uses SUMIFs and in most cases I can just use the criteria range as the "By" variables and get a sum by the combination of "By" variables ; however, in this specific formula the criteria being looked up is calculated ("group + 1") . Also, another criteria from one field is looking up a match in a different column. I thought I could use a hash object or a double-dow loop but haven't been successful. My code is below but the data step with the hashtable does not work. I want to create "projsum" in SAS to match the SumGroup variable created from the excel formula below.

 

Excel formula: =SUMIFS($C$2:$C$100,$D$2:$D$100,$D2,F$2:$F$100,$F2+1,I$2:I$100,$H2)

 

This is just a partial formula so it doesn't make much sense on its own. My hope is that someone can help me figure out how to convert the F$2:$F$100,$F2+1 part of the code above and the I$2:I$100,$H2 code above.

 

Columns=SAS var

C=sumvar

D=maxgroup

F=group

I=path

H=root

 

 

Thanks. 

 

data have ;
input ID include sumvar maxgroup sortgroup group code: $20. root: $100. path: $100. sumgroup ;
infile datalines dsd ;

datalines;
482755,1,61,2,1,1,61,4,4.61,61
482755,1,61,2,2,2,4,,4,0
482755,1,61,2,2,82,2,4,4.2,0
482829,1,96,6,1,1,61,17.4.188.50.42,17.4.188.50.42.61,96
482829,1,96,6,2,2,42,17.4.188.50,17.4.188.50.42,96
482829,1,96,6,3,3,50,17.4.188,17.4.188.50,96
482829,1,96,6,4,4,188,17.4,17.4.188,96
482829,1,96,6,5,5,4,17,17.4,96
482829,1,96,6,6,6,17,,17,0
482872,1,44,2,1,1,61,76,76.61,44
482872,1,44,2,2,2,76,,76,0
482872,0,,2,2.01,2.01,,,,0
482917,1,61,2,1,1,61,7,7.61,122
482917,1,61,2,2,2,7,,7,0
482917,1,61,2,2,82,2,7,7.2,0
9482917,1,61,2,1,1,61,7,7.61,122
9482917,1,61,2,2,2,7,,7,0
9482917,1,61,2,2,82,2,7,7.2,0
2603928,1,94,4,1,1,7,17.145.61,17.145.61.7,94
2603928,0,,4,1.01,1.01,,,,0
2603928,1,94,4,2,2,61,17.145,17.145.61,188
2603928,1,94,4,3,3,145,17,17.145,376
2603928,1,94,4,4,4,17,,17,0
354184,1,94,4,1,1,61,17.145.50,17.145.50.61,94
354184,1,94,4,2,2,50,17.145,17.145.50,188
354184,1,94,4,2,82,177,17.145.50,17.145.50.177,0
354184,1,94,4,3,3,145,17,17.145,376
354184,1,94,4,4,4,17,,17,0
649494,1,94,4,1,1,61,17.7.18,17.7.18.61,188
649494,1,94,4,2,2,18,17.7,17.7.18,188
649494,1,94,4,2,82,2,17.7.18,17.7.18.2,0
649494,1,94,4,3,3,7,17,17.7,376
649494,1,94,4,4,4,17,,17,0
6494,1,94,4,1,1,61,17.7.18,17.7.18.61,188
6494,1,94,4,2,2,18,17.7,17.7.18,188
6494,1,94,4,2,82,2,17.7.18,17.7.18.2,0
6494,1,94,4,3,3,7,17,17.7,376
6494,1,94,4,4,4,17,,17,0
;

run;
proc print; run;

proc sort data=have ;
by id maxgroup descending sortgroup group;
run;
proc print; run;


Data want;

declare hash project() ;
rc=project.defineKey ('maxGroup','group','path') ;
rc=project.defineData ('maxGroup', 'Group', 'path','root', 'SumGroup') ;
rc=project.defineDone() ;


do until (last.eof) ;

set have end=eof ;
groupNext=Group+1 ;
rc = project.find() ;
if rc = 0 then do ;
if include = 1 and root=path then do;
Projsum=sum(ProjSum,SumVar) ;
end;
end ;
output;
end;
run;

proc print data=want;
run;

 

 The following PROC SQL is getting me closer to what I want but ProjSum still does not equal SumGroup for all the rows. If anyone could help tweak this SQL code that would be great!!


proc sql ;
create table want as
select distinct a.*,sum(b.sumvar) as projsum
from have a
, have b
where a.maxgroup = b.maxgroup
and b.path = a.root
and b.group=a.group+1
group by a.maxgroup,a.path, a.group+1
order by a.id
;
quit;
proc print data=want ;
run;


Accepted Solutions
Solution
‎08-25-2016 10:14 AM
Frequent Contributor
Frequent Contributor
Posts: 133

Re: Hash object to sum a by group conditionally

Success! This Proc Sql appears to work with my sample data. 

 


proc sql ;
create table want as
select a.*, b.projsum
From have as a
left join
(select maxgroup, group, path, sum(sumvar) as projsum
from have
group by maxgroup, group, path ) as b
on a.maxgroup = b.maxgroup
and b.path = a.root
and b.group=a.group+1
order by a.id;
quit ;
proc print; run;

 

 

View solution in original post


All Replies
Grand Advisor
Posts: 10,239

Re: Hash object to sum a by group conditionally

Since this is a SAS forum you should explain what the heck a SUMIFS is in general.

I avoid Excel as much as possible and need an explanation as to what that does and others may as well.

I suspect since you are apparently doing something with an entire(?) column that SAS/IML might actually be a better solution.

 

It does help to show what the expected result should be as well.

Frequent Contributor
Frequent Contributor
Posts: 133

Re: Hash object to sum a by group conditionally

Good idea. 

 

SUMIFS(sum_range, criteria_range1, criteria1,[criteria_range2, criteria2],...)

https://www.ablebits.com/office-addins-blog/2014/11/12/excel-sumifs-sumif-multiple-criteria/#sumifs-...

 

The result for ProjSum should be the same as the column SumGroup (SumGroup was generated using the Excel formula in Excel).

 

Thanks.

Respected Advisor
Posts: 3,840

Re: Hash object to sum a by group conditionally

Conversion of Excel formulas into a SAS program can become very "ugly" as in Excel you can "jump around" between cells in different rows as you wish; so it will strongly depend on your Excel formulas how to approach this.

 

IF there is a lot of "jumping around" then you could transpose your data into arrays as this will give you access to all the data "at once" and you can address the array elements similar to cells in an Excel sheet.

 

data have;
  input ID include sumvar maxgroup sortgroup group code: $20. root: $100. path: $100. sumgroup;
  infile datalines dsd;
  datalines;
482755,1,61,2,1,1,61,4,4.61,61
482755,1,61,2,2,2,4,,4,0
482755,1,61,2,2,82,2,4,4.2,0
482829,1,96,6,1,1,61,17.4.188.50.42,17.4.188.50.42.61,96
482829,1,96,6,2,2,42,17.4.188.50,17.4.188.50.42,96
482829,1,96,6,3,3,50,17.4.188,17.4.188.50,96
482829,1,96,6,4,4,188,17.4,17.4.188,96
482829,1,96,6,5,5,4,17,17.4,96
482829,1,96,6,6,6,17,,17,0
482872,1,44,2,1,1,61,76,76.61,44
482872,1,44,2,2,2,76,,76,0
482872,0,,2,2.01,2.01,,,,0
482917,1,61,2,1,1,61,7,7.61,122
482917,1,61,2,2,2,7,,7,0
482917,1,61,2,2,82,2,7,7.2,0
9482917,1,61,2,1,1,61,7,7.61,122
9482917,1,61,2,2,2,7,,7,0
9482917,1,61,2,2,82,2,7,7.2,0
2603928,1,94,4,1,1,7,17.145.61,17.145.61.7,94
2603928,0,,4,1.01,1.01,,,,0
2603928,1,94,4,2,2,61,17.145,17.145.61,188
2603928,1,94,4,3,3,145,17,17.145,376
2603928,1,94,4,4,4,17,,17,0
354184,1,94,4,1,1,61,17.145.50,17.145.50.61,94
354184,1,94,4,2,2,50,17.145,17.145.50,188
354184,1,94,4,2,82,177,17.145.50,17.145.50.177,0
354184,1,94,4,3,3,145,17,17.145,376
354184,1,94,4,4,4,17,,17,0
649494,1,94,4,1,1,61,17.7.18,17.7.18.61,188
649494,1,94,4,2,2,18,17.7,17.7.18,188
649494,1,94,4,2,82,2,17.7.18,17.7.18.2,0
649494,1,94,4,3,3,7,17,17.7,376
649494,1,94,4,4,4,17,,17,0
6494,1,94,4,1,1,61,17.7.18,17.7.18.61,188
6494,1,94,4,2,2,18,17.7,17.7.18,188
6494,1,94,4,2,82,2,17.7.18,17.7.18.2,0
6494,1,94,4,3,3,7,17,17.7,376
6494,1,94,4,4,4,17,,17,0
;
run;

data _null_;
  call symputx('nobs',nobs);
  stop;
  set have nobs=nobs;
run;

data inter;
  set have nobs=nobs end=last;
  array  _id {&nobs} 8;
  array  _include {&nobs} 8;
  array  _sumvar {&nobs} 8;
  array  _maxgroup {&nobs} 8;
  array  _sortgroup {&nobs} 8;
  array  _group {&nobs} 8;
  array  _code {&nobs} $20;
  array  _root {&nobs} $100;
  array  _path {&nobs} $100;

  retain _id ;
  retain _include ;
  retain _sumvar ;
  retain _maxgroup ;
  retain _sortgroup ;
  retain _group ;
  retain _code ;
  retain _root ;
  retain _path ;

  _id[_n_]         =  id;
  _include[_n_]    =  include;
  _sumvar[_n_]     =  sumvar;
  _maxgroup[_n_]   =  maxgroup;
  _sortgroup[_n_]  =  sortgroup;
  _group[_n_]      =  group;
  _code[_n_]       =  code;
  _root[_n_]       =  root;
  _path[_n_]       =  path;

  if last then output;
  keep _:;
  
run;
Frequent Contributor
Frequent Contributor
Posts: 133

Re: Hash object to sum a by group conditionally

Thanks. I will consider that option but I'm getting pretty close with the following SQL. I just don't know where it's going wrong on some of the rows. 

 


proc sql ;
create table want as
select distinct a.*,sum(b.sumvar) as projsum
from have a
, have b
where a.maxgroup = b.maxgroup
and b.path = a.root
and b.group=a.group+1
group by a.maxgroup,a.path, a.group+1
order by a.id
;
quit;
proc print data=want ;
run;

Solution
‎08-25-2016 10:14 AM
Frequent Contributor
Frequent Contributor
Posts: 133

Re: Hash object to sum a by group conditionally

Success! This Proc Sql appears to work with my sample data. 

 


proc sql ;
create table want as
select a.*, b.projsum
From have as a
left join
(select maxgroup, group, path, sum(sumvar) as projsum
from have
group by maxgroup, group, path ) as b
on a.maxgroup = b.maxgroup
and b.path = a.root
and b.group=a.group+1
order by a.id;
quit ;
proc print; run;

 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 341 views
  • 0 likes
  • 3 in conversation