DATA Step, Macro, Functions and more

SUMIF with multiple conditions

Reply
Frequent Contributor
Frequent Contributor
Posts: 134

SUMIF with multiple conditions

I am still having trouble with converting Excel SUMIFs to SAS code. Here is a different example from previous post. I want to end up with the value in "valueWant" below. The Excel formula that creates "ValueWant" is below: Basically I need to sum column F based on three different criteria. I've tried various versions of proc SQL, Hash objects, retain statements and I am still stuck on this. 

 

=SUMIFS(F2:F$100,B2:B$100,B2,C2:C$100,C2+1,E2:E$100,D2)

 

for non Excel users this is what the formula means:

=SUMIFS(sum range, criteria1 Range, criteria1, criteria2 range, criteria3 range, criteria3)

 

If the criteria was from the same row instead of C2+1 for example, or the criteria was in the same column as the range instead of E2:E$100, D2 then it would be a simple sum of column F by columns B,C, E as grouped variables ; however, the complicated criteria is causing difficulty,

 

data want ;

infile datalines dsd missover;
input ID Col_B Col_C Col_D $ Col_E $ Col_F ValueWant Include ;
datalines;
10001,6,1,a.b.c.d,a.b.c.d.e,5,2,1
10001,6,2,a.b.c,a.b.c.d,1,2,1
10001,6,3,a.b,a.b.c,1,10,1
10001,6,4,a,a.b,5,0,1
10001,6,5,,,,0,0
10001,6,6,,a,5,0,1
10002,3,1,a.b,a.b.c,1,5,1
10002,3,2,a.,a.b,5,0,1
10002,3,3,,a,5,0,1
10003,6,1,a.b.c.d,a.b.c.d.e,5,1,1
10003,6,2,a.b.c,a.b.c.d,1,1,1
10003,6,3,a.b,a.b.c,1,5,1
10003,6,4,a,a.b,5,0,1
10003,6,5,,,,0,0
10003,6,6,,a,5,0,1
;
run;

Super User
Posts: 19,875

Re: SUMIF with multiple conditions

How about your logic in English? Or logical terms. Not Excel. 

Frequent Contributor
Frequent Contributor
Posts: 134

Re: SUMIF with multiple conditions

This is part of a longer formula which is a very complicated weighting structure so it's hard to create an analogy. I'll try to say what I need based on these codes.

 

I want to add the values in column F for all values of col_b and all values of col_c + 1 (indicating that the row progressed to the next level), and for all values of column d that match values in column e. Typically, the value in column d can only match column e in the col_c+1 row. 

 

Not sure if this helps. 

Super User
Posts: 19,875

Re: SUMIF with multiple conditions


CP2 wrote:

This is part of a longer formula which is a very complicated weighting structure so it's hard to create an analogy. I'll try to say what I need based on these codes.

 

I want to add the values in column F for all values of col_b and all values of col_c + 1 (indicating that the row progressed to the next level), and for all values of column d that match values in column e. Typically, the value in column d can only match column e in the col_c+1 row. 

 

Not sure if this helps. 


No, you're still trying to be too technical.

 

I suggest starting with a single issue, stating the logic as clearly as possible and working through that. Then moving on to your next issue. As is, your question is broad and too involved for a simple forum answer.  It may be worth starting over and explaining in more simple terms. 

 

Including data as a data step and your desired variable is a good idea and thank you for providing that!

Frequent Contributor
Frequent Contributor
Posts: 134

Re: SUMIF with multiple conditions

Thank you for your help. I guess I keep hoping there are others more familiar with Excel than I am. The logic doesn't make sense to me so working through it was difficult. Anyway, it turns out there seems to be a reason why I can't articulate the logic - there really isn't a good explanation! So we examined the Excel formula i was supposed to convert and found the value in the range to be summed in the sumifs function is the same for each id so we don't need to do the complicated +1 and D=F criteria. It is just  a simple grouped sum by columns B,C,D.

Trusted Advisor
Posts: 1,586

Re: SUMIF with multiple conditions

As mutch as I anderstand the excel SUMIFS function, I would convert the formula

    =SUMIFS(F2:F$100,B2:B$100,B2,C2:C$100,C2+1,E2:E$100,D2)

into:

 

 

    if  Col_B  of row _N_ =  Col_B of row 2  and

        Col_C  of row _N_ = (Col_C of row 2) +1  and

        Col_E  of row _N_ =  Col_D of row 2 

    then ValueWant + Col_F;

 

If the above pseudo code is right then do:

 

     data temp;

           keep Col_B2 Col_C2 Col_D2;

        set have (point=2);

               Col_B2 = Col_B;

               Col_C2 = Col_C + 1;

               Col_D2 = Col_E;

               output;

    run;

    proc sql;

            create table want as select

               t.*, sum(h.Col_F) as ValueWant

             from temp as t

             left join have as h

         ON Col_B = Col_B2 and

               Col_C = Col_C2 and

               Col_D = Col_d2

         order by ID, Col_B, Col_C, Col_D;

   quit;

 

NOTE: The excel formula does not use the ID column.

            The result is summing relevant rows from the whole input.

               

Frequent Contributor
Frequent Contributor
Posts: 134

Re: SUMIF with multiple conditions

thank you. the formula I showed was the formula for row 2 but that same formula is in every row under the valuewant column but the number is changed in the criteria depending on the row. For example for row 3 under that column the formula is:
=SUMIFS(F2:F$100,B2:B$100,B3,C2:C$100,C3+1,E2:E$100,D3)
Frequent Contributor
Frequent Contributor
Posts: 134

Re: SUMIF with multiple conditions

also, I tried the following code but keep getting missing values.

 

 

proc sql ;
create table want as
select a.*, b.valuewant2
From want as a
left join
(select col_b, col_c, col_e, sum(col_f) as valuewant2
from want
group by col_b, col_c, col_e ) as b
on a.col_b = b.col_b
and a.col_c = b.col_c+1
and a.col_d = b.col_e ;

quit ;

 

Trusted Advisor
Posts: 1,586

Re: SUMIF with multiple conditions

To help you I need to clarrify few points:

 

1) By "Col_C + 1"  within SAS the meaning is: Value of Col_C plus 1
     (and not: Value of Col_c in row _N_ +1, which is the next row)

 

2) Columns B C D are keys to match data, so 1st step is to calculate and define the keys.

    Using the excel terms: I define the keys according to the criteria in the excel formula.

 

   The 2nd step is to sum Col_F per every combination of "keys";

 

3) You gave the example for row = 2; I understand you want to do it for all rows.

     Does data have duplicates - i.e. more than one row with same "keys".

     Obviously YES, otherwise ther will be no reason to sum Col_F;

     It means we need to select DISTINCT combinations of "keys";

 

If you agree with my clarifications, then better do the work in two different steps:

1) Calculate distinct combinations of keys. Check the list is it OK. If true continue to the 2nd step;

     Thus I have done creating dataset temp. In order to calculate all combinations of "keys" 

     drop the option (point=2) and finally sort by Col_B Col_C Col_D NODUPKEY;

 

2) Summarise Col_F per group defined by the "keys"; 

 

I hope I made myself clear. Try it. Upload your code and input lines and output you got - just per one "keys" example;

Frequent Contributor
Frequent Contributor
Posts: 134

Re: SUMIF with multiple conditions

Thank you for your response.

 

Here are my responses:

 

1) data is in sort order of col_c so col_C+1 is the next consecutive number. 

2) B,C,E are the keyed fields - but this is part of the confusion. When the values are summed by these keyes then they need to be matched to the rows where C=C+1 (in summed database) and D= E (in summed database)

3)The rows are unique by id and col_c.

Trusted Advisor
Posts: 1,586

Re: SUMIF with multiple conditions

I need still to clariffy some points:

 

Is Col_B same value for all lines of an ID ?

Is Col_B the number of lines given per ID ?

 

What is Col_C + 1 for the last line in ID group ?
e.g. for ID=10002 on last line Col_C=3.   What is Col_C +1 in this case ?

 

Looking at IDs 10001 and 10003 they have same key values (e.g. Col_B=6, Col_c=1 and Col_E=5);

Do you want to some Col_F of those lines, despite different IDs ?

 

Why is ID not a member in the keys ? Is it deliberately ?

 

I am trying to convert the excel formula into sas technicaly but it is very difficult without understanding the logig and the materia .

I have the feeling that we argue to understand your data more than arguing SAS. I'm not sure this forum is intended for it.

Trusted Advisor
Posts: 1,586

Re: SUMIF with multiple conditions

Some more clarifications:

 

Attached is a print of work.have created by your datalines.

I understand that column ValuWant is what you expect.

Line 1 and line 10 on attached PDF have the same keys: 

   Col_B=6 , Col_C=1, Col_D=a.b.c.d (=Col_E)

why on line 1 (id=10001) is ValuWant = 2 while on line 10 (id=10003) ValuWant = 1 ?

 

ID is not mentioned in your excel formula ?!

 

and - what is the origin/meaning of the INCLUDE column ?

does it participate in the algorithm to select lines to summarize ?

 

Frequent Contributor
Frequent Contributor
Posts: 134

Re: SUMIF with multiple conditions

thank you so much for trying to help! It turns out that because the column being summed is typically the same value we do not need to use +1 as in the excel formula. So, the code is much simpler - phew! It is a simple grouped sum with column B,C,D as the group variables. 

 

Ask a Question
Discussion stats
  • 12 replies
  • 1902 views
  • 2 likes
  • 3 in conversation