Desktop productivity for business analysts and programmers

Cumulative sum (row operation)

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 81
Accepted Solution

Cumulative sum (row operation)

So I have two datasets:

/* dataset A */
          Col_1   Col_2   Col_3   Col_4
Row_1       3      2        5       7
Row_2       4      1        0       6
Row_3       11     8        3       1

/* dataset B */
           Col_1   Col_2   Col_3   Col_4
Row_1       10      6        1       3
Row_2       2       0        8       3
Row_3       1       2        9       4

The final dataset that I need would look like:

/* dataset output */
           Col_1   Col_2    Col_3   Col_4
Row_1       13       21      27       37
Row_2       6        7       15       24
Row_3       12       22      34       39

Bascially 1st sum up the same cells from dataset A and dataset B.

then calculate the cumulative sum starting from Col_1. 

 

e.g. the Col_3, Row_2 cell value in output dataset would be calculated as: (4+2) + (0+1) + (0+8) = 15;

the Col_2, Row_1 cell: (10+3)+(2+6) = 21.

 

It is quite easy to do in Excel, not sure how to do it in SAS? 


Accepted Solutions
Solution
‎03-06-2017 09:36 PM
PROC Star
Posts: 7,437

Re: Cumulative sum (row operation)

I would use a datastep like the following:

 

data A;
  input Col_1-Col_4;
  cards;
3      2        5       7
4      1        0       6
11     8        3       1
;
data B;
  input Col_1-Col_4;
  cards;
10      6        1       3
2       0        8       3
1       2        9       4
;

data want (keep=col:);
  set a;
  set b (rename=(Col_1-Col_4=_Col_1-_Col_4));
  array cols(*) Col_1-Col_4;
  array _cols(*) _Col_1-_Col_4;
  do i=1 to dim(cols);
    if i=1 then cols(i)=sum(cols(i),_cols(i));
    else do;
      cols(i)=sum(cols(i-1),cols(i),_cols(i));
    end;
  end;
run;

Art, CEO, AnalystFinder.com

View solution in original post


All Replies
Super User
Posts: 19,188

Re: Cumulative sum (row operation)

EG Tasks -> Append two datasets into 1, then use a Summary Task to sum the records across groups. 

 

 

Solution
‎03-06-2017 09:36 PM
PROC Star
Posts: 7,437

Re: Cumulative sum (row operation)

I would use a datastep like the following:

 

data A;
  input Col_1-Col_4;
  cards;
3      2        5       7
4      1        0       6
11     8        3       1
;
data B;
  input Col_1-Col_4;
  cards;
10      6        1       3
2       0        8       3
1       2        9       4
;

data want (keep=col:);
  set a;
  set b (rename=(Col_1-Col_4=_Col_1-_Col_4));
  array cols(*) Col_1-Col_4;
  array _cols(*) _Col_1-_Col_4;
  do i=1 to dim(cols);
    if i=1 then cols(i)=sum(cols(i),_cols(i));
    else do;
      cols(i)=sum(cols(i-1),cols(i),_cols(i));
    end;
  end;
run;

Art, CEO, AnalystFinder.com

Frequent Contributor
Posts: 81

Re: Cumulative sum (row operation)

Hey thanks for the quick reply. Your codes worked.

What if the name starts with 0, so column names are like 'Col_0', 'Col_1' to 'Col_3'? I did a little bit adjustments and replace '1','4' with '0','3' in your codes, but then it says: "ERROR: Array subscript out of range at line 44 column 29."

I guess it has something to do with 0 but could you please tell me how to change it?
Super User
Posts: 19,188

Re: Cumulative sum (row operation)

It worked fine for me. 

 

data A;
  input Col_0-Col_3;
  cards;
3      2        5       7
4      1        0       6
11     8        3       1
;
data B;
  input Col_0-Col_3;
  cards;
10      6        1       3
2       0        8       3
1       2        9       4
;

data want (keep=col:);
  set a;
  set b (rename=(Col_0-Col_3=_Col_0-_Col_3));
  array cols(*) Col_0-Col_3;
  array _cols(*) _Col_0-_Col_3;
  do i=1 to dim(cols);
    if i=1 then cols(i)=sum(cols(i),_cols(i));
    else do;
      cols(i)=sum(cols(i-1),cols(i),_cols(i));
    end;
  end;
run;
PROC Star
Posts: 7,437

Re: Cumulative sum (row operation)

My guess would be that in changing the number you also changed, but should'nt have:

 

do i=1 to dim(cols);

That, and the statements under it, shouldn't be changed.

 

Art, CEO, AnalystFinder.com

 

Frequent Contributor
Posts: 81

Re: Cumulative sum (row operation)

Hi, in addition to this problem, what if there are more than two input datasets? like three, four datasets. (and users will specify an input, a number, to determine how many datasets they will use)

Then how should we calculate the sum? (rather than hard-code). Suppose datasets have name like: Lib.&type._&class._&j._name; where 'type', 'class', 'j' are global macro variables.
Frequent Contributor
Posts: 81

Re: Cumulative sum (row operation)

Just figured out a solution:

%macro sum;
data want (keep=Class Array:);      
     retain Class Array:;
     array Array(*) Array_0-Array_%eval(&num.-1);
     do i=1 to dim(Array);
     Array(i)=0;            /* give the array initial values */
     end;

     %do _l=1 %to &Num_type;
     %let Type_name=%scan(&type,&_l);
        set lib.&type;
        array &Type_name.(*) &Type_name._0-&Type_name._%eval(&num.-1);    /* rename each column */
        do i=1 to dim(Array);
          Array(i)=&Type_name.(i) + Array(i);        /* sum up */
        end;
     %end;

run;
%mend sum;
%sum;

This process can be further expanded to cover more macro variables.

Super User
Posts: 19,188

Re: Cumulative sum (row operation)

Macros are one way, but I would suggest IML or Base SAS instead.

 

Here's a way, doesn't matter how many datasets or variables. 

 

data A;
	input Col_1-Col_4;
	cards;
3      2        5       7
4      1        0       6
11     8        3       1
;

data B;
	input Col_1-Col_4;
	cards;
10      6        1       3
2       0        8       3
1       2        9       4
;

data combined;
	set a b indsname=source;;

	*doesn't matter how many datasets here;
	dsn=source;
run;

proc sort data=combined;
	by dsn;
run;

data combined;
	set combined;
	by dsn;

	if first.dsn then
		row_count=1;
	else row_count+1;
run;

proc means data=combined nway noprint;
	class row_count;
	var col_1-col_4;

	*list all vars here, doesn't matter how many;
	output out=want (drop = _type_ _freq_) sum=;
run;
Super User
Super User
Posts: 6,850

Re: Cumulative sum (row operation)

I have no idea how you would do something like that in Excel.

But it is easy in SAS.

First combine all of the datasets and rotate into tall skinny format.  Note you could make this step as a view if you want.

data combine;
 set  B A indsname=source ;
 array c _numeric_;
 if source ne lag(source) then row=0;
 row+1;
 do col=1 to dim(c);
   value=c(col);
   output;
 end;
 keep row col value;
run;

Then calculate the sum of the values for the same ROW*COL cell;

proc summary nway ;
  class row col ;
  var value ;
  output out=sum (keep=row col value) sum=value ;
run;

Then generate the cummulative sums across the columns.

data cum_sum ;
  do until (last.row);
    set sum ;
    by row col;
    new_value = sum(new_value,value);
    output;
  end;
run;

Then if you want you could convert it back into a wide format.

proc transpose data=cum_sum out=want(drop=_: ) prefix=col_ ;
  by row ;
  id col ;
  var new_value ;
run;

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 337 views
  • 3 likes
  • 4 in conversation