## Cumulative Sum with grouped variables

Solved
Occasional Contributor
Posts: 17

# Cumulative Sum with grouped variables

Hi,

I am trying to do a cumulative sum without much luck. I have tried joining the table to itself and the cumulative sum works, but the original dataset is very large and the query does not run .. So I am looking at alternatives and would appreciate any help from members.

The sold has negative values to reflect returns made on the day

The cumulative sum should group by store id, product code, sub code and days. I am looking at dataset where each store ID has several products and sub products for 365 days and all days have sold values (either 0, positive or negative).

INPUT DATA

data have;

input storeid productcode\$ productsubcode DAYS sold best12.;

datalines;

100 Y 1 1 2

100 Y 1 2 -1

100 Y 1 3 4

100 Y 2 1 2

100 Y 2 2 1

100 Y 2 3 0

101 Y 1 1 2

101 Y 1 2 -2

101 Y 1 3 2

101 K 4 1 2

101 K 4 2 1

101 K 4 3 1

;

run;

DESIRED OUTPUT

storeid productcode productsubcode DAYS sold cum_sold

100 Y   1 1 2 5

100 Y   1 2 -1 3

100 Y   1 3 4 4

100 Y   2 1 2 3

100 Y   2 2 1 1

100 Y   2 3 0 0

101 Y   1 1 2 2

101 Y   1 2 -2 0

101 Y   1 3 2 2

101 K   4 1 2 4

101 K   4 2 1 2

101 K   4 3 1 1

Accepted Solutions
Solution
‎04-22-2015 03:27 PM
Valued Guide
Posts: 863

## Re: Cumulative Sum with grouped variables

I think we are there:

data have;

input storeid productcode\$ productsubcode DAYS sold best12.;

datalines;

100 Y 1 1 2

100 Y 1 2 -1

100 Y 1 3 4

100 Y 2 1 2

100 Y 2 2 1

100 Y 2 3 0

101 Y 1 1 2

101 Y 1 2 -2

101 Y 1 3 2

101 K 4 1 2

101 K 4 2 1

101 K 4 3 1

;

run;

proc sort data=have;by storeid productsubcode descending days;

data want;

set have;

by productsubcode notsorted;

sum + sold;

if first.productsubcode then sum = sold;

run;

proc sort data=want;by storeid productsubcode days;

All Replies
Posts: 2,985

## Re: Cumulative Sum with grouped variables

I'm not following this example.

The first row of the raw data has a value for sold of 2. Shouldn't the corresponding cum_sold in the output data set also have a value of 2? Where does the cum_sold in your desired output come from?

--
Paige Miller
SAS Employee
Posts: 85

## Re: Cumulative Sum with grouped variables

This should give you close to what I think you want based on your description:

proc sql;

create table want as

select

storeid,

productcode,

productsubcode,

days,

sum(sold) as cum_sold

from

have

group by

storeid,

productcode,

productsubcode,

days;

quit;

Valued Guide
Posts: 863

## Re: Cumulative Sum with grouped variables

I also do not follow the example.  The output doesn't go along with what you are saying.  If you run this code you will not get the cum column that you have listed:

data have;

input storeid productcode\$ productsubcode DAYS sold best12.;

datalines;

100 Y 1 1 2

100 Y 1 2 -1

100 Y 1 3 4

100 Y 2 1 2

100 Y 2 2 1

100 Y 2 3 0

101 Y 1 1 2

101 Y 1 2 -2

101 Y 1 3 2

101 K 4 1 2

101 K 4 2 1

101 K 4 3 1

;

run;

proc sql;

create table want as

select *,sum(sold) as sum_sold

from have

group by storeid,productcode,productsubcode,days;

Occasional Contributor
Posts: 17

## Re: Cumulative Sum with grouped variables

storeid productcode productsubcode DAYS sold cum_sold      calculation

100         Y                     1                 1            2         5           2+(-1)+4

100         Y                     1                 2           -1         3           -1+4

100         Y                     1                 3            4         4           4

This shows on day 1 -> 5 were sold and on day 2 -> 3 were sold and so on.

The cumulative totals should increment to day 1

Sorry, I just realized in the desired output the days has been left out.

Valued Guide
Posts: 863

## Re: Cumulative Sum with grouped variables

Here you go:

data have;

input storeid productcode\$ productsubcode DAYS sold best12.;

datalines;

100 Y 1 1 2

100 Y 1 2 -1

100 Y 1 3 4

100 Y 2 1 2

100 Y 2 2 1

100 Y 2 3 0

101 Y 1 1 2

101 Y 1 2 -2

101 Y 1 3 2

101 K 4 1 2

101 K 4 2 1

101 K 4 3 1

;

run;

proc sql;

create table start1 as

select *,sum(sold) as _sum_sold2

from have

group by storeid,productcode,productsubcode

order by storeid,productsubcode,days desc;

data want;

set start1;

by storeid notsorted;

sum_sold + sold;

if days = 1 then sum_sold = _sum_sold2;

drop _:;

run;

proc sort data=want;by storeid productsubcode days;

Occasional Contributor
Posts: 17

## Re: Cumulative Sum with grouped variables

Thanks for your prompt response. The above query does not work beyond the first product code and sub code.  I have written the expected answer in brackets.

 storeid productcode productsubcode DAYS sold sum_sold 100 Y 1 1 2 5 100 Y 1 2 -1 3 100 Y 1 3 4 4 100 Y 2 1 2 3 (3 = 1 + 2) 100 Y 2 2 1 6 (1 = 0 + 1) 100 Y 2 3 0 5 (0 = 0) 101 Y 1 1 2 2 (2) 101 Y 1 2 -2 3 (0) 101 Y 1 3 2 5 (2) 101 K 4 1 2 4 (4) 101 K 4 2 1 4 (2) 101 K 4 3 1 3 (1)
Solution
‎04-22-2015 03:27 PM
Valued Guide
Posts: 863

## Re: Cumulative Sum with grouped variables

I think we are there:

data have;

input storeid productcode\$ productsubcode DAYS sold best12.;

datalines;

100 Y 1 1 2

100 Y 1 2 -1

100 Y 1 3 4

100 Y 2 1 2

100 Y 2 2 1

100 Y 2 3 0

101 Y 1 1 2

101 Y 1 2 -2

101 Y 1 3 2

101 K 4 1 2

101 K 4 2 1

101 K 4 3 1

;

run;

proc sort data=have;by storeid productsubcode descending days;

data want;

set have;

by productsubcode notsorted;

sum + sold;

if first.productsubcode then sum = sold;

run;

proc sort data=want;by storeid productsubcode days;

Occasional Contributor
Posts: 17