turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Cumulative Sum with grouped variables

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-22-2015 11:15 AM

Hi,

I am trying to do a cumulative sum without much luck. I have tried joining the table to itself and the cumulative sum works, but the original dataset is very large and the query does not run .. So I am looking at alternatives and would appreciate any help from members.

I do not have access to proc expand

The sold has negative values to reflect returns made on the day

The cumulative sum should group by store id, product code, sub code and days. I am looking at dataset where each store ID has several products and sub products for 365 days and all days have sold values (either 0, positive or negative).

Thanks in advance.

INPUT DATA

data have;

input storeid productcode$ productsubcode DAYS sold best12.;

datalines;

100 Y 1 1 2

100 Y 1 2 -1

100 Y 1 3 4

100 Y 2 1 2

100 Y 2 2 1

100 Y 2 3 0

101 Y 1 1 2

101 Y 1 2 -2

101 Y 1 3 2

101 K 4 1 2

101 K 4 2 1

101 K 4 3 1

;

run;

DESIRED OUTPUT

storeid productcode productsubcode DAYS sold cum_sold

100 Y 1 1 2 5

100 Y 1 2 -1 3

100 Y 1 3 4 4

100 Y 2 1 2 3

100 Y 2 2 1 1

100 Y 2 3 0 0

101 Y 1 1 2 2

101 Y 1 2 -2 0

101 Y 1 3 2 2

101 K 4 1 2 4

101 K 4 2 1 2

101 K 4 3 1 1

Accepted Solutions

Solution

04-22-2015
03:27 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Mumbai_1983

04-22-2015 03:27 PM

I think we are there:

data have;

input storeid productcode$ productsubcode DAYS sold best12.;

datalines;

100 Y 1 1 2

100 Y 1 2 -1

100 Y 1 3 4

100 Y 2 1 2

100 Y 2 2 1

100 Y 2 3 0

101 Y 1 1 2

101 Y 1 2 -2

101 Y 1 3 2

101 K 4 1 2

101 K 4 2 1

101 K 4 3 1

;

run;

proc sort data=have;by storeid productsubcode descending days;

data want;

set have;

by productsubcode notsorted;

sum + sold;

if first.productsubcode then sum = sold;

run;

proc sort data=want;by storeid productsubcode days;

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Mumbai_1983

04-22-2015 11:39 AM

I'm not following this example.

The first row of the raw data has a value for sold of 2. Shouldn't the corresponding cum_sold in the output data set also have a value of 2? Where does the cum_sold in your desired output come from?

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Mumbai_1983

04-22-2015 12:28 PM

This should give you close to what I think you want based on your description:

proc sql;

create table want as

select

storeid,

productcode,

productsubcode,

days,

sum(sold) as cum_sold

from

have

group by

storeid,

productcode,

productsubcode,

days;

quit;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Mumbai_1983

04-22-2015 12:51 PM

I also do not follow the example. The output doesn't go along with what you are saying. If you run this code you will not get the cum column that you have listed:

data have;

input storeid productcode$ productsubcode DAYS sold best12.;

datalines;

100 Y 1 1 2

100 Y 1 2 -1

100 Y 1 3 4

100 Y 2 1 2

100 Y 2 2 1

100 Y 2 3 0

101 Y 1 1 2

101 Y 1 2 -2

101 Y 1 3 2

101 K 4 1 2

101 K 4 2 1

101 K 4 3 1

;

run;

proc sql;

create table want as

select *,sum(sold) as sum_sold

from have

group by storeid,productcode,productsubcode,days;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Steelers_In_DC

04-22-2015 12:55 PM

storeid productcode productsubcode DAYS sold cum_sold calculation

100 Y 1 1 2 5 2+(-1)+4

100 Y 1 2 -1 3 -1+4

100 Y 1 3 4 4 4

This shows on day 1 -> 5 were sold and on day 2 -> 3 were sold and so on.

The cumulative totals should increment to day 1

Sorry, I just realized in the desired output the days has been left out.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Mumbai_1983

04-22-2015 01:56 PM

Here you go:

data have;

input storeid productcode$ productsubcode DAYS sold best12.;

datalines;

100 Y 1 1 2

100 Y 1 2 -1

100 Y 1 3 4

100 Y 2 1 2

100 Y 2 2 1

100 Y 2 3 0

101 Y 1 1 2

101 Y 1 2 -2

101 Y 1 3 2

101 K 4 1 2

101 K 4 2 1

101 K 4 3 1

;

run;

proc sql;

create table start1 as

select *,sum(sold) as _sum_sold2

from have

group by storeid,productcode,productsubcode

order by storeid,productsubcode,days desc;

data want;

set start1;

by storeid notsorted;

sum_sold + sold;

if days = 1 then sum_sold = _sum_sold2;

drop _:;

run;

proc sort data=want;by storeid productsubcode days;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Steelers_In_DC

04-22-2015 03:05 PM

Thanks for your prompt response. The above query does not work beyond the first product code and sub code. I have written the expected answer in brackets.

storeid | productcode | productsubcode | DAYS | sold | sum_sold |

100 | Y | 1 | 1 | 2 | 5 |

100 | Y | 1 | 2 | -1 | 3 |

100 | Y | 1 | 3 | 4 | 4 |

100 | Y | 2 | 1 | 2 | 3 (3 = 1 + 2) |

100 | Y | 2 | 2 | 1 | 6 (1 = 0 + 1) |

100 | Y | 2 | 3 | 0 | 5 (0 = 0) |

101 | Y | 1 | 1 | 2 | 2 (2) |

101 | Y | 1 | 2 | -2 | 3 (0) |

101 | Y | 1 | 3 | 2 | 5 (2) |

101 | K | 4 | 1 | 2 | 4 (4) |

101 | K | 4 | 2 | 1 | 4 (2) |

101 | K | 4 | 3 | 1 | 3 (1) |

Solution

04-22-2015
03:27 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Mumbai_1983

04-22-2015 03:27 PM

I think we are there:

data have;

input storeid productcode$ productsubcode DAYS sold best12.;

datalines;

100 Y 1 1 2

100 Y 1 2 -1

100 Y 1 3 4

100 Y 2 1 2

100 Y 2 2 1

100 Y 2 3 0

101 Y 1 1 2

101 Y 1 2 -2

101 Y 1 3 2

101 K 4 1 2

101 K 4 2 1

101 K 4 3 1

;

run;

proc sort data=have;by storeid productsubcode descending days;

data want;

set have;

by productsubcode notsorted;

sum + sold;

if first.productsubcode then sum = sold;

run;

proc sort data=want;by storeid productsubcode days;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Steelers_In_DC

04-22-2015 04:42 PM

Thanks, that works perfectly well ... I think I understand what you have done .. By going to the lowest level (product subcode), you have used the sum function at that level.