Hi,
I am trying to do a cumulative sum without much luck. I have tried joining the table to itself and the cumulative sum works, but the original dataset is very large and the query does not run .. So I am looking at alternatives and would appreciate any help from members.
I do not have access to proc expand
The sold has negative values to reflect returns made on the day
The cumulative sum should group by store id, product code, sub code and days. I am looking at dataset where each store ID has several products and sub products for 365 days and all days have sold values (either 0, positive or negative).
Thanks in advance.
INPUT DATA
data have;
input storeid productcode$ productsubcode DAYS sold best12.;
datalines;
100 Y 1 1 2
100 Y 1 2 -1
100 Y 1 3 4
100 Y 2 1 2
100 Y 2 2 1
100 Y 2 3 0
101 Y 1 1 2
101 Y 1 2 -2
101 Y 1 3 2
101 K 4 1 2
101 K 4 2 1
101 K 4 3 1
;
run;
DESIRED OUTPUT
storeid productcode productsubcode DAYS sold cum_sold
100 Y 1 1 2 5
100 Y 1 2 -1 3
100 Y 1 3 4 4
100 Y 2 1 2 3
100 Y 2 2 1 1
100 Y 2 3 0 0
101 Y 1 1 2 2
101 Y 1 2 -2 0
101 Y 1 3 2 2
101 K 4 1 2 4
101 K 4 2 1 2
101 K 4 3 1 1
I think we are there:
data have;
input storeid productcode$ productsubcode DAYS sold best12.;
datalines;
100 Y 1 1 2
100 Y 1 2 -1
100 Y 1 3 4
100 Y 2 1 2
100 Y 2 2 1
100 Y 2 3 0
101 Y 1 1 2
101 Y 1 2 -2
101 Y 1 3 2
101 K 4 1 2
101 K 4 2 1
101 K 4 3 1
;
run;
proc sort data=have;by storeid productsubcode descending days;
data want;
set have;
by productsubcode notsorted;
sum + sold;
if first.productsubcode then sum = sold;
run;
proc sort data=want;by storeid productsubcode days;
I'm not following this example.
The first row of the raw data has a value for sold of 2. Shouldn't the corresponding cum_sold in the output data set also have a value of 2? Where does the cum_sold in your desired output come from?
This should give you close to what I think you want based on your description:
proc sql;
create table want as
select
storeid,
productcode,
productsubcode,
days,
sum(sold) as cum_sold
from
have
group by
storeid,
productcode,
productsubcode,
days;
quit;
I also do not follow the example. The output doesn't go along with what you are saying. If you run this code you will not get the cum column that you have listed:
data have;
input storeid productcode$ productsubcode DAYS sold best12.;
datalines;
100 Y 1 1 2
100 Y 1 2 -1
100 Y 1 3 4
100 Y 2 1 2
100 Y 2 2 1
100 Y 2 3 0
101 Y 1 1 2
101 Y 1 2 -2
101 Y 1 3 2
101 K 4 1 2
101 K 4 2 1
101 K 4 3 1
;
run;
proc sql;
create table want as
select *,sum(sold) as sum_sold
from have
group by storeid,productcode,productsubcode,days;
storeid productcode productsubcode DAYS sold cum_sold calculation
100 Y 1 1 2 5 2+(-1)+4
100 Y 1 2 -1 3 -1+4
100 Y 1 3 4 4 4
This shows on day 1 -> 5 were sold and on day 2 -> 3 were sold and so on.
The cumulative totals should increment to day 1
Sorry, I just realized in the desired output the days has been left out.
Here you go:
data have;
input storeid productcode$ productsubcode DAYS sold best12.;
datalines;
100 Y 1 1 2
100 Y 1 2 -1
100 Y 1 3 4
100 Y 2 1 2
100 Y 2 2 1
100 Y 2 3 0
101 Y 1 1 2
101 Y 1 2 -2
101 Y 1 3 2
101 K 4 1 2
101 K 4 2 1
101 K 4 3 1
;
run;
proc sql;
create table start1 as
select *,sum(sold) as _sum_sold2
from have
group by storeid,productcode,productsubcode
order by storeid,productsubcode,days desc;
data want;
set start1;
by storeid notsorted;
sum_sold + sold;
if days = 1 then sum_sold = _sum_sold2;
drop _:;
run;
proc sort data=want;by storeid productsubcode days;
Thanks for your prompt response. The above query does not work beyond the first product code and sub code. I have written the expected answer in brackets.
storeid | productcode | productsubcode | DAYS | sold | sum_sold |
100 | Y | 1 | 1 | 2 | 5 |
100 | Y | 1 | 2 | -1 | 3 |
100 | Y | 1 | 3 | 4 | 4 |
100 | Y | 2 | 1 | 2 | 3 (3 = 1 + 2) |
100 | Y | 2 | 2 | 1 | 6 (1 = 0 + 1) |
100 | Y | 2 | 3 | 0 | 5 (0 = 0) |
101 | Y | 1 | 1 | 2 | 2 (2) |
101 | Y | 1 | 2 | -2 | 3 (0) |
101 | Y | 1 | 3 | 2 | 5 (2) |
101 | K | 4 | 1 | 2 | 4 (4) |
101 | K | 4 | 2 | 1 | 4 (2) |
101 | K | 4 | 3 | 1 | 3 (1) |
I think we are there:
data have;
input storeid productcode$ productsubcode DAYS sold best12.;
datalines;
100 Y 1 1 2
100 Y 1 2 -1
100 Y 1 3 4
100 Y 2 1 2
100 Y 2 2 1
100 Y 2 3 0
101 Y 1 1 2
101 Y 1 2 -2
101 Y 1 3 2
101 K 4 1 2
101 K 4 2 1
101 K 4 3 1
;
run;
proc sort data=have;by storeid productsubcode descending days;
data want;
set have;
by productsubcode notsorted;
sum + sold;
if first.productsubcode then sum = sold;
run;
proc sort data=want;by storeid productsubcode days;
Thanks, that works perfectly well ... I think I understand what you have done .. By going to the lowest level (product subcode), you have used the sum function at that level.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.