Hi,
I would like to create a new dataset from an old dataset, so that the new dataset will have cumulative sum of variables from the old dataset. This should be done by byvar variables in the old dataset. An old data example is like this temp dataset:
data temp;
input Factor1 $ Factor2 $ rank x y;
datalines;
A B 1 1 2
A B 2 2 4
A B 3 3 6
A C 1 1 7
A C 2 3 8
A C 3 5 9
;
run
The new dataset should be like this temp2 dataset:
data temp2;
input Factor1 $ Factor2 $ rank x y csum_x csum_y;
datalines;
A B 1 1 2 1 2
A B 2 2 4 3 6
A B 3 3 6 6 12
A C 1 1 7 1 7
A C 2 3 8 4 15
A C 3 5 9 9 24
;
run
The cumulative sum of x and y are done by Factor1 and Factor2 combination, and along with the rank variable rank. The general factor variable combination could be more than 2 factor variables. Is there any data step or proc sql way to do this? Thanks.
Like this?
data WANT;
set HAVE;
by FACROR1 FACTOR2;
if first.FACTOR2 then call missing(SUMX, SUMY);
SUMX+X;
SUMY+Y;
run.
Like this?
data WANT;
set HAVE;
by FACROR1 FACTOR2;
if first.FACTOR2 then call missing(SUMX, SUMY);
SUMX+X;
SUMY+Y;
run.
Hi Chris,
Thanks for this solution. It is good to know there is a missing routine that reset cusum operation in data step. More generally, it could be
if first.factor1 or first.factor2 then call missing(sumx, sumy);
to take care of the level change of the factor combination. You provided the simplest solution.
I guess you meant the reverse of the statement is true.
data temp;
input Factor1 $ Factor2 $ rank x y;
datalines;
A B 1 1 2
A B 2 2 4
A B 3 3 6
A C 1 1 7
A C 2 3 8
A C 3 5 9
;
run;
proc sql;
create table temp2 as
select *,
(select sum(x) from temp where Factor1=a.Factor1 and Factor2=a.Factor2 and rank le a.rank) as csum_x,
(select sum(y) from temp where Factor1=a.Factor1 and Factor2=a.Factor2 and rank le a.rank) as csum_y
from temp as a;
quit;
Hi Ksharp,
Using a self-correlated subquery, this problem is solved by an elegant SQL way. I was thinking if this could be done by Group by statement. Probably not, as sql does not provide cusum summary statistic. I was also wondering if this could be generalized to a macro when one does not know in advance the number of factors. Say,
%let byvar_list=factor1 factor2;
%macro cusum_byvar(indata=temp, byvar=&byvar_list, outdata=);
%mend;
Thanks for your sql solution.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.