BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Macro
Obsidian | Level 7

Hi,

 

I would like to create a new dataset from an old dataset, so that the new dataset will have cumulative sum of variables from the old dataset. This should be done by byvar variables in the old dataset.  An old data example is like this temp dataset:

 

data temp;
input Factor1 $  Factor2 $  rank x y;
datalines;
A  B  1  1  2
A  B  2  2  4
A  B  3  3  6
A  C  1  1  7
A  C  2  3  8
A  C  3  5  9
;
run

The new dataset should be like this temp2 dataset:

data temp2;
input Factor1 $  Factor2 $  rank x y csum_x  csum_y;
datalines;
A  B  1  1  2  1  2
A  B  2  2  4  3  6
A  B  3  3  6  6  12
A  C  1  1  7  1  7
A  C  2  3  8  4  15
A  C  3  5  9  9  24
;
run

The cumulative sum of x and y are done by Factor1 and Factor2 combination, and along with the rank variable rank. The general factor variable combination could be  more than 2 factor variables. Is there any data step or proc sql way to do this? Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Like this?

data WANT;
  set HAVE;
  by FACROR1 FACTOR2;
  if first.FACTOR2 then call missing(SUMX, SUMY);
  SUMX+X;
  SUMY+Y;
 run.

 

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

Like this?

data WANT;
  set HAVE;
  by FACROR1 FACTOR2;
  if first.FACTOR2 then call missing(SUMX, SUMY);
  SUMX+X;
  SUMY+Y;
 run.

 

Macro
Obsidian | Level 7

Hi Chris,

 

Thanks for this solution. It is good to know there is a missing routine that reset cusum operation in data step. More generally, it could be

 

if first.factor1 or first.factor2 then call missing(sumx, sumy);

to take care of the level change of the factor combination. You provided the simplest solution.

ChrisNZ
Tourmaline | Level 20

 

first.factor1   is always true if   first.factor2    is true

Macro
Obsidian | Level 7

I guess you meant the reverse of the statement is true.

Ksharp
Super User
data temp;
input Factor1 $  Factor2 $  rank x y;
datalines;
A  B  1  1  2
A  B  2  2  4
A  B  3  3  6
A  C  1  1  7
A  C  2  3  8
A  C  3  5  9
;
run;

proc sql;
create table temp2 as
select *,
(select sum(x) from temp where Factor1=a.Factor1 and Factor2=a.Factor2 and rank le a.rank) as csum_x,
(select sum(y) from temp where Factor1=a.Factor1 and Factor2=a.Factor2 and rank le a.rank) as csum_y
 from temp as a;
quit;
Macro
Obsidian | Level 7

Hi Ksharp,

 

Using a self-correlated subquery, this problem is solved by an elegant SQL way. I was thinking if this could be done by Group by statement. Probably not, as sql does not provide cusum summary statistic. I was also wondering if this could be generalized to a macro when one does not know in advance the number of factors. Say,

%let byvar_list=factor1 factor2;
%macro cusum_byvar(indata=temp, byvar=&byvar_list, outdata=);

%mend;

Thanks for your sql solution.

Ksharp
Super User
Sure . of course, you can make a macro .

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 764 views
  • 4 likes
  • 3 in conversation