SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Running total

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

Running total

Hello,

How to get the running total of a variable upto n records

group by some variable?

I want to create a new variable by doing the sum of first 4 records order by date value and group by color.

datecolorvalue
3/17/2014white3
3/18/2014white2
3/19/2014white1
3/20/2014white2
3/21/2014white0
3/24/2014white38
3/25/2014white38
3/26/2014white38
3/17/2014black1
3/18/2014black3
3/19/2014black2
3/20/2014black1
3/21/2014black0
3/24/2014black6
3/25/2014black7
3/26/2014black6

Accepted Solutions
Solution
‎06-28-2017 11:43 AM
Super User
Posts: 9,681

Re: Running total

data have;
input date : mmddyy10.     color $     value ;
format      date mmddyy10.;
cards;
3/17/2014     white     3
3/18/2014     white     2
3/19/2014     white     1
3/20/2014     white     2
3/21/2014     white     0
3/24/2014     white     38
3/25/2014     white     38
3/26/2014     white     38
3/17/2014     black     1
3/18/2014     black     3
3/19/2014     black     2
3/20/2014     black     1
3/21/2014     black     0
3/24/2014     black     6
3/25/2014     black     7
3/26/2014     black     6
;
run;
data want;
 set have;
 if color ne lag(color)  then do; n=0;sum=0;end;
 n+1;sum+value;
 if n in (1:4);
 drop n;
run;


Xia Keshan

View solution in original post


All Replies
Super User
Posts: 5,256

Re: Running total

You can use the data step with set, by, retain and first.-logic to accomplish this.

Data never sleeps
Super User
Posts: 5,082

Re: Running total

Judging by the language you use to describe the problem, you are more accustomed to using SQL than a DATA step.  SQL will not be a reliable tool for this application, as it does not guarantee the order of extracted records.  Instead, you can try:

proc sort data=have;

   by color date;

run;

data want;

   set have;

   by color date;

   if first.color then do;

      total = value;

      record_count=1;

   end;

   else do;

      total + value;

      record_count + 1;

   end;

   if record_count = 4;

run;

This approach outputs nothing for colors that have fewer than 4 records.  Slight changes to the logic can change that, but this DATA step illustrates the basic tools you will need to start with.


Solution
‎06-28-2017 11:43 AM
Super User
Posts: 9,681

Re: Running total

data have;
input date : mmddyy10.     color $     value ;
format      date mmddyy10.;
cards;
3/17/2014     white     3
3/18/2014     white     2
3/19/2014     white     1
3/20/2014     white     2
3/21/2014     white     0
3/24/2014     white     38
3/25/2014     white     38
3/26/2014     white     38
3/17/2014     black     1
3/18/2014     black     3
3/19/2014     black     2
3/20/2014     black     1
3/21/2014     black     0
3/24/2014     black     6
3/25/2014     black     7
3/26/2014     black     6
;
run;
data want;
 set have;
 if color ne lag(color)  then do; n=0;sum=0;end;
 n+1;sum+value;
 if n in (1:4);
 drop n;
run;


Xia Keshan

Trusted Advisor
Posts: 1,204

Re: Running total

proc sort data=have;
by color date;
run;

data want;
set have;
if mod(_n_,4)=1 then grp+1;
run;

proc sql;
select color, sum(value) from want
where grp in (1,3)
group by color,grp;
quit;

Super User
Posts: 17,829

Re: Running total

If you have SAS/ETS licensed look into proc expand.

Occasional Contributor
Posts: 12

Re: Running total

Thanks all for your suggestion. Is it possible to get the same result using PROC SQL?

Super User
Posts: 5,082

Re: Running total

No, it's not possible using PROC SQL.  SQL by its nature does not guarantee the order of the records that it retrieves.  It has no guaranteed way of determining which are the four records that you want for each COLOR.

Perhaps I shouldn't say it's impossible.  Perhaps you can devise a method to identify the records that you want to retrieve.  But nobody that I know has ever accomplished that.  There is an undocumented feature (MONOTONIC) that could help in theory ... if only it were guaranteed to work every time.  But it is not guaranteed so even if you got it to work properly once, it might not work properly the next time.

You just might be stuck having to learn to program in a DATA step for some types of problems.

Super User
Posts: 17,829

Re: Running total

Yes, but not recommended.

If you're familiar with SQL and working with a SQL database you can use pass-through logic and program as if you were in a native DB.

Otherwise create a row counter, via a data step and then merge via the row counter.  More steps and less efficient.

Trusted Advisor
Posts: 1,204

Re: Running total

You can try this. As mentioned in above responses MONOTONIC ( ) function is undocumented so results should be interpreted carefully.

proc sql;

create table one as

select *, monotonic () as cnt

from have

order by color,date;

create table want as

select *, (cnt - min(cnt) + 1 ) as cnt_grp

from one

group by color

order by cnt;

select color,sum(value) as total_4 from want

where cnt_grp<5

group by color;

quit;

Respected Advisor
Posts: 3,124

Re: Running total

if you have to resort to Proc SQL, there is a documented solution to get the sequence number:

ods _all_ close;

ods output sql_results=want;

proc sql number;

  select * from have;

quit;

ods output close;

ods listing;

Haikuo

Trusted Advisor
Posts: 1,204

Re: Running total

Thanks Haikuo.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 5742 views
  • 1 like
  • 7 in conversation