BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shivkr
Fluorite | Level 6

Hello,

How to get the running total of a variable upto n records

group by some variable?

I want to create a new variable by doing the sum of first 4 records order by date value and group by color.

datecolorvalue
3/17/2014white3
3/18/2014white2
3/19/2014white1
3/20/2014white2
3/21/2014white0
3/24/2014white38
3/25/2014white38
3/26/2014white38
3/17/2014black1
3/18/2014black3
3/19/2014black2
3/20/2014black1
3/21/2014black0
3/24/2014black6
3/25/2014black7
3/26/2014black6
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
input date : mmddyy10.     color $     value ;
format      date mmddyy10.;
cards;
3/17/2014     white     3
3/18/2014     white     2
3/19/2014     white     1
3/20/2014     white     2
3/21/2014     white     0
3/24/2014     white     38
3/25/2014     white     38
3/26/2014     white     38
3/17/2014     black     1
3/18/2014     black     3
3/19/2014     black     2
3/20/2014     black     1
3/21/2014     black     0
3/24/2014     black     6
3/25/2014     black     7
3/26/2014     black     6
;
run;
data want;
 set have;
 if color ne lag(color)  then do; n=0;sum=0;end;
 n+1;sum+value;
 if n in (1:4);
 drop n;
run;


Xia Keshan

View solution in original post

11 REPLIES 11
LinusH
Tourmaline | Level 20

You can use the data step with set, by, retain and first.-logic to accomplish this.

Data never sleeps
Astounding
PROC Star

Judging by the language you use to describe the problem, you are more accustomed to using SQL than a DATA step.  SQL will not be a reliable tool for this application, as it does not guarantee the order of extracted records.  Instead, you can try:

proc sort data=have;

   by color date;

run;

data want;

   set have;

   by color date;

   if first.color then do;

      total = value;

      record_count=1;

   end;

   else do;

      total + value;

      record_count + 1;

   end;

   if record_count = 4;

run;

This approach outputs nothing for colors that have fewer than 4 records.  Slight changes to the logic can change that, but this DATA step illustrates the basic tools you will need to start with.


Ksharp
Super User
data have;
input date : mmddyy10.     color $     value ;
format      date mmddyy10.;
cards;
3/17/2014     white     3
3/18/2014     white     2
3/19/2014     white     1
3/20/2014     white     2
3/21/2014     white     0
3/24/2014     white     38
3/25/2014     white     38
3/26/2014     white     38
3/17/2014     black     1
3/18/2014     black     3
3/19/2014     black     2
3/20/2014     black     1
3/21/2014     black     0
3/24/2014     black     6
3/25/2014     black     7
3/26/2014     black     6
;
run;
data want;
 set have;
 if color ne lag(color)  then do; n=0;sum=0;end;
 n+1;sum+value;
 if n in (1:4);
 drop n;
run;


Xia Keshan

stat_sas
Ammonite | Level 13

proc sort data=have;
by color date;
run;

data want;
set have;
if mod(_n_,4)=1 then grp+1;
run;

proc sql;
select color, sum(value) from want
where grp in (1,3)
group by color,grp;
quit;

Reeza
Super User

If you have SAS/ETS licensed look into proc expand.

shivkr
Fluorite | Level 6

Thanks all for your suggestion. Is it possible to get the same result using PROC SQL?

Astounding
PROC Star

No, it's not possible using PROC SQL.  SQL by its nature does not guarantee the order of the records that it retrieves.  It has no guaranteed way of determining which are the four records that you want for each COLOR.

Perhaps I shouldn't say it's impossible.  Perhaps you can devise a method to identify the records that you want to retrieve.  But nobody that I know has ever accomplished that.  There is an undocumented feature (MONOTONIC) that could help in theory ... if only it were guaranteed to work every time.  But it is not guaranteed so even if you got it to work properly once, it might not work properly the next time.

You just might be stuck having to learn to program in a DATA step for some types of problems.

Reeza
Super User

Yes, but not recommended.

If you're familiar with SQL and working with a SQL database you can use pass-through logic and program as if you were in a native DB.

Otherwise create a row counter, via a data step and then merge via the row counter.  More steps and less efficient.

stat_sas
Ammonite | Level 13

You can try this. As mentioned in above responses MONOTONIC ( ) function is undocumented so results should be interpreted carefully.

proc sql;

create table one as

select *, monotonic () as cnt

from have

order by color,date;

create table want as

select *, (cnt - min(cnt) + 1 ) as cnt_grp

from one

group by color

order by cnt;

select color,sum(value) as total_4 from want

where cnt_grp<5

group by color;

quit;

Haikuo
Onyx | Level 15

if you have to resort to Proc SQL, there is a documented solution to get the sequence number:

ods _all_ close;

ods output sql_results=want;

proc sql number;

  select * from have;

quit;

ods output close;

ods listing;

Haikuo

stat_sas
Ammonite | Level 13

Thanks Haikuo.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 17102 views
  • 1 like
  • 7 in conversation