Solved: Compute the number of observations and sum in a moving window

Thomas_mp · Posted 07-10-2020 04:19 PM

Kurt_Bremser · Posted 07-11-2020 05:41 AM

See this:

data have;
input X $ d :mmddyy10. y;
format d yymmddd10.;
cards;
a     01/01/96    5.1
a     10/27/96    2.3
a     12/16/96    8.1
a     01/05/97    9.6
a     01/11/97    1.6
b     02/03/97    6.2
b     03/25/97    5.8
b     11/15/98    2.6
b     01/14/99    4.9
b     01/29/99    3.7
b     03/10/99    4.6
b     03/10/99    2.3
c     03/16/99    9.6
;

proc sql;
/* first, consolidate per day */
create table times as
  select
    x,
    d,
    count(d) as times,
    sum(y) as y
  from have
  group by x, d
;
/* now, calculate the rolling window */
create table want as
  select
    a.x,
    a.d,
    sum(b.times) as times,
    sum(b.y) as sum
  from times a
  left join times b
  on a.x = b.x and a.d - 100 le b.d le a.d
  group by a.x, a.d
;
quit;

b has a slightly different result, because I show each date only once.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Thomas_mp · Posted 07-10-2020 04:20 PM

Good morning,

I have observations of thousands of individuals (variable X for individuals a, b, c, d.. ) overtime (d) and the money each individual spent on that date (Y). The data are sorted by X and d

data k;

input X $ d y ;

informat d mmddyy8. ;

cards ;

a 01/01/96 5.1

a 10/27/96 2.3

a 12/16/96 8.1

a 01/05/97 9.6

a 01/11/97 1.6

b 02/03/97 6.2

b 03/25/97 5.8

b 11/15/98 2.6

b 01/14/99 4.9

b 01/29/99 3.7

b 03/10/99 4.6

b 03/10/99 2.3

c 03/16/99 9.6

; run;

I am trying to resolve two questions:

How many times one individual spent money in a period of 100 days? (this is the variable “times” in the solution below)
How much money has she/he spent in a the last 100 days? (this is “sum” in the solutions below.

So the question is to find out “times” and “sum”: the times an individual appears in the data base in the last 100 days, and the money spent in the last by this individual in the last 100 days.

For instance, for individual b:

She appears on 2/3/1997 for the first time, so times = 1; sum = 6.2

She appears again on 3/25/1997 , or 50 days later, so times = 2 (2/3/1997 and 3/25/1997) and sum is = 12 (5.8+6.2).

She appears again on 11/15/1998, or 300 days later, so times = 1 (because 300 > 100)) and sum is = 2.6.

She appears again on 1/14/1999, or 60 days later, so times = 2 and sum is = 2.6 + 4.9 = 7.5.

She appears again on 1/29/1999, or 15 days later, so times = 3 (3 times in 100 days, on 11/15/1998, on 1/14/1999 and on 1/29/1999. Sum is the value of Y on these 3 dates (3.7+4.9+2.6)

She appears again on 3/10/1999, or 40 days later, so times = 3 (3 times in 100 days, on 1/14/1999 on 1/29/1999 and on 3/10/1999. Sum is the value of Y on these 3 dates (3.7+4.9+2.3)

I did the complete solution in excel, below (and in the attached file). Diff is the number of days between two dates (I did this to get to the solution in excel; it may not be necessary). This difference is -999 when we change to a different individual.

In this case the solution is:

x	d	y	diff	times	sum
a	1/1/1996	5.1	-999	1	5.1
a	10/27/1996	2.3	300	1	2.3
a	12/16/1996	8.1	50	1	10.4
a	1/5/1997	9.6	20	2	20
a	1/11/1997	1.6	6	3	21.6
b	2/3/1997	6.2	-999	1	6.2
b	3/25/1997	5.8	50	2	12
b	11/15/1998	2.6	600	1	2.6
b	1/14/1999	4.9	60	1	7.5
b	1/29/1999	3.7	15	2	11.2
b	3/10/1999	4.6	40	3	8.3
b	3/10/1999	2.3	0	4	10.6
c	3/16/1999	9.6	-999	1	9.6

Thank you for your help,

Tomas

Kurt_Bremser · Posted 07-11-2020 05:41 AM

See this:

data have;
input X $ d :mmddyy10. y;
format d yymmddd10.;
cards;
a     01/01/96    5.1
a     10/27/96    2.3
a     12/16/96    8.1
a     01/05/97    9.6
a     01/11/97    1.6
b     02/03/97    6.2
b     03/25/97    5.8
b     11/15/98    2.6
b     01/14/99    4.9
b     01/29/99    3.7
b     03/10/99    4.6
b     03/10/99    2.3
c     03/16/99    9.6
;

proc sql;
/* first, consolidate per day */
create table times as
  select
    x,
    d,
    count(d) as times,
    sum(y) as y
  from have
  group by x, d
;
/* now, calculate the rolling window */
create table want as
  select
    a.x,
    a.d,
    sum(b.times) as times,
    sum(b.y) as sum
  from times a
  left join times b
  on a.x = b.x and a.d - 100 le b.d le a.d
  group by a.x, a.d
;
quit;

b has a slightly different result, because I show each date only once.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Thomas_mp · Posted 07-11-2020 11:31 AM

Good morning Kurt,

Thank you for your time. Your program does what I was expecting.

However, the program below (sent by other member of the community) corrects my own mistakes. This is: for the same individual/date (last two observations for individual b) we should have the same values for time and sum.

Thank you again,

Have a good weekend

Tomas

data k;

input X $ d y ;

informat d mmddyy8. ;

cards ;

a 01/01/96 5.1

a 10/27/96 2.3

a 12/16/96 8.1

a 01/05/97 9.6

a 01/11/97 1.6

b 02/03/97 6.2

b 03/25/97 5.8

b 11/15/98 2.6

b 01/14/99 4.9

b 01/29/99 3.7

b 03/10/99 4.6

b 03/10/99 2.3

c 03/16/99 9.6

proc sql;

create table want as

select *,

(select count(*) from k where x=a.x and d between a.d-100 and a.d) as times,

(select sum(y) from k where x=a.x and d between a.d-100 and a.d) as sum

 from k as a;

quit;

Reeza · Posted 07-10-2020 04:32 PM

Is this different than your question here: https://communities.sas.com/t5/SAS-Procedures/Find-out-the-how-much-a-person-spent-in-the-last-100-d...?

Thomas_mp · Posted 07-10-2020 04:51 PM

Hi Reeza,
it is the same question but with a few more explanations .. and perhaps it does not belong to the procedures, and fits better in data management part of SAS?

Reeza · Posted 07-10-2020 05:02 PM

Are you using SAS DI studio or one of the tools listed below?

Otherwise, this does belong in general programming forum.

Forum description for SAS Data Management:

SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop, SAS Data Preparation and others

@Thomas_mp wrote:
Hi Reeza,
it is the same question but with a few more explanations .. and perhaps it does not belong to the procedures, and fits better in data management part of SAS?

ballardw · Posted 07-10-2020 05:20 PM

I hope you don't really want -999 in values in a SAS data set. If that "value" is supposed to indicate missing value then SAS has a perfectly good missing value (actually better, you can use . , .A through .Z and ._ to indicate different reasons why the value might be missing).

Unlike -999 the missing values would not be accidentally used in computations.

Compute the number of observations and sum in a moving window

Re: Compute the number of observations and sum in a moving window

Compute the number of observations and sum in a moving window

Re: Compute the number of observations and sum in a moving window

Re: Compute the number of observations and sum in a moving window

Re: Compute the number of observations and sum in a moving window

Re: Compute the number of observations and sum in a moving window

Re: Compute the number of observations and sum in a moving window

Re: Compute the number of observations and sum in a moving window

SAS Innovate 2025: Call for Content