BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

See this:

data have;
input X $ d :mmddyy10. y;
format d yymmddd10.;
cards;
a     01/01/96    5.1
a     10/27/96    2.3
a     12/16/96    8.1
a     01/05/97    9.6
a     01/11/97    1.6
b     02/03/97    6.2
b     03/25/97    5.8
b     11/15/98    2.6
b     01/14/99    4.9
b     01/29/99    3.7
b     03/10/99    4.6
b     03/10/99    2.3
c     03/16/99    9.6
;

proc sql;
/* first, consolidate per day */
create table times as
  select
    x,
    d,
    count(d) as times,
    sum(y) as y
  from have
  group by x, d
;
/* now, calculate the rolling window */
create table want as
  select
    a.x,
    a.d,
    sum(b.times) as times,
    sum(b.y) as sum
  from times a
  left join times b
  on a.x = b.x and a.d - 100 le b.d le a.d
  group by a.x, a.d
;
quit;

b has a slightly different result, because I show each date only once.

View solution in original post

7 REPLIES 7
Thomas_mp
Obsidian | Level 7

Good morning,

I have observations of thousands of individuals (variable X  for individuals  a, b, c, d.. ) overtime (d) and the money each individual spent on that date (Y). The data are sorted by X and d

 

data k;

input X $ d y  ;

informat d mmddyy8.  ;

cards ;

a     01/01/96    5.1

a     10/27/96    2.3

a     12/16/96    8.1

a     01/05/97    9.6

a     01/11/97    1.6

b     02/03/97    6.2

b     03/25/97    5.8

b     11/15/98    2.6

b     01/14/99    4.9

b     01/29/99    3.7

b     03/10/99    4.6

b     03/10/99    2.3

c     03/16/99    9.6

 

 

; run;

I am trying to resolve two questions:

  1. How many times one individual spent money in a period of 100 days?  (this is the variable “times” in the solution below)
  2. How much money has she/he spent in a the last 100 days? (this is “sum” in the solutions below.

So the question is to find out “times” and “sum”: the times an individual appears in the data base in the last 100 days, and the money spent in the last by this individual in the last 100 days.

For instance, for individual b:

She appears on 2/3/1997 for the first time, so times  =  1; sum = 6.2

She appears again on 3/25/1997 , or 50 days later, so times = 2 (2/3/1997 and 3/25/1997) and sum is = 12 (5.8+6.2).

She appears again on 11/15/1998, or 300 days later, so times = 1 (because 300 > 100)) and sum is = 2.6.

She appears again on 1/14/1999, or 60 days later, so times = 2  and sum is = 2.6 + 4.9 = 7.5.

She appears again on 1/29/1999, or 15 days later, so times = 3 (3 times in 100 days, on 11/15/1998, on  1/14/1999 and on 1/29/1999.  Sum is the value of Y on these 3 dates (3.7+4.9+2.6)

She appears again on 3/10/1999, or 40 days later, so times = 3 (3 times in 100 days, on 1/14/1999 on 1/29/1999 and on 3/10/1999.  Sum is the value of Y on these 3 dates (3.7+4.9+2.3)

I did the complete solution in excel, below (and in the attached file).  Diff is the number of days between two dates (I did this to get to the solution in excel; it may not be necessary).  This difference is -999 when we change to a different individual. 

In this case the solution is:

x

d

y

diff

times

sum

a

1/1/1996

5.1

-999

1

5.1

a

10/27/1996

2.3

300

1

2.3

a

12/16/1996

8.1

50

1

10.4

a

1/5/1997

9.6

20

2

20

a

1/11/1997

1.6

6

3

21.6

b

2/3/1997

6.2

-999

1

6.2

b

3/25/1997

5.8

50

2

12

b

11/15/1998

2.6

600

1

2.6

b

1/14/1999

4.9

60

1

7.5

b

1/29/1999

3.7

15

2

11.2

b

3/10/1999

4.6

40

3

8.3

b

3/10/1999

2.3

0

4

10.6

c

3/16/1999

9.6

-999

1

9.6

 

 

Thank you for your help,

 

Tomas

Kurt_Bremser
Super User

See this:

data have;
input X $ d :mmddyy10. y;
format d yymmddd10.;
cards;
a     01/01/96    5.1
a     10/27/96    2.3
a     12/16/96    8.1
a     01/05/97    9.6
a     01/11/97    1.6
b     02/03/97    6.2
b     03/25/97    5.8
b     11/15/98    2.6
b     01/14/99    4.9
b     01/29/99    3.7
b     03/10/99    4.6
b     03/10/99    2.3
c     03/16/99    9.6
;

proc sql;
/* first, consolidate per day */
create table times as
  select
    x,
    d,
    count(d) as times,
    sum(y) as y
  from have
  group by x, d
;
/* now, calculate the rolling window */
create table want as
  select
    a.x,
    a.d,
    sum(b.times) as times,
    sum(b.y) as sum
  from times a
  left join times b
  on a.x = b.x and a.d - 100 le b.d le a.d
  group by a.x, a.d
;
quit;

b has a slightly different result, because I show each date only once.

Thomas_mp
Obsidian | Level 7

Good morning Kurt,

Thank you for your time.  Your program does what I was expecting. 

However, the program below (sent by other member of the community) corrects my own mistakes. This is: for the same individual/date (last two observations for individual b) we should have the same values for  time and sum.

Thank you again,

Have a good weekend

Tomas

data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
 
proc sql;
create table want as
select *,
(select count(*) from k where x=a.x and d between a.d-100 and a.d) as times,
(select sum(y) from k where x=a.x and d between a.d-100 and a.d) as sum
 from k as a;
quit;

 

Thomas_mp
Obsidian | Level 7
Hi Reeza,
it is the same question but with a few more explanations .. and perhaps it does not belong to the procedures, and fits better in data management part of SAS?
Reeza
Super User

Are you using SAS DI studio or one of the tools listed below?

Otherwise, this does belong in general programming forum. 

 

Forum description for SAS Data Management:

SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop, SAS Data Preparation and others




@Thomas_mp wrote:
Hi Reeza,
it is the same question but with a few more explanations .. and perhaps it does not belong to the procedures, and fits better in data management part of SAS?

 

ballardw
Super User

I hope you don't really want -999 in values in a SAS data set. If that "value" is supposed to indicate missing value then SAS has a perfectly good missing value (actually better, you can use . , .A through .Z and ._ to indicate different reasons why the value might be missing).

 

Unlike -999 the missing values would not be accidentally used in computations.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 2682 views
  • 1 like
  • 4 in conversation