See this:
data have;
input X $ d :mmddyy10. y;
format d yymmddd10.;
cards;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
proc sql;
/* first, consolidate per day */
create table times as
select
x,
d,
count(d) as times,
sum(y) as y
from have
group by x, d
;
/* now, calculate the rolling window */
create table want as
select
a.x,
a.d,
sum(b.times) as times,
sum(b.y) as sum
from times a
left join times b
on a.x = b.x and a.d - 100 le b.d le a.d
group by a.x, a.d
;
quit;
b has a slightly different result, because I show each date only once.
Good morning,
I have observations of thousands of individuals (variable X for individuals a, b, c, d.. ) overtime (d) and the money each individual spent on that date (Y). The data are sorted by X and d
data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
; run;
I am trying to resolve two questions:
So the question is to find out “times” and “sum”: the times an individual appears in the data base in the last 100 days, and the money spent in the last by this individual in the last 100 days.
For instance, for individual b:
She appears on 2/3/1997 for the first time, so times = 1; sum = 6.2
She appears again on 3/25/1997 , or 50 days later, so times = 2 (2/3/1997 and 3/25/1997) and sum is = 12 (5.8+6.2).
She appears again on 11/15/1998, or 300 days later, so times = 1 (because 300 > 100)) and sum is = 2.6.
She appears again on 1/14/1999, or 60 days later, so times = 2 and sum is = 2.6 + 4.9 = 7.5.
She appears again on 1/29/1999, or 15 days later, so times = 3 (3 times in 100 days, on 11/15/1998, on 1/14/1999 and on 1/29/1999. Sum is the value of Y on these 3 dates (3.7+4.9+2.6)
She appears again on 3/10/1999, or 40 days later, so times = 3 (3 times in 100 days, on 1/14/1999 on 1/29/1999 and on 3/10/1999. Sum is the value of Y on these 3 dates (3.7+4.9+2.3)
I did the complete solution in excel, below (and in the attached file). Diff is the number of days between two dates (I did this to get to the solution in excel; it may not be necessary). This difference is -999 when we change to a different individual.
In this case the solution is:
x | d | y | diff | times | sum |
a | 1/1/1996 | 5.1 | -999 | 1 | 5.1 |
a | 10/27/1996 | 2.3 | 300 | 1 | 2.3 |
a | 12/16/1996 | 8.1 | 50 | 1 | 10.4 |
a | 1/5/1997 | 9.6 | 20 | 2 | 20 |
a | 1/11/1997 | 1.6 | 6 | 3 | 21.6 |
b | 2/3/1997 | 6.2 | -999 | 1 | 6.2 |
b | 3/25/1997 | 5.8 | 50 | 2 | 12 |
b | 11/15/1998 | 2.6 | 600 | 1 | 2.6 |
b | 1/14/1999 | 4.9 | 60 | 1 | 7.5 |
b | 1/29/1999 | 3.7 | 15 | 2 | 11.2 |
b | 3/10/1999 | 4.6 | 40 | 3 | 8.3 |
b | 3/10/1999 | 2.3 | 0 | 4 | 10.6 |
c | 3/16/1999 | 9.6 | -999 | 1 | 9.6 |
Thank you for your help,
Tomas
See this:
data have;
input X $ d :mmddyy10. y;
format d yymmddd10.;
cards;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
proc sql;
/* first, consolidate per day */
create table times as
select
x,
d,
count(d) as times,
sum(y) as y
from have
group by x, d
;
/* now, calculate the rolling window */
create table want as
select
a.x,
a.d,
sum(b.times) as times,
sum(b.y) as sum
from times a
left join times b
on a.x = b.x and a.d - 100 le b.d le a.d
group by a.x, a.d
;
quit;
b has a slightly different result, because I show each date only once.
Good morning Kurt,
Thank you for your time. Your program does what I was expecting.
However, the program below (sent by other member of the community) corrects my own mistakes. This is: for the same individual/date (last two observations for individual b) we should have the same values for time and sum.
Thank you again,
Have a good weekend
Tomas
data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
proc sql;
create table want as
select *,
(select count(*) from k where x=a.x and d between a.d-100 and a.d) as times,
(select sum(y) from k where x=a.x and d between a.d-100 and a.d) as sum
from k as a;
quit;
Are you using SAS DI studio or one of the tools listed below?
Otherwise, this does belong in general programming forum.
Forum description for SAS Data Management:
SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop, SAS Data Preparation and others
@Thomas_mp wrote:
Hi Reeza,
it is the same question but with a few more explanations .. and perhaps it does not belong to the procedures, and fits better in data management part of SAS?
I hope you don't really want -999 in values in a SAS data set. If that "value" is supposed to indicate missing value then SAS has a perfectly good missing value (actually better, you can use . , .A through .Z and ._ to indicate different reasons why the value might be missing).
Unlike -999 the missing values would not be accidentally used in computations.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.