- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
See this:
data have;
input X $ d :mmddyy10. y;
format d yymmddd10.;
cards;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
proc sql;
/* first, consolidate per day */
create table times as
select
x,
d,
count(d) as times,
sum(y) as y
from have
group by x, d
;
/* now, calculate the rolling window */
create table want as
select
a.x,
a.d,
sum(b.times) as times,
sum(b.y) as sum
from times a
left join times b
on a.x = b.x and a.d - 100 le b.d le a.d
group by a.x, a.d
;
quit;
b has a slightly different result, because I show each date only once.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Good morning,
I have observations of thousands of individuals (variable X for individuals a, b, c, d.. ) overtime (d) and the money each individual spent on that date (Y). The data are sorted by X and d
data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
; run;
I am trying to resolve two questions:
- How many times one individual spent money in a period of 100 days? (this is the variable “times” in the solution below)
- How much money has she/he spent in a the last 100 days? (this is “sum” in the solutions below.
So the question is to find out “times” and “sum”: the times an individual appears in the data base in the last 100 days, and the money spent in the last by this individual in the last 100 days.
For instance, for individual b:
She appears on 2/3/1997 for the first time, so times = 1; sum = 6.2
She appears again on 3/25/1997 , or 50 days later, so times = 2 (2/3/1997 and 3/25/1997) and sum is = 12 (5.8+6.2).
She appears again on 11/15/1998, or 300 days later, so times = 1 (because 300 > 100)) and sum is = 2.6.
She appears again on 1/14/1999, or 60 days later, so times = 2 and sum is = 2.6 + 4.9 = 7.5.
She appears again on 1/29/1999, or 15 days later, so times = 3 (3 times in 100 days, on 11/15/1998, on 1/14/1999 and on 1/29/1999. Sum is the value of Y on these 3 dates (3.7+4.9+2.6)
She appears again on 3/10/1999, or 40 days later, so times = 3 (3 times in 100 days, on 1/14/1999 on 1/29/1999 and on 3/10/1999. Sum is the value of Y on these 3 dates (3.7+4.9+2.3)
I did the complete solution in excel, below (and in the attached file). Diff is the number of days between two dates (I did this to get to the solution in excel; it may not be necessary). This difference is -999 when we change to a different individual.
In this case the solution is:
x | d | y | diff | times | sum |
a | 1/1/1996 | 5.1 | -999 | 1 | 5.1 |
a | 10/27/1996 | 2.3 | 300 | 1 | 2.3 |
a | 12/16/1996 | 8.1 | 50 | 1 | 10.4 |
a | 1/5/1997 | 9.6 | 20 | 2 | 20 |
a | 1/11/1997 | 1.6 | 6 | 3 | 21.6 |
b | 2/3/1997 | 6.2 | -999 | 1 | 6.2 |
b | 3/25/1997 | 5.8 | 50 | 2 | 12 |
b | 11/15/1998 | 2.6 | 600 | 1 | 2.6 |
b | 1/14/1999 | 4.9 | 60 | 1 | 7.5 |
b | 1/29/1999 | 3.7 | 15 | 2 | 11.2 |
b | 3/10/1999 | 4.6 | 40 | 3 | 8.3 |
b | 3/10/1999 | 2.3 | 0 | 4 | 10.6 |
c | 3/16/1999 | 9.6 | -999 | 1 | 9.6 |
Thank you for your help,
Tomas
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
See this:
data have;
input X $ d :mmddyy10. y;
format d yymmddd10.;
cards;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
proc sql;
/* first, consolidate per day */
create table times as
select
x,
d,
count(d) as times,
sum(y) as y
from have
group by x, d
;
/* now, calculate the rolling window */
create table want as
select
a.x,
a.d,
sum(b.times) as times,
sum(b.y) as sum
from times a
left join times b
on a.x = b.x and a.d - 100 le b.d le a.d
group by a.x, a.d
;
quit;
b has a slightly different result, because I show each date only once.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Good morning Kurt,
Thank you for your time. Your program does what I was expecting.
However, the program below (sent by other member of the community) corrects my own mistakes. This is: for the same individual/date (last two observations for individual b) we should have the same values for time and sum.
Thank you again,
Have a good weekend
Tomas
data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
proc sql;
create table want as
select *,
(select count(*) from k where x=a.x and d between a.d-100 and a.d) as times,
(select sum(y) from k where x=a.x and d between a.d-100 and a.d) as sum
from k as a;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
it is the same question but with a few more explanations .. and perhaps it does not belong to the procedures, and fits better in data management part of SAS?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Are you using SAS DI studio or one of the tools listed below?
Otherwise, this does belong in general programming forum.
Forum description for SAS Data Management:
SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop, SAS Data Preparation and others
@Thomas_mp wrote:
Hi Reeza,
it is the same question but with a few more explanations .. and perhaps it does not belong to the procedures, and fits better in data management part of SAS?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I hope you don't really want -999 in values in a SAS data set. If that "value" is supposed to indicate missing value then SAS has a perfectly good missing value (actually better, you can use . , .A through .Z and ._ to indicate different reasons why the value might be missing).
Unlike -999 the missing values would not be accidentally used in computations.