Good morning,
I have observations of thousands of individuals (variable X for individuals a, b, c, d.. ) overtime (d) and the money each individual spent on that date (Y). The data are sorted by X and d
data k;
input X d y ;
cards ;
a 1/1/1996 5.1
a 10/27/1996 2.3
a 12/16/1996 8.1
a 1/5/1997 9.6
a 1/11/1997 1.6
b 2/3/1997 6.2
b 3/25/1997 5.8
b 11/15/1998 2.6
b 1/14/1999 4.9
b 4/14/1999 3.7
b 5/7/1999 4.6
b 5/7/1999 2.3
c 5/13/1999 9.6
I am trying to resolve two questions:
So the question is to find out “times” and “sum”: the times an individual appears in the data base in the last 100 days, and the money spent in the last by this individual in the last 100 days.
The solution in this case is below (and in the attached file). Diff is the number of days between two dates (I did this to get to the solution in excel; it may not be necessary). This difference is -999 when we change to a different individual.
In this case the solution is:
x | d | y | diff | times | sum |
a | 1/1/1996 | 5.1 | -999 | 1 | 5.1 |
a | 10/27/1996 | 2.3 | 300 | 1 | 2.3 |
a | 12/16/1996 | 8.1 | 50 | 1 | 10.4 |
a | 1/5/1997 | 9.6 | 20 | 2 | 20 |
a | 1/11/1997 | 1.6 | 6 | 3 | 21.6 |
b | 2/3/1997 | 6.2 | -999 | 1 | 6.2 |
b | 3/25/1997 | 5.8 | 50 | 2 | 12 |
b | 11/15/1998 | 2.6 | 600 | 1 | 2.6 |
b | 1/14/1999 | 4.9 | 60 | 1 | 7.5 |
b | 1/29/1999 | 3.7 | 15 | 2 | 11.2 |
b | 3/10/1999 | 4.6 | 40 | 3 | 8.3 |
b | 3/10/1999 | 2.3 | 0 | 4 | 10.6 |
c | 3/16/1999 | 9.6 | -999 | 1 | 9.6 |
Thank you for your help,
Tomas
If you don't have a big table.
data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
proc sql;
create table want as
select *,
(select count(*) from k where x=a.x and d between a.d-100 and a.d) as times,
(select sum(y) from k where x=a.x and d between a.d-100 and a.d) as sum
from k as a;
quit;
Please do a test of your data step code, and fix it.
Sorry Kurt:
data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
How do you expect the "last 100 days" to be specified? From a given date, the date the program runs or something else?
A where clause with the start and end dates will subset the data and counts + sums points towards Proc Means or Summary with a Class or By statement with the individual identification variable(s)
If you don't have a big table.
data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
proc sql;
create table want as
select *,
(select count(*) from k where x=a.x and d between a.d-100 and a.d) as times,
(select sum(y) from k where x=a.x and d between a.d-100 and a.d) as sum
from k as a;
quit;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.