Good morning,
I have observations of thousands of individuals (variable X for individuals a, b, c, d.. ) overtime (d) and the money each individual spent on that date (Y). The data are sorted by X and d
data k;
input X d y ;
cards ;
a 1/1/1996 5.1
a 10/27/1996 2.3
a 12/16/1996 8.1
a 1/5/1997 9.6
a 1/11/1997 1.6
b 2/3/1997 6.2
b 3/25/1997 5.8
b 11/15/1998 2.6
b 1/14/1999 4.9
b 4/14/1999 3.7
b 5/7/1999 4.6
b 5/7/1999 2.3
c 5/13/1999 9.6
I am trying to resolve two questions:
So the question is to find out “times” and “sum”: the times an individual appears in the data base in the last 100 days, and the money spent in the last by this individual in the last 100 days.
The solution in this case is below (and in the attached file). Diff is the number of days between two dates (I did this to get to the solution in excel; it may not be necessary). This difference is -999 when we change to a different individual.
In this case the solution is:
x | d | y | diff | times | sum |
a | 1/1/1996 | 5.1 | -999 | 1 | 5.1 |
a | 10/27/1996 | 2.3 | 300 | 1 | 2.3 |
a | 12/16/1996 | 8.1 | 50 | 1 | 10.4 |
a | 1/5/1997 | 9.6 | 20 | 2 | 20 |
a | 1/11/1997 | 1.6 | 6 | 3 | 21.6 |
b | 2/3/1997 | 6.2 | -999 | 1 | 6.2 |
b | 3/25/1997 | 5.8 | 50 | 2 | 12 |
b | 11/15/1998 | 2.6 | 600 | 1 | 2.6 |
b | 1/14/1999 | 4.9 | 60 | 1 | 7.5 |
b | 1/29/1999 | 3.7 | 15 | 2 | 11.2 |
b | 3/10/1999 | 4.6 | 40 | 3 | 8.3 |
b | 3/10/1999 | 2.3 | 0 | 4 | 10.6 |
c | 3/16/1999 | 9.6 | -999 | 1 | 9.6 |
Thank you for your help,
Tomas
If you don't have a big table.
data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
proc sql;
create table want as
select *,
(select count(*) from k where x=a.x and d between a.d-100 and a.d) as times,
(select sum(y) from k where x=a.x and d between a.d-100 and a.d) as sum
from k as a;
quit;
Please do a test of your data step code, and fix it.
Sorry Kurt:
data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
How do you expect the "last 100 days" to be specified? From a given date, the date the program runs or something else?
A where clause with the start and end dates will subset the data and counts + sums points towards Proc Means or Summary with a Class or By statement with the individual identification variable(s)
If you don't have a big table.
data k;
input X $ d y ;
informat d mmddyy8. ;
cards ;
a 01/01/96 5.1
a 10/27/96 2.3
a 12/16/96 8.1
a 01/05/97 9.6
a 01/11/97 1.6
b 02/03/97 6.2
b 03/25/97 5.8
b 11/15/98 2.6
b 01/14/99 4.9
b 01/29/99 3.7
b 03/10/99 4.6
b 03/10/99 2.3
c 03/16/99 9.6
;
proc sql;
create table want as
select *,
(select count(*) from k where x=a.x and d between a.d-100 and a.d) as times,
(select sum(y) from k where x=a.x and d between a.d-100 and a.d) as sum
from k as a;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.