Hello;
I have a data set which shows the sales data over a period of time in different zip codes. In a second data set, I have the purchase date of the product for each customer.
I want to calculate the total sales from the beginning of time until the month before each customer has purchased the product in each zip code; in other words, I want to know the aggregate number of people had purchased the product in the vicinity (zip code) of each customer until the month before he/she purchased the product. ; I appreciate if anyone can give me a hint of how I can do this.
Thanks for reading this post.
the data sets are like this:
Sales data set:
*Please note that all of the missing observations in this data set should be considered zero;
data sales;
input obs zipcode jan10 feb10 mar10 apr10 mar10 jun10 july10 aug10 sep10 oct10 nov10 dec10 jan11 feb11;
data lines;
1 14001 1 . . . . . . . . . . . . . . . . .
2 14009 1 . . . . . . . . . . . . . . . . .
3 14011 . 1 . . . . . . . . . . . . . . . .
4 14020 . . 1 . . . . . . . . . . . . . . .
5 14026 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 14031 . . . . . . . . . . . . . . . . . .
7 14036 . . 1 . . . . . . . . . . . . . . .
8 14043 . . . . . . . . . . . . 1 1 . . . .
9 14047 . . . . . . . . . . . . . . . . . .
10 14048 . . . . . . . . . . . . . . . . . .;
run;
purchase data set:
data purchasedate;
input customername $ customerzipcode purchasedate $;
datalines;
Customer1 14001 July10
customer2 14043 Jan11
....
run;
My code assumes that the columns in the sales table are in chronological order, as they were in your example. - PG
It can be done this way :
data sales;
input obs zipcode jan10 feb10 mar10 apr10 may10 jun10 july10 aug10 sep10
oct10 nov10 dec10 jan11 feb11 mar11 apr11 may11 jun11;
datalines;
1 14001 1 . . . . . . . . . . . . . . . . .
2 14009 1 . . . . . . . . . . . . . . . . .
3 14011 . 1 . . . . . . . . . . . . . . . .
4 14020 . . 1 . . . . . . . . . . . . . . .
5 14026 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 14031 . . . . . . . . . . . . . . . . . .
7 14036 . . 1 . . . . . . . . . . . . . . .
8 14043 . . . . 1 1 . . . . . . . . . . . .
9 14047 . . . . . . . . . . . . . . . . . .
10 14048 . . . . . . . . . . . . . . . . . .
;
proc sort data=sales; by zipcode; run;
data purchasedate;
length customerName $12;
input customerName $ customerZipcode purchaseDate $;
datalines;
Customer1 14001 July10
customer2 14043 Jan11
;
proc sort data=purchaseDate; by customerZipcode customerName; run;
data previousSales(keep=customerName zipcode purchaseDate cumSales);
if 0 then set sales; /* Do nothing, get the variable names from sales to form array s */
array s{*} _numeric_;
merge purchaseDate(in=ok rename=customerZipcode=zipcode) sales;
by zipcode;
if ok then do;
cumSales = 0;
do i = 3 to dim(s) while (upcase(vname(s{i})) ne upcase(purchaseDate));
cumSales + s{i};
end;
output;
end;
run;
proc print; run;
PG
Thanks for the quick reply.
I tried your method, but I think something is going wrong with the sas date values, the results do not seem correct.
This is a part of the real data, maybe it helps;
| customer | purchasedate | Zip |
| customer1 | 4/22/2009 | 14227 |
| customer2 | 11/29/2010 | 14001 |
| customer3 | 11/3/2009 | 14217 |
| customer4 | 11/18/2010 | 14051 |
| customer5 | 4/19/2010 | 14224 |
| customer6 | 4/2/2009 | 14120 |
| customer7 | 2/8/2011 | 14011 |
| customer8 | 6/3/2010 | 14052 |
| customer9 | 6/19/2009 | 14569 |
| customer10 | 4/6/2010 | 14221 |
| customer11 | 6/11/2010 | 14226 |
| customer12 | 10/5/2010 | 14221 |
| customer13 | 11/11/2010 | 14224 |
| customer14 | 6/12/2009 | 14221 |
| customer15 | 11/1/2010 | 14215 |
| customer16 | 5/10/2010 | 14226 |
| customer17 | 10/28/2009 | 14224 |
| customer18 | 6/3/2010 | 14215 |
| customer19 | 5/15/2009 | 14221 |
| customer20 | 7/16/2010 | 14209 |
| customer21 | 8/27/2007 | 14215 |
| customer22 | 5/9/2009 | 14225 |
| customer23 | 9/16/2009 | 14141 |
| customer24 | 11/19/2010 | 14569 |
| customer25 | 6/18/2009 | 14224 |
| customer26 | 4/30/2009 | 14224 |
| customer27 | 2/17/2011 | 14127 |
| customer28 | 5/1/2009 | 14224 |
| customer29 | 12/10/2010 | 14209 |
| customer30 | 7/17/2009 | 14216 |
| Zip | NOV2010 | FEB2011 | JUL2010 | JUL2009 | AUG2009 | SEP2009 | OCT2009 | NOV2009 | DEC2009 | JAN2010 | FEB2010 | MAR2010 | APR2010 | MAY2010 |
| 14001 | 1 | |||||||||||||
| 14009 | 1 | |||||||||||||
| 14011 | 1 | |||||||||||||
| 14020 | 1 | |||||||||||||
| 14026 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14031 | ||||||||||||||
| 14036 | 1 | |||||||||||||
| 14043 | 1 | 1 | ||||||||||||
| 14047 | ||||||||||||||
| 14048 | ||||||||||||||
| 14051 | 1 | 0 | 1 | 0 | 0 | 0 | ||||||||
| 14052 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 14063 | ||||||||||||||
| 14070 | 1 | |||||||||||||
| 14075 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14086 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 14092 | ||||||||||||||
| 14094 | 0 | 1 | ||||||||||||
| 14103 | ||||||||||||||
| 14120 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | |
| 14127 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| 14141 | 1 | |||||||||||||
| 14150 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
| 14172 | 1 | |||||||||||||
| 14201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ||||||
| 14202 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 14203 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 14204 | ||||||||||||||
| 14207 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
My code assumes that the columns in the sales table are in chronological order, as they were in your example. - PG
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.