Hello;
I have a data set which shows the sales data over a period of time in different zip codes. In a second data set, I have the purchase date of the product for each customer.
I want to calculate the total sales from the beginning of time until the month before each customer has purchased the product in each zip code; in other words, I want to know the aggregate number of people had purchased the product in the vicinity (zip code) of each customer until the month before he/she purchased the product. ; I appreciate if anyone can give me a hint of how I can do this.
Thanks for reading this post.
the data sets are like this:
Sales data set:
*Please note that all of the missing observations in this data set should be considered zero;
data sales;
input obs zipcode jan10 feb10 mar10 apr10 mar10 jun10 july10 aug10 sep10 oct10 nov10 dec10 jan11 feb11;
data lines;
1 14001 1 . . . . . . . . . . . . . . . . .
2 14009 1 . . . . . . . . . . . . . . . . .
3 14011 . 1 . . . . . . . . . . . . . . . .
4 14020 . . 1 . . . . . . . . . . . . . . .
5 14026 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 14031 . . . . . . . . . . . . . . . . . .
7 14036 . . 1 . . . . . . . . . . . . . . .
8 14043 . . . . . . . . . . . . 1 1 . . . .
9 14047 . . . . . . . . . . . . . . . . . .
10 14048 . . . . . . . . . . . . . . . . . .;
run;
purchase data set:
data purchasedate;
input customername $ customerzipcode purchasedate $;
datalines;
Customer1 14001 July10
customer2 14043 Jan11
....
run;
My code assumes that the columns in the sales table are in chronological order, as they were in your example. - PG
It can be done this way :
data sales;
input obs zipcode jan10 feb10 mar10 apr10 may10 jun10 july10 aug10 sep10
oct10 nov10 dec10 jan11 feb11 mar11 apr11 may11 jun11;
datalines;
1 14001 1 . . . . . . . . . . . . . . . . .
2 14009 1 . . . . . . . . . . . . . . . . .
3 14011 . 1 . . . . . . . . . . . . . . . .
4 14020 . . 1 . . . . . . . . . . . . . . .
5 14026 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 14031 . . . . . . . . . . . . . . . . . .
7 14036 . . 1 . . . . . . . . . . . . . . .
8 14043 . . . . 1 1 . . . . . . . . . . . .
9 14047 . . . . . . . . . . . . . . . . . .
10 14048 . . . . . . . . . . . . . . . . . .
;
proc sort data=sales; by zipcode; run;
data purchasedate;
length customerName $12;
input customerName $ customerZipcode purchaseDate $;
datalines;
Customer1 14001 July10
customer2 14043 Jan11
;
proc sort data=purchaseDate; by customerZipcode customerName; run;
data previousSales(keep=customerName zipcode purchaseDate cumSales);
if 0 then set sales; /* Do nothing, get the variable names from sales to form array s */
array s{*} _numeric_;
merge purchaseDate(in=ok rename=customerZipcode=zipcode) sales;
by zipcode;
if ok then do;
cumSales = 0;
do i = 3 to dim(s) while (upcase(vname(s{i})) ne upcase(purchaseDate));
cumSales + s{i};
end;
output;
end;
run;
proc print; run;
PG
Thanks for the quick reply.
I tried your method, but I think something is going wrong with the sas date values, the results do not seem correct.
This is a part of the real data, maybe it helps;
customer | purchasedate | Zip |
customer1 | 4/22/2009 | 14227 |
customer2 | 11/29/2010 | 14001 |
customer3 | 11/3/2009 | 14217 |
customer4 | 11/18/2010 | 14051 |
customer5 | 4/19/2010 | 14224 |
customer6 | 4/2/2009 | 14120 |
customer7 | 2/8/2011 | 14011 |
customer8 | 6/3/2010 | 14052 |
customer9 | 6/19/2009 | 14569 |
customer10 | 4/6/2010 | 14221 |
customer11 | 6/11/2010 | 14226 |
customer12 | 10/5/2010 | 14221 |
customer13 | 11/11/2010 | 14224 |
customer14 | 6/12/2009 | 14221 |
customer15 | 11/1/2010 | 14215 |
customer16 | 5/10/2010 | 14226 |
customer17 | 10/28/2009 | 14224 |
customer18 | 6/3/2010 | 14215 |
customer19 | 5/15/2009 | 14221 |
customer20 | 7/16/2010 | 14209 |
customer21 | 8/27/2007 | 14215 |
customer22 | 5/9/2009 | 14225 |
customer23 | 9/16/2009 | 14141 |
customer24 | 11/19/2010 | 14569 |
customer25 | 6/18/2009 | 14224 |
customer26 | 4/30/2009 | 14224 |
customer27 | 2/17/2011 | 14127 |
customer28 | 5/1/2009 | 14224 |
customer29 | 12/10/2010 | 14209 |
customer30 | 7/17/2009 | 14216 |
Zip | NOV2010 | FEB2011 | JUL2010 | JUL2009 | AUG2009 | SEP2009 | OCT2009 | NOV2009 | DEC2009 | JAN2010 | FEB2010 | MAR2010 | APR2010 | MAY2010 |
14001 | 1 | |||||||||||||
14009 | 1 | |||||||||||||
14011 | 1 | |||||||||||||
14020 | 1 | |||||||||||||
14026 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
14031 | ||||||||||||||
14036 | 1 | |||||||||||||
14043 | 1 | 1 | ||||||||||||
14047 | ||||||||||||||
14048 | ||||||||||||||
14051 | 1 | 0 | 1 | 0 | 0 | 0 | ||||||||
14052 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
14063 | ||||||||||||||
14070 | 1 | |||||||||||||
14075 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
14086 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
14092 | ||||||||||||||
14094 | 0 | 1 | ||||||||||||
14103 | ||||||||||||||
14120 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | |
14127 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
14141 | 1 | |||||||||||||
14150 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
14172 | 1 | |||||||||||||
14201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ||||||
14202 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
14203 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
14204 | ||||||||||||||
14207 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
My code assumes that the columns in the sales table are in chronological order, as they were in your example. - PG
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.