Hello;
I have a data set which shows the sales data over a period of time in different zip codes. In a second data set, I have the purchase date of the product for each customer.
I want to calculate the total sales from the beginning of time until the month before each customer has purchased the product in each zip code; in other words, I want to know the aggregate number of people had purchased the product in the vicinity (zip code) of each customer until the month before he/she purchased the product. ; I appreciate if anyone can give me a hint of how I can do this.
Thanks for reading this post.
the data sets are like this:
Sales data set:
*Please note that all of the missing observations in this data set should be considered zero;
data sales;
input obs zipcode jan10 feb10 mar10 apr10 mar10 jun10 july10 aug10 sep10 oct10 nov10 dec10 jan11 feb11;
data lines;
1 14001 1 . . . . . . . . . . . . . . . . .
2 14009 1 . . . . . . . . . . . . . . . . .
3 14011 . 1 . . . . . . . . . . . . . . . .
4 14020 . . 1 . . . . . . . . . . . . . . .
5 14026 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 14031 . . . . . . . . . . . . . . . . . .
7 14036 . . 1 . . . . . . . . . . . . . . .
8 14043 . . . . . . . . . . . . 1 1 . . . .
9 14047 . . . . . . . . . . . . . . . . . .
10 14048 . . . . . . . . . . . . . . . . . .;
run;
purchase data set:
data purchasedate;
input customername $ customerzipcode purchasedate $;
datalines;
Customer1 14001 July10
customer2 14043 Jan11
....
run;
My code assumes that the columns in the sales table are in chronological order, as they were in your example. - PG
It can be done this way :
data sales;
input obs zipcode jan10 feb10 mar10 apr10 may10 jun10 july10 aug10 sep10
oct10 nov10 dec10 jan11 feb11 mar11 apr11 may11 jun11;
datalines;
1 14001 1 . . . . . . . . . . . . . . . . .
2 14009 1 . . . . . . . . . . . . . . . . .
3 14011 . 1 . . . . . . . . . . . . . . . .
4 14020 . . 1 . . . . . . . . . . . . . . .
5 14026 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 14031 . . . . . . . . . . . . . . . . . .
7 14036 . . 1 . . . . . . . . . . . . . . .
8 14043 . . . . 1 1 . . . . . . . . . . . .
9 14047 . . . . . . . . . . . . . . . . . .
10 14048 . . . . . . . . . . . . . . . . . .
;
proc sort data=sales; by zipcode; run;
data purchasedate;
length customerName $12;
input customerName $ customerZipcode purchaseDate $;
datalines;
Customer1 14001 July10
customer2 14043 Jan11
;
proc sort data=purchaseDate; by customerZipcode customerName; run;
data previousSales(keep=customerName zipcode purchaseDate cumSales);
if 0 then set sales; /* Do nothing, get the variable names from sales to form array s */
array s{*} _numeric_;
merge purchaseDate(in=ok rename=customerZipcode=zipcode) sales;
by zipcode;
if ok then do;
cumSales = 0;
do i = 3 to dim(s) while (upcase(vname(s{i})) ne upcase(purchaseDate));
cumSales + s{i};
end;
output;
end;
run;
proc print; run;
PG
Thanks for the quick reply.
I tried your method, but I think something is going wrong with the sas date values, the results do not seem correct.
This is a part of the real data, maybe it helps;
customer | purchasedate | Zip |
customer1 | 4/22/2009 | 14227 |
customer2 | 11/29/2010 | 14001 |
customer3 | 11/3/2009 | 14217 |
customer4 | 11/18/2010 | 14051 |
customer5 | 4/19/2010 | 14224 |
customer6 | 4/2/2009 | 14120 |
customer7 | 2/8/2011 | 14011 |
customer8 | 6/3/2010 | 14052 |
customer9 | 6/19/2009 | 14569 |
customer10 | 4/6/2010 | 14221 |
customer11 | 6/11/2010 | 14226 |
customer12 | 10/5/2010 | 14221 |
customer13 | 11/11/2010 | 14224 |
customer14 | 6/12/2009 | 14221 |
customer15 | 11/1/2010 | 14215 |
customer16 | 5/10/2010 | 14226 |
customer17 | 10/28/2009 | 14224 |
customer18 | 6/3/2010 | 14215 |
customer19 | 5/15/2009 | 14221 |
customer20 | 7/16/2010 | 14209 |
customer21 | 8/27/2007 | 14215 |
customer22 | 5/9/2009 | 14225 |
customer23 | 9/16/2009 | 14141 |
customer24 | 11/19/2010 | 14569 |
customer25 | 6/18/2009 | 14224 |
customer26 | 4/30/2009 | 14224 |
customer27 | 2/17/2011 | 14127 |
customer28 | 5/1/2009 | 14224 |
customer29 | 12/10/2010 | 14209 |
customer30 | 7/17/2009 | 14216 |
Zip | NOV2010 | FEB2011 | JUL2010 | JUL2009 | AUG2009 | SEP2009 | OCT2009 | NOV2009 | DEC2009 | JAN2010 | FEB2010 | MAR2010 | APR2010 | MAY2010 |
14001 | 1 | |||||||||||||
14009 | 1 | |||||||||||||
14011 | 1 | |||||||||||||
14020 | 1 | |||||||||||||
14026 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
14031 | ||||||||||||||
14036 | 1 | |||||||||||||
14043 | 1 | 1 | ||||||||||||
14047 | ||||||||||||||
14048 | ||||||||||||||
14051 | 1 | 0 | 1 | 0 | 0 | 0 | ||||||||
14052 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | |
14063 | ||||||||||||||
14070 | 1 | |||||||||||||
14075 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
14086 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
14092 | ||||||||||||||
14094 | 0 | 1 | ||||||||||||
14103 | ||||||||||||||
14120 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | |
14127 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
14141 | 1 | |||||||||||||
14150 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ||
14172 | 1 | |||||||||||||
14201 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ||||||
14202 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
14203 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
14204 | ||||||||||||||
14207 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
My code assumes that the columns in the sales table are in chronological order, as they were in your example. - PG
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.