Solved: conditional sum of observations over time

niam · Posted 07-15-2012 06:25 PM

Hello;

I have a data set which shows the sales data over a period of time in different zip codes. In a second data set, I have the purchase date of the product for each customer.

I want to calculate the total sales from the beginning of time until the month before each customer has purchased the product in each zip code; in other words, I want to know the aggregate number of people had purchased the product in the vicinity (zip code) of each customer until the month before he/she purchased the product. ; I appreciate if anyone can give me a hint of how I can do this.

Thanks for reading this post.

the data sets are like this:

Sales data set:

*Please note that all of the missing observations in this data set should be considered zero;

data sales;

input obs zipcode jan10 feb10 mar10 apr10 mar10 jun10 july10 aug10 sep10 oct10 nov10 dec10 jan11 feb11;

data lines;

1 14001 1 . . . . . . . . . . . . . . . . .

2 14009 1 . . . . . . . . . . . . . . . . .

3 14011 . 1 . . . . . . . . . . . . . . . .

4 14020 . . 1 . . . . . . . . . . . . . . .

5 14026 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0

6 14031 . . . . . . . . . . . . . . . . . .

7 14036 . . 1 . . . . . . . . . . . . . . .

8 14043 . . . . . . . . . . . . 1 1 . . . .

9 14047 . . . . . . . . . . . . . . . . . .

10 14048 . . . . . . . . . . . . . . . . . .;

run;

purchase data set:

data purchasedate;

input customername $ customerzipcode purchasedate $;

datalines;

Customer1 14001 July10

customer2 14043 Jan11

....

run;

PGStats · Posted 07-16-2012 07:44 AM

My code assumes that the columns in the sales table are in chronological order, as they were in your example. - PG

PG

View solution in original post

PGStats · Posted 07-15-2012 08:26 PM

It can be done this way :

data sales;
input obs zipcode jan10 feb10 mar10 apr10 may10 jun10 july10 aug10 sep10
     oct10 nov10 dec10 jan11 feb11 mar11 apr11 may11 jun11;
datalines;
    1         14001 1 . . . . . . . . . . . . . . . . .
    2         14009 1 . . . . . . . . . . . . . . . . .
    3         14011 . 1 . . . . . . . . . . . . . . . .
    4         14020 . . 1 . . . . . . . . . . . . . . .
    5         14026 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    6         14031 . . . . . . . . . . . . . . . . . .
    7         14036 . . 1 . . . . . . . . . . . . . . .
    8         14043 . . . . 1 1 . . . . . . . . . . . .
    9         14047 . . . . . . . . . . . . . . . . . .
   10         14048 . . . . . . . . . . . . . . . . . .
;

proc sort data=sales; by zipcode; run;

data purchasedate;
length customerName $12;
input customerName $ customerZipcode purchaseDate $;
datalines;
Customer1 14001 July10
customer2 14043 Jan11
;

proc sort data=purchaseDate; by customerZipcode customerName; run;

data previousSales(keep=customerName zipcode purchaseDate cumSales);
if 0 then set sales; /* Do nothing, get the variable names from sales to form array s */
array s{*} _numeric_;
merge purchaseDate(in=ok rename=customerZipcode=zipcode) sales;
by zipcode;
if ok then do;
     cumSales = 0;
     do i = 3 to dim(s) while (upcase(vname(s{i})) ne upcase(purchaseDate));
          cumSales + s{i};
          end;
     output;
     end;
run;

proc print; run;

PG

niam · Posted 07-15-2012 11:59 PM

Thanks for the quick reply.

I tried your method, but I think something is going wrong with the sas date values, the results do not seem correct.

This is a part of the real data, maybe it helps;

customer	purchasedate	Zip
customer1	4/22/2009	14227
customer2	11/29/2010	14001
customer3	11/3/2009	14217
customer4	11/18/2010	14051
customer5	4/19/2010	14224
customer6	4/2/2009	14120
customer7	2/8/2011	14011
customer8	6/3/2010	14052
customer9	6/19/2009	14569
customer10	4/6/2010	14221
customer11	6/11/2010	14226
customer12	10/5/2010	14221
customer13	11/11/2010	14224
customer14	6/12/2009	14221
customer15	11/1/2010	14215
customer16	5/10/2010	14226
customer17	10/28/2009	14224
customer18	6/3/2010	14215
customer19	5/15/2009	14221
customer20	7/16/2010	14209
customer21	8/27/2007	14215
customer22	5/9/2009	14225
customer23	9/16/2009	14141
customer24	11/19/2010	14569
customer25	6/18/2009	14224
customer26	4/30/2009	14224
customer27	2/17/2011	14127
customer28	5/1/2009	14224
customer29	12/10/2010	14209
customer30	7/17/2009	14216

Zip	NOV2010	FEB2011	JUL2010	JUL2009	AUG2009	SEP2009	OCT2009	NOV2009	DEC2009	JAN2010	FEB2010	MAR2010	APR2010	MAY2010
14001	1
14009	1
14011		1
14020			1
14026	0	0	0	1	0	0	0	0	0	0	0	0	0	0
14031
14036			1
14043													1	1
14047
14048
14051	1		0								1	0	0	0
14052	1		0	0	0	0	0	1	0	0	0	0	0	0
14063
14070									1
14075	0	0	0	1	0	0	0	0	0	0	0	0	0	0
14086	1		0	1	0	0	0	0	0	0	0	0	0	0
14092
14094	0	1
14103
14120	0		0	0	0	1	0	0	0	0	0	0	2	0
14127	1	1	0	0	0	0	2	0	1	0	1	0	0	0
14141						1
14150			1	0	0	0	0	0	0	0	0	0	0	0
14172													1
14201				0	0	0	0	0	0	0	1
14202	0	0	0	0	0	0	0	0	0	0	0	0	1	0
14203	0	0	0	0	1	0	0	0	0	1	0	0	0	0
14204
14207			1			1	0	0	0	0	0	0	0	0

PGStats · Posted 07-16-2012 07:44 AM

My code assumes that the columns in the sales table are in chronological order, as they were in your example. - PG

PG

conditional sum of observations over time

Re: conditional sum of observations over time

Re: conditional sum of observations over time

Re: conditional sum of observations over time

Re: conditional sum of observations over time

conditional sum of observations over time

Re: conditional sum of observations over time

Re: conditional sum of observations over time

Re: conditional sum of observations over time

Re: conditional sum of observations over time

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!