turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Calculations and counters with time-series data in...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-02-2014 09:09 AM

I'm trying to calculate a value, let's call it lambda, that is an average of the inter-arrival times of sales within each customer in each region in my time-series dataset. I am using SAS Enterprise Guide. Here is an example of how I'm trying to calculate this value:

Here is a sample of a smaller dataset that is similar to my larger one that I've been working on:

data sample;

input customer region qtr year sales;

informat qtr yyq6.;

format qtr yyq6.;

datalines;

1 2 2003Q1 2003 0

1 2 2003Q2 2003 0

1 2 2003Q3 2003 3

1 2 2003Q4 2003 5

1 2 2004Q1 2004 0

1 2 2004Q2 2004 0

1 2 2004Q3 2004 6

1 2 2004Q4 2004 0

1 2 2005Q1 2005 2

1 2 2005Q2 2005 0

1 2 2005Q3 2005 3

1 2 2005Q4 2005 0

2 6 2003Q1 2003 0

2 6 2003Q2 2003 0

2 6 2003Q3 2003 3

2 6 2003Q4 2003 5

2 6 2004Q1 2004 0

2 6 2004Q2 2004 0

2 6 2004Q3 2004 0

2 6 2004Q4 2004 2

2 6 2005Q1 2005 0

2 6 2005Q2 2005 4

2 6 2005Q3 2005 0

2 6 2005Q4 2005 0

;

run;

proc sort data=sample;

by customer region;

run;

I've attached an image below that shows the method I've been attempting and what the output would look like if I could get the correct results, but I'm not sure how to get this done in SAS. I am open to any suggestions if there is an easier or more effective method.

Anyway, I've been trying to create a variable count_sales that counts the number of sales within each customer and region. Then I was trying to calculate a variable count_gap that would count the number of quarters between sales (starting at the first sale). I was then going to have a variable sum_gaps that would keep track of the sum of these inter-arrival times (gaps between sales), so it would basically add the value of count_gap at each sale. I then thought I could calculate lambda by taking sum_gaps and dividing that by (count_sales - 1). I'm not sure if this is the most effective method, but it's all I could come up with at the time, and I'm still not sure how to do the calculations in SAS. Any help is greatly appreciated.

Accepted Solutions

Solution

06-03-2014
11:38 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-03-2014 11:38 AM

If I understand what you mean.

data sample; input customer region qtr year sales; informat qtr yyq6.; format qtr yyq6.; datalines; 1 2 2003Q1 2003 0 1 2 2003Q2 2003 0 1 2 2003Q3 2003 3 1 2 2003Q4 2003 5 1 2 2004Q1 2004 0 1 2 2004Q2 2004 0 1 2 2004Q3 2004 6 1 2 2004Q4 2004 0 1 2 2005Q1 2005 2 1 2 2005Q2 2005 0 1 2 2005Q3 2005 3 1 2 2005Q4 2005 0 2 6 2003Q1 2003 0 2 6 2003Q2 2003 0 2 6 2003Q3 2003 3 2 6 2003Q4 2003 5 2 6 2004Q1 2004 0 2 6 2004Q2 2004 0 2 6 2004Q3 2004 0 2 6 2004Q4 2004 2 2 6 2005Q1 2005 0 2 6 2005Q2 2005 4 2 6 2005Q3 2005 0 2 6 2005Q4 2005 0 ; run; proc sort data=sample; by customer region; run; data sample; set sample; by customer region; if first.region then do; count_sales=0; n=0;end; if sales ne 0 then count_sales+1; if (lag(sales) ne 0 and sales eq 0) or (lag(customer) ne customer) or (lag(region) ne region) then n+1; run; data sample; set sample; by customer region n; retain start 0; if first.n then do;count_gap=0; start=0;end; if n=1 then do; if start and sales ne 0 then count_gap+1; if sales ne 0 then start=1; end; else count_gap+1; drop start; run; data sample; set sample; by customer region n; retain temp sum_gaps; if first.region then temp=0; if last.n and sales ne 0 then temp+count_gap; sum_gaps=temp; drop n temp; run; data sample; set sample; by customer region ; if last.region then lambda=sum_gaps/(count_sales-1); run;

Xia Keshan

Message was edited by: xia keshan

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-02-2014 09:24 AM

Count sales seems pretty straightforward:

proc sql;

select count(*) as count_sales from have group by customer,region having sales >0;

quit;

Not sure how you derive gaps though, what is this gap?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-02-2014 09:26 AM

The gaps are the number of quarters between each sale. So in my example above, the count_gaps variable would be zero until the first sale of each customer, then it would start counting the number of quarters between each sale.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-02-2014 09:55 AM

data want1;

set have;

by cust;

retain counter;

if first.cust then counter = -1;

if sales ne 0

then do;

if counter > 0 then output;

counter = 1;

end;

else if counter > 0 then counter + 1;

run;

proc summary data=want1 noprint;

by cust;

var counter;

output

out=want (drop=_type_ _freq_)

mean=

;

run;

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-02-2014 10:53 AM

KurtBremser,

This appears to work on my smaller sample dataset, but for some reason is not working on my larger one. It seems that on my larger dataset, the counter is not starting over at the beginning of each customer, and I'm not sure why.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-03-2014 01:59 AM

I see that you are sorting by region. If that is an additional criterion, then you should sort by customer, region and quarter, and use first.region instead of first.cust in my data step.

If you add additional output statements in the data step so that all records are written to want1, you can follow the development of the counter and see where the logic goes astray.

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

Maxims of Maximally Efficient SAS Programmers

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-02-2014 12:26 PM

Are there any other approaches that may be more efficient than mine?

Solution

06-03-2014
11:38 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-03-2014 11:38 AM

If I understand what you mean.

data sample; input customer region qtr year sales; informat qtr yyq6.; format qtr yyq6.; datalines; 1 2 2003Q1 2003 0 1 2 2003Q2 2003 0 1 2 2003Q3 2003 3 1 2 2003Q4 2003 5 1 2 2004Q1 2004 0 1 2 2004Q2 2004 0 1 2 2004Q3 2004 6 1 2 2004Q4 2004 0 1 2 2005Q1 2005 2 1 2 2005Q2 2005 0 1 2 2005Q3 2005 3 1 2 2005Q4 2005 0 2 6 2003Q1 2003 0 2 6 2003Q2 2003 0 2 6 2003Q3 2003 3 2 6 2003Q4 2003 5 2 6 2004Q1 2004 0 2 6 2004Q2 2004 0 2 6 2004Q3 2004 0 2 6 2004Q4 2004 2 2 6 2005Q1 2005 0 2 6 2005Q2 2005 4 2 6 2005Q3 2005 0 2 6 2005Q4 2005 0 ; run; proc sort data=sample; by customer region; run; data sample; set sample; by customer region; if first.region then do; count_sales=0; n=0;end; if sales ne 0 then count_sales+1; if (lag(sales) ne 0 and sales eq 0) or (lag(customer) ne customer) or (lag(region) ne region) then n+1; run; data sample; set sample; by customer region n; retain start 0; if first.n then do;count_gap=0; start=0;end; if n=1 then do; if start and sales ne 0 then count_gap+1; if sales ne 0 then start=1; end; else count_gap+1; drop start; run; data sample; set sample; by customer region n; retain temp sum_gaps; if first.region then temp=0; if last.n and sales ne 0 then temp+count_gap; sum_gaps=temp; drop n temp; run; data sample; set sample; by customer region ; if last.region then lambda=sum_gaps/(count_sales-1); run;

Xia Keshan

Message was edited by: xia keshan

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-03-2014 01:26 PM

Ksharp,

Exactly! Thanks so much.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-03-2014 02:44 PM

I must have missed something, do we have to be so complicated? Proc SQL seems to be much more straightforward?

**data** sample;

input customer region qtr year sales;

informat qtr yyq6.;

format qtr yyq6.;

datalines;

1 2 2003Q1 2003 0

1 2 2003Q2 2003 0

1 2 2003Q3 2003 3

1 2 2003Q4 2003 5

1 2 2004Q1 2004 0

1 2 2004Q2 2004 0

1 2 2004Q3 2004 6

1 2 2004Q4 2004 0

1 2 2005Q1 2005 2

1 2 2005Q2 2005 0

1 2 2005Q3 2005 3

1 2 2005Q4 2005 0

2 6 2003Q1 2003 0

2 6 2003Q2 2003 0

2 6 2003Q3 2003 3

2 6 2003Q4 2003 5

2 6 2004Q1 2004 0

2 6 2004Q2 2004 0

2 6 2004Q3 2004 0

2 6 2004Q4 2004 2

2 6 2005Q1 2005 0

2 6 2005Q2 2005 4

2 6 2005Q3 2005 0

2 6 2005Q4 2005 0

;

**run**;

**proc** **sql**;

select customer,intck('qtr',min(qtr),max(qtr))/(count(*)-**1**) as lamda from sample where sales >**0** group by customer;

**quit**;

Now you have your lamda, merge it back if needed.

Regards,

Haikuo

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-03-2014 05:00 PM

Hai.kuo,

Yes this seems to be very effective! Thanks