Re: How to aggregate data under EG using like ETS procedure

Crubal · Posted 01-13-2017 08:53 AM

Hi,

I have a question on aggregating the data, and the data structure looks like an example below: (Table A)

Background: rental car industry, a customer rent a car from 17JAN2017 to 19JAN2017, then the 'Start_Date' is 17JAN2017; length of use is 3.

ID is just a sequence number; Probability is the percentage for each length_of_use falls in each Start_Date, means that how much percentage of the same Start_Date will rent the car for a certain length. The first row means 1/3 of the demand starting from 17JAN2016 will use for 1 day; and 1/3 will use for 2 days; the rest 1/3 will be used for 3 days.

ID | Start _Date | Length_of_Use | Probability

1 | 17JAN2016 | 1 | 0.333

2 | 17JAN2016 | 2 | 0.334

3 | 17JAN2016 | 3 | 0.333

4 | 18JAN2016 | 1 | 0.5

5 | 18JAN2016 | 4 | 0.5

6 | 19JAN2016 | 2 | 0.5

7 | 19JAN2016 | 3 | 0.5

8 | 20JAN2016 | 2 | 1

9 | 21JAN2016 | 1 | 1

And I do have Demand by Use date level as below: (Table B)

'Use_Demand' means on that day, how many cars are being used by customers on that 'Use_Date', and it is also known information.

ID | Use_Date | Use_Demand

1 | 17JAN2016 | 3

2 | 18JAN2016 | 4

3 | 19JAN2016 | 4

4 | 20JAN2016 | 4

5 | 21JAN2016 | 5

If we have the above 2 tables as known information,

how could I merge table A and table B into a table that could have demand by Start Date & Length of Use level as below?

We would like to figure out demand by 'Start_Date' & 'Length_of_Use' level (Table C)

based on demand by 'Use_Date' level (table B) and Probability for each 'Starting_Date' and 'Length_of_Use'. Which means how Table A and Table B could obtain Table C.

More detailed example may help:

As the example in Table B, 17JAN2017 has use_demand of 3, and according to the probability, 1 of them has length of use 1, whcih means the 'use_date' is 17JAN2017; 1 of them has length of use 2, whcih means the 'use_date' is 17JAN2017 and 18JAN2017; the rest 1 has length of use 3, whcih means the 'use_date' is 17JAN2017, 18JAN2017 and 19JAN2017.

As in table B, 18JAN2017 has use_demand of 4, 2 of them come from 17JAN2017, hence, the rest 2 of them should come from 18JAN2017. Based on the probability table (table A), Starting_Date 18JAN2017 has 2 types of length of stay, both have probability 0.5; hence Start_Date '18JAN2017' with length of use 1 has demand 1, and '18JAN2017' with length of use 4 has demand 1; whcih makes 4th and 5th row of table C.

Thank you!

ID | Start _Date | Length_of_Use | Demand

1 | 17JAN2016 | 1 | 1

2 | 17JAN2016 | 2 | 1

3 | 17JAN2016 | 3 | 1

4 | 18JAN2016 | 1 | 1

5 | 18JAN2016 | 4 | 1

6 | 19JAN2016 | 2 | 1

7 | 19JAN2016 | 3 | 1

8 | 20JAN2016 | 2 | 1

9 | 21JAN2016 | 1 | 2

TomKari · Posted 01-13-2017 10:33 AM

I'd been following this one, but was also confused about what was required.

Based on your much clearer description, give the following a try:

data UseDistribution;
informat Start_Date date9.;
format Start_Date date9.;
input ID Start_Date Length_of_Use Probability;
cards;
1 17JAN2016 1 0.333
2 17JAN2016 2 0.334
3 17JAN2016 3 0.333
4 18JAN2016 1 0.5
5 18JAN2016 4 0.5
6 19JAN2016 2 0.5
7 19JAN2016 3 0.5
8 20JAN2016 2 1
9 21JAN2016 1 1
run;

data Demand;
informat Use_Date date9.;
format Use_Date date9.;
input ID Use_Date Use_Demand;
cards;
1 17JAN2016 3
2 18JAN2016 4
3 19JAN2016 4
4 20JAN2016 4
5 21JAN2016 5
run;

proc sql noprint;
create table Inter1 as
select *
from UseDistribution U join Demand D on U.Start_Date = D.Use_Date;
quit;

data Inter2;
set Inter1;
do DateForSummary = Use_Date to Use_Date + Length_of_Use - 1;
DemandForSummary = round(Use_Demand * Probability);
format DateForSummary date9.;
output;
end;
run;

proc means data=Inter2 nway noprint;
class DateForSummary;
var DemandForSummary;
output out=Want(drop=_TYPE_ _FREQ_) sum()=TotalDemand;
run;

Tom

Crubal · Posted 01-13-2017 11:03 AM

Hi Tom,

Thank you!

It is insightful and I think table inter2 is almost there.

I make a bit change to that: (Change the 3rd line, Use_Date to Start_Date)

data Inter2;
set Inter1;
do DateForSummary = Start_Date to Start_Date + Length_of_Use - 1;
DemandForSummary = round(Use_Demand * Probability);
format DateForSummary date9.;
output;
end;
run;

And obtains:

Start_Date ID Length_of_Use Probability Use_Date Use_Demand DateForSummary DemandForSummary
17JAN2016 1 1 0.333 17JAN2016 3 17JAN2016 1
17JAN2016 2 2 0.334 17JAN2016 3 17JAN2016 1
17JAN2016 2 2 0.334 17JAN2016 3 18JAN2016 1
17JAN2016 3 3 0.333 17JAN2016 3 17JAN2016 1
17JAN2016 3 3 0.333 17JAN2016 3 18JAN2016 1
17JAN2016 3 3 0.333 17JAN2016 3 19JAN2016 1
18JAN2016 4 1 0.5 18JAN2016 4 18JAN2016 2
18JAN2016 5 4 0.5 18JAN2016 4 18JAN2016 2
18JAN2016 5 4 0.5 18JAN2016 4 19JAN2016 2
18JAN2016 5 4 0.5 18JAN2016 4 20JAN2016 2
18JAN2016 5 4 0.5 18JAN2016 4 21JAN2016 2
19JAN2016 6 2 0.5 19JAN2016 4 19JAN2016 2
19JAN2016 6 2 0.5 19JAN2016 4 20JAN2016 2
19JAN2016 7 3 0.5 19JAN2016 4 19JAN2016 2
19JAN2016 7 3 0.5 19JAN2016 4 20JAN2016 2
19JAN2016 7 3 0.5 19JAN2016 4 21JAN2016 2
20JAN2016 8 2 1 20JAN2016 4 20JAN2016 4
20JAN2016 8 2 1 20JAN2016 4 21JAN2016 4
21JAN2016 9 1 1 21JAN2016 5 21JAN2016 5

As I want the output to be demand by 'Start_Date' and 'Length_of_Use' level.

And the 4th line you mentioned: DemandForSummary = round(Use_Demand * Probability);

I am thinking about changing 'Use Demand' to something like 'Use_Demand' - 'Demand coming from prioe day' ... to get the output.

Thank you!

TomKari · Posted 01-13-2017 01:21 PM

Since the tables are inner joined in the PROC SQL step on dates being equal, I think you'll get the same results using either. But definitely use whichever you feel is clearer from a business perspective.

Yes, I threw the ROUND function in there just to have something to show some results. You may definitely want to change how that little bit of logic works.

Tom

How to aggregate data under EG using like ETS procedure