Hi,
I have a question on aggregating the data, and the data structure looks like an example below: (Table A)
Background: rental car industry, a customer rent a car from 17JAN2017 to 19JAN2017, then the 'Start_Date' is 17JAN2017; length of use is 3.
ID is just a sequence number; Probability is the percentage for each length_of_use falls in each Start_Date, means that how much percentage of the same Start_Date will rent the car for a certain length. The first row means 1/3 of the demand starting from 17JAN2016 will use for 1 day; and 1/3 will use for 2 days; the rest 1/3 will be used for 3 days.
ID | Start _Date | Length_of_Use | Probability
1 | 17JAN2016 | 1 | 0.333
2 | 17JAN2016 | 2 | 0.334
3 | 17JAN2016 | 3 | 0.333
4 | 18JAN2016 | 1 | 0.5
5 | 18JAN2016 | 4 | 0.5
6 | 19JAN2016 | 2 | 0.5
7 | 19JAN2016 | 3 | 0.5
8 | 20JAN2016 | 2 | 1
9 | 21JAN2016 | 1 | 1
And I do have Demand by Use date level as below: (Table B)
'Use_Demand' means on that day, how many cars are being used by customers on that 'Use_Date', and it is also known information.
ID | Use_Date | Use_Demand
1 | 17JAN2016 | 3
2 | 18JAN2016 | 4
3 | 19JAN2016 | 4
4 | 20JAN2016 | 4
5 | 21JAN2016 | 5
If we have the above 2 tables as known information,
how could I merge table A and table B into a table that could have demand by Start Date & Length of Use level as below?
We would like to figure out demand by 'Start_Date' & 'Length_of_Use' level (Table C)
based on demand by 'Use_Date' level (table B) and Probability for each 'Starting_Date' and 'Length_of_Use'. Which means how Table A and Table B could obtain Table C.
More detailed example may help:
As the example in Table B, 17JAN2017 has use_demand of 3, and according to the probability, 1 of them has length of use 1, whcih means the 'use_date' is 17JAN2017; 1 of them has length of use 2, whcih means the 'use_date' is 17JAN2017 and 18JAN2017; the rest 1 has length of use 3, whcih means the 'use_date' is 17JAN2017, 18JAN2017 and 19JAN2017.
As in table B, 18JAN2017 has use_demand of 4, 2 of them come from 17JAN2017, hence, the rest 2 of them should come from 18JAN2017. Based on the probability table (table A), Starting_Date 18JAN2017 has 2 types of length of stay, both have probability 0.5; hence Start_Date '18JAN2017' with length of use 1 has demand 1, and '18JAN2017' with length of use 4 has demand 1; whcih makes 4th and 5th row of table C.
Thank you!
ID | Start _Date | Length_of_Use | Demand
1 | 17JAN2016 | 1 | 1
2 | 17JAN2016 | 2 | 1
3 | 17JAN2016 | 3 | 1
4 | 18JAN2016 | 1 | 1
5 | 18JAN2016 | 4 | 1
6 | 19JAN2016 | 2 | 1
7 | 19JAN2016 | 3 | 1
8 | 20JAN2016 | 2 | 1
9 | 21JAN2016 | 1 | 2
I'd been following this one, but was also confused about what was required.
Based on your much clearer description, give the following a try:
data UseDistribution;
informat Start_Date date9.;
format Start_Date date9.;
input ID Start_Date Length_of_Use Probability;
cards;
1 17JAN2016 1 0.333
2 17JAN2016 2 0.334
3 17JAN2016 3 0.333
4 18JAN2016 1 0.5
5 18JAN2016 4 0.5
6 19JAN2016 2 0.5
7 19JAN2016 3 0.5
8 20JAN2016 2 1
9 21JAN2016 1 1
run;
data Demand;
informat Use_Date date9.;
format Use_Date date9.;
input ID Use_Date Use_Demand;
cards;
1 17JAN2016 3
2 18JAN2016 4
3 19JAN2016 4
4 20JAN2016 4
5 21JAN2016 5
run;
proc sql noprint;
create table Inter1 as
select *
from UseDistribution U join Demand D on U.Start_Date = D.Use_Date;
quit;
data Inter2;
set Inter1;
do DateForSummary = Use_Date to Use_Date + Length_of_Use - 1;
DemandForSummary = round(Use_Demand * Probability);
format DateForSummary date9.;
output;
end;
run;
proc means data=Inter2 nway noprint;
class DateForSummary;
var DemandForSummary;
output out=Want(drop=_TYPE_ _FREQ_) sum()=TotalDemand;
run;
Tom
Hi Tom,
Thank you!
It is insightful and I think table inter2 is almost there.
I make a bit change to that: (Change the 3rd line, Use_Date to Start_Date)
data Inter2;
set Inter1;
do DateForSummary = Start_Date to Start_Date + Length_of_Use - 1;
DemandForSummary = round(Use_Demand * Probability);
format DateForSummary date9.;
output;
end;
run;
And obtains:
Start_Date ID Length_of_Use Probability Use_Date Use_Demand DateForSummary DemandForSummary
17JAN2016 1 1 0.333 17JAN2016 3 17JAN2016 1
17JAN2016 2 2 0.334 17JAN2016 3 17JAN2016 1
17JAN2016 2 2 0.334 17JAN2016 3 18JAN2016 1
17JAN2016 3 3 0.333 17JAN2016 3 17JAN2016 1
17JAN2016 3 3 0.333 17JAN2016 3 18JAN2016 1
17JAN2016 3 3 0.333 17JAN2016 3 19JAN2016 1
18JAN2016 4 1 0.5 18JAN2016 4 18JAN2016 2
18JAN2016 5 4 0.5 18JAN2016 4 18JAN2016 2
18JAN2016 5 4 0.5 18JAN2016 4 19JAN2016 2
18JAN2016 5 4 0.5 18JAN2016 4 20JAN2016 2
18JAN2016 5 4 0.5 18JAN2016 4 21JAN2016 2
19JAN2016 6 2 0.5 19JAN2016 4 19JAN2016 2
19JAN2016 6 2 0.5 19JAN2016 4 20JAN2016 2
19JAN2016 7 3 0.5 19JAN2016 4 19JAN2016 2
19JAN2016 7 3 0.5 19JAN2016 4 20JAN2016 2
19JAN2016 7 3 0.5 19JAN2016 4 21JAN2016 2
20JAN2016 8 2 1 20JAN2016 4 20JAN2016 4
20JAN2016 8 2 1 20JAN2016 4 21JAN2016 4
21JAN2016 9 1 1 21JAN2016 5 21JAN2016 5
As I want the output to be demand by 'Start_Date' and 'Length_of_Use' level.
And the 4th line you mentioned: DemandForSummary = round(Use_Demand * Probability);
I am thinking about changing 'Use Demand' to something like 'Use_Demand' - 'Demand coming from prioe day' ... to get the output.
Thank you!
Since the tables are inner joined in the PROC SQL step on dates being equal, I think you'll get the same results using either. But definitely use whichever you feel is clearer from a business perspective.
Yes, I threw the ROUND function in there just to have something to show some results. You may definitely want to change how that little bit of logic works.
Tom
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.