BookmarkSubscribeRSS Feed
Crubal
Quartz | Level 8

Hi,

 

I have a question on aggregating the data, and the data structure looks like an example below: (Table A)

 

Background: rental car industry, a customer rent a car from 17JAN2017 to 19JAN2017, then the 'Start_Date' is 17JAN2017; length of use is 3. 

ID is just a sequence number; Probability is the percentage for each length_of_use falls in each Start_Date, means that how much percentage of the same Start_Date will rent the car for a certain length. The first row means 1/3 of the demand starting from 17JAN2016 will use for 1 day; and 1/3 will use for  2 days; the rest 1/3 will be used for 3 days. 

 

ID  | Start _Date    | Length_of_Use | Probability  

1    | 17JAN2016   |      1                  |  0.333

2    | 17JAN2016   |      2                  |  0.334

3    | 17JAN2016   |      3                  |  0.333

4    | 18JAN2016   |      1                  |  0.5

5    | 18JAN2016   |      4                  |  0.5

6    | 19JAN2016   |      2                  |  0.5

7    | 19JAN2016   |      3                  |  0.5

8    | 20JAN2016   |      2                  |  1

9    | 21JAN2016   |      1                  |  1

 

And I do have Demand by Use date level as below: (Table B)

'Use_Demand' means on that day, how many cars are being used by customers on that 'Use_Date', and it is also known information.

 

ID  | Use_Date      | Use_Demand  

1    | 17JAN2016   |      3                  

2    | 18JAN2016   |      4                  

3    | 19JAN2016   |      4                  

4    | 20JAN2016   |      4                 

5    | 21JAN2016   |      5                  

 

If we have the above 2 tables as known information,

how could I merge table A and table B into a table that could have demand by Start Date & Length of Use level as below?

 

We would like to figure out demand by 'Start_Date' & 'Length_of_Use' level (Table C) 

based on demand by 'Use_Date' level (table B) and Probability for each 'Starting_Date' and 'Length_of_Use'. Which means how Table A and Table B could obtain Table C. 

 

More detailed example may help: 

As the example in Table B, 17JAN2017 has use_demand of 3, and according to the probability, 1 of them has length of use 1, whcih means the 'use_date' is 17JAN2017; 1 of them has length of use 2, whcih means the 'use_date' is 17JAN2017 and 18JAN2017; the rest 1 has  length of use 3, whcih means the 'use_date' is 17JAN2017, 18JAN2017 and 19JAN2017.

As in table B, 18JAN2017 has use_demand of 4, 2 of them come from 17JAN2017, hence, the rest 2 of them should come from 18JAN2017. Based on the probability table (table A), Starting_Date 18JAN2017 has 2 types of length of stay, both have probability 0.5; hence Start_Date '18JAN2017' with length of use 1 has demand 1, and '18JAN2017' with length of use 4 has demand 1; whcih makes 4th and 5th row of table C.

 

Thank you!

 

ID  | Start _Date    | Length_of_Use | Demand 

1    | 17JAN2016   |      1                  |  1

2    | 17JAN2016   |      2                  |  1

3    | 17JAN2016   |      3                  |  1

4    | 18JAN2016   |      1                  |  1

5    | 18JAN2016   |      4                  |  1

6    | 19JAN2016   |      2                  |  1

7    | 19JAN2016   |      3                  |  1

8    | 20JAN2016   |      2                  |  1

9    | 21JAN2016   |      1                  |  2

3 REPLIES 3
TomKari
Onyx | Level 15

I'd been following this one, but was also confused about what was required.

 

Based on your much clearer description, give the following a try:

 

data UseDistribution;
informat Start_Date date9.;
format Start_Date date9.;
input ID Start_Date Length_of_Use Probability;
cards;
1 17JAN2016 1 0.333
2 17JAN2016 2 0.334
3 17JAN2016 3 0.333
4 18JAN2016 1 0.5
5 18JAN2016 4 0.5
6 19JAN2016 2 0.5
7 19JAN2016 3 0.5
8 20JAN2016 2 1
9 21JAN2016 1 1
run;

data Demand;
informat Use_Date date9.;
format Use_Date date9.;
input ID Use_Date Use_Demand;
cards;
1 17JAN2016 3
2 18JAN2016 4
3 19JAN2016 4
4 20JAN2016 4
5 21JAN2016 5
run;

proc sql noprint;
create table Inter1 as
select *
from UseDistribution U join Demand D on U.Start_Date = D.Use_Date;
quit;

data Inter2;
set Inter1;
do DateForSummary = Use_Date to Use_Date + Length_of_Use - 1;
  DemandForSummary = round(Use_Demand * Probability);
  format DateForSummary date9.;
  output;
end;
run;

proc means data=Inter2 nway noprint;
class DateForSummary;
var DemandForSummary;
output out=Want(drop=_TYPE_ _FREQ_) sum()=TotalDemand;
run;

 

Tom

Crubal
Quartz | Level 8

Hi Tom,

 

Thank you! 

 

It is insightful and I think table inter2 is almost there. 

 

I make a bit change to that: (Change the 3rd line, Use_Date to Start_Date)

 

data Inter2;
set Inter1;
do DateForSummary = Start_Date to Start_Date + Length_of_Use - 1;
DemandForSummary = round(Use_Demand * Probability);
format DateForSummary date9.;
output;
end;
run;

 

And obtains:

 

Start_Date    ID   Length_of_Use   Probability   Use_Date   Use_Demand   DateForSummary   DemandForSummary
17JAN2016   1              1                  0.333       17JAN2016           3                  17JAN2016               1
17JAN2016   2              2                  0.334       17JAN2016           3                  17JAN2016               1
17JAN2016   2              2                  0.334       17JAN2016           3                  18JAN2016               1
17JAN2016   3              3                  0.333        17JAN2016          3                   17JAN2016              1 
17JAN2016   3              3                  0.333        17JAN2016          3                   18JAN2016              1
17JAN2016   3              3                  0.333        17JAN2016          3                   19JAN2016              1
18JAN2016   4              1                  0.5            18JAN2016          4                   18JAN2016              2
18JAN2016   5              4                  0.5            18JAN2016          4                   18JAN2016              2
18JAN2016 5 4 0.5 18JAN2016 4 19JAN2016 2
18JAN2016 5 4 0.5 18JAN2016 4 20JAN2016 2
18JAN2016 5 4 0.5 18JAN2016 4 21JAN2016 2
19JAN2016 6 2 0.5 19JAN2016 4 19JAN2016 2
19JAN2016 6 2 0.5 19JAN2016 4 20JAN2016 2
19JAN2016 7 3 0.5 19JAN2016 4 19JAN2016 2
19JAN2016 7 3 0.5 19JAN2016 4 20JAN2016 2
19JAN2016 7 3 0.5 19JAN2016 4 21JAN2016 2
20JAN2016 8 2 1 20JAN2016 4 20JAN2016 4
20JAN2016 8 2 1 20JAN2016 4 21JAN2016 4
21JAN2016 9 1 1 21JAN2016 5 21JAN2016 5

 

As I want the output to be demand by 'Start_Date' and 'Length_of_Use' level. 

 

And the 4th line you mentioned: DemandForSummary = round(Use_Demand * Probability);

I am thinking about changing 'Use Demand' to something like 'Use_Demand' - 'Demand coming from prioe day' ... to get the output. 

 

Thank you!

TomKari
Onyx | Level 15

Since the tables are inner joined in the PROC SQL step on dates being equal, I think you'll get the same results using either. But definitely use whichever you feel is clearer from a business perspective.

 

Yes, I threw the ROUND function in there just to have something to show some results. You may definitely want to change how that little bit of logic works.

 

Tom

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1026 views
  • 0 likes
  • 2 in conversation