Re: How to aggregate data for SAS EG?

Crubal · Posted 01-11-2017 03:19 PM

Hi,

I have a question on aggregating the data, and the data structure looks like an example below: (Table A)

For example, the first row means 1/3 of the demand starting from 17JAN2016 will use for 1 day; and 1/3 will use for 2 days; the rest 1/3 will be used for 3 days.

ID | Start _Date | Length_of_Use | Percentage

1 | 17JAN2016 | 1 | 0.333

2 | 17JAN2016 | 2 | 0.334

3 | 17JAN2016 | 3 | 0.333

4 | 18JAN2016 | 1 | 0.5

5 | 18JAN2016 | 4 | 0.5

6 | 19JAN2016 | 2 | 0.5

7 | 19JAN2016 | 3 | 0.5

8 | 20JAN2016 | 2 | 1

9 | 21JAN2016 | 1 | 1

And I do have Demand by Use date level as below: (Table B)

ID | Use_Date | Use_Demand

1 | 17JAN2016 | 3

2 | 18JAN2016 | 4

3 | 19JAN2016 | 4

4 | 20JAN2016 | 4

5 | 21JAN2016 | 5

If we have the above 2 tables as known information,

how could I merge table A and table B into a table that could have demand by Start Date & Length of Use level as below?

ID is just a sequence number, no key information.

Thank you!

ID | Start _Date | Length_of_Use | Demand

1 | 17JAN2016 | 1 | 1

2 | 17JAN2016 | 2 | 1

3 | 17JAN2016 | 3 | 1

4 | 18JAN2016 | 1 | 1

5 | 18JAN2016 | 4 | 1

6 | 19JAN2016 | 2 | 1

7 | 19JAN2016 | 3 | 1

8 | 20JAN2016 | 2 | 1

9 | 21JAN2016 | 1 | 2

Shmuel · Posted 01-11-2017 03:30 PM

It is difficult to guess the rules of creating the output.

Please explain how to define DEMAND on output?

Does ID have any impact on combining data from the two tables ? or is it just a sequence number ?

Crubal · Posted 01-11-2017 03:46 PM

Hi Shmuel,

ID is just a sequence number, I will edit my question.

The first row of the output is: 3* 0.333 = 1 (Arrival Date 17JAN2017 & Length of Stay 1);

And the second row of the output is: 3* 0.333 = 1 (Arrival Date 17JAN2017 & Length of Stay 2);

And the third row of the output is: 3* 0.333 = 1 (Arrival Date 17JAN2017 & Length of Stay 3);

Shmuel · Posted 01-11-2017 05:05 PM

according to your answer, then for start_date=18JAN2016

the demand should be 4*0.5=2 but on your output you have demand=1 ???

Crubal · Posted 01-11-2017 05:10 PM

Hi,

Demand 4 should be calculated as 4 = 3* 0.333 (Arrive 17JAN & Length_of_use =2) + 3* 0.333 (Arrive 17JAN & Length_of_use =3)

+ 1 (Arrive 18JAN & Length_of_use =1) + 1 (Arrive 18JAN & Length_of_use =4)

So that demand Arriving at 18JAN2017 = 1 +1 = 2

Thanks!

Shmuel · Posted 01-12-2017 06:38 AM

Sorry, your logic is still not clear enough to me.

I can't guess what algorithm or formula to use in order to calculate the anted demand.

Try explain the logic in words or formula.

Crubal · Posted 01-12-2017 09:06 AM

Hi Shmuel,

Thanks for reply!

I would like to figure out demand by 'Start_Date' & 'Length_of_Use' level

based on demand by 'Use_Date' level (table B) and Probability for each 'Starting_Date' and 'Length_of_Use' (table A).

As the example in Table B, 17JAN2017 has use_demand of 3, and according to the probability, 1 of them has length of use 1, whcih means the 'use_date' is 17JAN2017; 1 of them has length of use 2, whcih means the 'use_date' is 17JAN2017 and 18JAN2017; the rest 1 has length of use 3, whcih means the 'use_date' is 17JAN2017, 18JAN2017 and 19JAN2017.

As in table B, 18JAN2017 has use_demand of 4, 2 of them come from 17JAN2017, hence, the rest 2 of them should come from 18JAN2017. Based on the probability table (table A), Starting_Date 18JAN2017 has 2 types of length of stay, both have probability 0.5; hence Start_Date '18JAN2017' with length of use 1 has demand 1, and '18JAN2017' with length of use 4 has demand 1; whcih makes 4th and 5th row of table C.

Hope it will help.

Shmuel · Posted 01-12-2017 12:04 PM

@Crubal I'm very sory. I feel you use profecional terms which are not clear enough to me.

Just for example, you wrote:

"As in table B, 18JAN2017 has use_demand of 4, 2 of them come from 17JAN2017, hence, the rest 2 of them should come from 18JAN2017."

Where in the datasets is the information or how do you calculate and come to the conclusion that - 2 of them come from 17JAN2017 etc. - Why 2 ? Why from 17JAN2017 and not from 18JAN2017 ?

I give up and hope someone else in the forum will help you more. Wish you quick response.

Crubal · Posted 01-12-2017 01:56 PM

No problem Shmuel,

I apologize for my bad explanation.

In the example I used, 17JAN2017 has Use_Demand of 3, accoding to probability table, 1 of them will only be used for 1 day, 1 will be used for 2 days, and 1 will be used for 3 days. Hence 2 of the 3 will still be in use on date 18JAN2017.

If we deduct the demand of 2 from the Use_demand for '18JAN2017', we could find out the rest 2 demand will come from 18JAN2017.

Thank you, hope this time it is clearer.

Shmuel · Posted 01-12-2017 03:12 PM

Trying to understand the "story" I created a table attached as a PDF file.

I don't understand:

1) Is length of use is days count from start_use?

2) Start_use = 18JAN2017 - how can 50% occupy 4 days length_of_use ?

What does it mean ?

3) Table 2 case 2 define use_demand = 4 ? What does it mean ?

If there are 2 days (18-19/JAN) how can it fill demand of 4 ?

4) Alldata written in RED - I do not understand what does it mean and how can it be.

5) Does my table tels the story correctly ?

As I said, I'm not statistian and those terms I'm absolutely not famialiar with.

Crubal · Posted 01-12-2017 04:41 PM

Hi Shmuel,

Thanks for your patience!

Maybe I should tell a little bit about background ahead, I apologize.

For example, rental car industry, a customer rent a car from 17JAN2017 to 19JAN2017, then the 'Start_Date' is 17JAN2017; length of use is 3.

1) Length of use is counted from Start date.

Table A, probability table, means that how much percentage of the same Start_Date will rent the car for a certain length. And it is known information.

2) Hence, customer starts from 18JAN2017 will either rent the car for 1 day, or for 4 days; either has 50% probability. (In this simple example, we just assume customer will rent a car for 1 day or 4 days)

3) Table B, the column 'Use_Demand' means on that day, how many cars are being used by customers, and it is also known information.

4) As the red numbers you mentioned are 'Use_Demand' by 'Use_Date', it is known information, means how many cars are currently being used by customers, and our aim is to deduct how many customers come from Day 1, Day 2 .... and how long are they going to rent a car? (Hence we need the probability table)

5) I understand what you mean, sorry for my explanation.

Thank you very much!

Shmuel · Posted 01-13-2017 07:03 AM

I should apologize.

You posted your question under "Data Management" community.

My knowledge enables me help at "SAS Programing" community.

I came upon your question exidentally.

I think you should post again your question under "Analytics" community.

It seems to me that you need SAS ETS procedures, but I'm not sure and not familiar with that model.

Doing the work with SAS Base is to reinvent the wheel.

Sorry for holding you that long.

Crubal · Posted 01-13-2017 08:44 AM

Hi Shmuel,

No worries, it really helps me explain it better. Thank you and I would post it somewhere else.