Hi,
I want to create a data model within SAS based on percents and data I have from Prior Month.
Eg I have day percents and totals orders by day from May and I want to create a data drop model for June to estimate the orders that will close by day in June.
I know that for 31st May I have 900 orders.
I want to take 900 and split it by day (for June) based on the percents by Day(Day1-30).
So, 115 orders would close on 1Jun, 111 on 2Jun and so forth until end of June.
I want to do this for the past 10 days (from May). and then get the totals per day for June.
.Attached is the excel for explaining the calculation. A1-D32 is what I have.
Column E and Q is my expected Output.
Can this be done in SAS?
Thank you in advance!
Some of the values in the dataset "want" created by the following code don't match the values you have posted, unfortunately i can spend any more time to debug the code. And yes, the last data step does not read any data from work.have, all values are generated by looking up percent-value and number of orders using the formats.
data DayPercentFmt;
set have(keep=Day Percent);
length
FmtName $ 32
Type $ 1
;
retain
FmtName 'Day2Percent'
Type "i"
;
format Percent;
rename
Day = Start
Percent = Label
;
run;
proc format cntlin=DayPercentFmt;
run;
proc sort data=have out=sorted(keep= Date Orders);
by descending Date;
run;
data DayOrdersFmt;
set Sorted(drop=Date);
length
FmtName $ 32
Start 8
Type $ 1
;
retain
FmtName "Day2Orders"
Type "i"
;
Start = _n_;
rename Orders = Label;
run;
proc format cntlin=DayOrdersFmt;
run;
data want;
length
Date 8
Orders1-Orders10 8
FinishedOrders 8
_start _end _i _z _p _o 8
;
format Date date9.;
array allOrders[10] Orders1-Orders10;
do Date = '01Jun2020'd to '30Jun2020'd;
call missing(of Orders:);
_start = 1;
_end = min(10, day(Date));
do _i = _start to _end;
_z = day(Date) - _i + 1;
_p = input(cats(_z), Day2Percent.);
_o = input(cats(_i), Day2Orders.);
allOrders[_i] = int(_o * _p);
end;
FinishedOrders = sum(of Orders:);
output;
end;
run;
Please post example data in data steps with datalines.
Start by reading the xlsx-file and as soon as all variables have the correct type and length, post the data as data step with datalines statement, so we actually have data to use. See the links below the post of @Kurt_Bremser if you need help to create the data step.
Since your example "data" did not show any dates, I just see XXX, in the dates prior to Jun there is no way do tell "previous month" properly especially considering that May has 31 days and June has 30 there is almost always going to be some issue with this "previous month" percentage approach.
Assuming anything about non-shown values is poor modeling. And the percentages appear to be rounded, so questionable as well. July/August and December/January are the only previous month/current month where the number of days are the same.
Here are the datalines for May:
In the atttched excel, E32-P62 is my expected output. I want the totals by each day as in Column P(showing -the splits in SAS(column F-O is optional)).
I want to split the order numbers for last 10days of May based on the percents.
For May we have 31 percent values, but since June has only 30days we will do the split for each order from past 10 days in May for 30days in June.
data have ;
input Day Percent Date:date9. Orders;
format Percent percent10. Date date9.;
datalines;
1 .13 01May2020 56
2 .12 02May2020 21
3 .12 03May2020 26
4 0.09 04May2020 44
5 0.08 05May2020 62
6 0.07 06May2020 57
7 0.06 07May2020 92
8 0.04 08May2020 142
9 0.05 09May2020 71
10 0.03 10May2020 45
11 0.02 11May2020 68
12 0.02 12May2020 57
13 0.02 13May2020 113
14 0.01 14May2020 83
15 0.01 15May2020 117
16 0.01 16May2020 100
17 0.01 17May2020 100
18 0.01 18May2020 150
19 0.01 19May2020 200
20 0.01 20May2020 250
21 0.01 21May2020 300
22 0.01 22May2020 350
23 0.01 23May2020 100
24 0.01 24May2020 200
25 0.00 25May2020 300
26 0.00 26May2020 400
27 0.00 27May2020 500
28 0.00 28May2020 600
29 0.01 29May2020 700
30 0.01 30May2020 800
31 0.01 31May2020 900
;
Thank you 🙂
So, for the output, I want the first column to be dates for June. and then taking the last day for May (31May2020 900 orders), multiplying 900 with 13% (percent from day 1 ) , I get 115 for 01June2020, multiplying 900 with 12%(percent from day 2), I get 111 (for 02June2020) and continuing this way for 900 until 30June2020.
Similarly, I want to do this calculation for the Orders number from 30May2020 which is 800 and split it based on percents from Day1-30 for June 02-June30.
Continuing this way for the last 10 orders from May.
The last column below is just the sum of the row for each date from 01June-30June.
The first column (dates) and the last column (Finished Orders) are my desired output.
Finished Orders | |||||||||||
01Jun2020 | 115 | 115 | |||||||||
02Jun2020 | 111 | 102 | 212 | ||||||||
03Jun2020 | 104 | 98 | 89 | 291 | |||||||
04Jun2020 | 84 | 92 | 86 | 76 | 338 | ||||||
05Jun2020 | 74 | 75 | 81 | 74 | 64 | 367 | |||||
06Jun2020 | 64 | 66 | 65 | 69 | 61 | 51 | 377 | ||||
07Jun2020 | 50 | 57 | 58 | 56 | 58 | 49 | 38 | 366 | |||
08Jun2020 | 39 | 45 | 50 | 49 | 47 | 46 | 37 | 25 | 338 | ||
09Jun2020 | 43 | 35 | 39 | 43 | 41 | 37 | 35 | 25 | 13 | 310 | |
10Jun2020 | 26 | 38 | 30 | 33 | 36 | 33 | 28 | 23 | 12 | 45 | 304 |
11Jun2020 | 20 | 23 | 34 | 26 | 28 | 29 | 25 | 19 | 12 | 43 | 257 |
12Jun2020 | 16 | 18 | 20 | 29 | 22 | 22 | 21 | 16 | 9 | 40 | 214 |
13Jun2020 | 16 | 14 | 16 | 17 | 24 | 17 | 17 | 14 | 8 | 33 | 175 |
14Jun2020 | 13 | 14 | 12 | 13 | 14 | 19 | 13 | 11 | 7 | 29 | 146 |
15Jun2020 | 12 | 12 | 12 | 10 | 11 | 11 | 14 | 9 | 6 | 25 | 123 |
16Jun2020 | 9 | 11 | 10 | 10 | 9 | 9 | 9 | 10 | 4 | 20 | 100 |
17Jun2020 | 10 | 8 | 10 | 9 | 9 | 7 | 7 | 6 | 5 | 15 | 84 |
18Jun2020 | 6 | 8 | 7 | 8 | 7 | 7 | 5 | 4 | 3 | 17 | 74 |
19Jun2020 | 8 | 5 | 7 | 6 | 7 | 6 | 5 | 3 | 2 | 10 | 61 |
20Jun2020 | 5 | 7 | 5 | 6 | 5 | 6 | 4 | 3 | 2 | 8 | 52 |
21Jun2020 | 6 | 5 | 6 | 4 | 5 | 4 | 4 | 3 | 2 | 6 | 45 |
22Jun2020 | 7 | 5 | 4 | 5 | 3 | 4 | 3 | 3 | 1 | 6 | 42 |
23Jun2020 | 6 | 6 | 4 | 4 | 4 | 3 | 3 | 2 | 1 | 5 | 38 |
24Jun2020 | 9 | 5 | 5 | 4 | 3 | 4 | 2 | 2 | 1 | 5 | 39 |
25Jun2020 | 3 | 8 | 4 | 4 | 3 | 2 | 3 | 1 | 1 | 4 | 34 |
26Jun2020 | 4 | 3 | 7 | 4 | 4 | 2 | 2 | 2 | 1 | 4 | 32 |
27Jun2020 | 4 | 4 | 2 | 6 | 3 | 3 | 2 | 1 | 1 | 2 | 29 |
28Jun2020 | 4 | 4 | 3 | 2 | 5 | 3 | 2 | 1 | 1 | 3 | 27 |
29Jun2020 | 5 | 3 | 3 | 3 | 2 | 4 | 2 | 1 | 1 | 2 | 26 |
30Jun2020 | 5 | 5 | 3 | 3 | 2 | 1 | 3 | 1 | 1 | 2 | 26 |
Interesting approach. What would you do with the data of February when you have to create the data for March?
There is another problem with your description: all calculations seem to be wrong:
900 * 0.13 = 117, not 115
900 * 0.12 = 108, not 111
and later on for 25th June you should have 0, not 3, simply because the percent-value for day 25 is zero.
Please re-check the logic!
Please see below the revised numbers for the last 10 days from May split for June(using percents from May1-30):
Can this be created in SAS?
I want the first column(dates for June) and the last column (Finished Orders) which is a sum of that row as my output.
Finished Orders | |||||||||||
01Jun2020 | 117 | 117 | |||||||||
02Jun2020 | 108 | 104 | 212 | ||||||||
03Jun2020 | 108 | 96 | 91 | 295 | |||||||
04Jun2020 | 84 | 96 | 84 | 78 | 342 | ||||||
05Jun2020 | 74 | 75 | 84 | 72 | 65 | 370 | |||||
06Jun2020 | 64 | 66 | 65 | 72 | 60 | 52 | 379 | ||||
07Jun2020 | 50 | 57 | 58 | 56 | 60 | 48 | 39 | 368 | |||
08Jun2020 | 39 | 45 | 50 | 49 | 47 | 48 | 36 | 26 | 339 | ||
09Jun2020 | 43 | 35 | 39 | 43 | 41 | 37 | 36 | 24 | 13 | 311 | |
10Jun2020 | 26 | 38 | 30 | 33 | 36 | 33 | 28 | 24 | 12 | 46 | 306 |
11Jun2020 | 20 | 23 | 34 | 26 | 28 | 29 | 25 | 19 | 12 | 42 | 256 |
12Jun2020 | 16 | 18 | 20 | 29 | 22 | 22 | 21 | 16 | 9 | 42 | 215 |
13Jun2020 | 16 | 14 | 16 | 17 | 24 | 17 | 17 | 14 | 8 | 33 | 175 |
14Jun2020 | 13 | 14 | 12 | 13 | 14 | 19 | 13 | 11 | 7 | 29 | 146 |
15Jun2020 | 12 | 12 | 12 | 10 | 11 | 11 | 14 | 9 | 6 | 25 | 123 |
16Jun2020 | 9 | 11 | 10 | 10 | 9 | 9 | 9 | 10 | 4 | 20 | 100 |
17Jun2020 | 10 | 8 | 10 | 9 | 9 | 7 | 7 | 6 | 5 | 15 | 84 |
18Jun2020 | 6 | 8 | 7 | 8 | 7 | 7 | 5 | 4 | 3 | 17 | 74 |
19Jun2020 | 8 | 5 | 7 | 6 | 7 | 6 | 5 | 3 | 2 | 10 | 61 |
20Jun2020 | 5 | 7 | 5 | 6 | 5 | 6 | 4 | 3 | 2 | 8 | 52 |
21Jun2020 | 6 | 5 | 6 | 4 | 5 | 4 | 4 | 3 | 2 | 6 | 45 |
22Jun2020 | 7 | 5 | 4 | 5 | 3 | 4 | 3 | 3 | 1 | 6 | 42 |
23Jun2020 | 6 | 6 | 4 | 4 | 4 | 3 | 3 | 2 | 1 | 5 | 38 |
24Jun2020 | 9 | 5 | 5 | 4 | 3 | 4 | 2 | 2 | 1 | 5 | 39 |
25Jun2020 | - | 8 | 4 | 4 | 3 | 2 | 3 | 1 | 1 | 4 | 31 |
26Jun2020 | - | - | 7 | 4 | 4 | 2 | 2 | 2 | 1 | 4 | 25 |
27Jun2020 | - | - | - | 6 | 3 | 3 | 2 | 1 | 1 | 2 | 18 |
28Jun2020 | - | - | - | - | 5 | 3 | 2 | 1 | 1 | 3 | 15 |
29Jun2020 | 5 | - | - | - | - | 4 | 2 | 1 | 1 | 2 | 15 |
30Jun2020 | 5 | 5 | - | - | - | - | 3 | 1 | 1 | 2 | 17 |
@new_sas_user_4 wrote:
Do you think it could be done using two dimensional arrays?
I do understand 1-dimensional arrays but don't have knowledge about 2-D arrays!
Thanks!
An array may be useful to store the results, but arrays in sas are somewhat different from arrays in other languages, so i don't see a 2d array in the solution. I don't have the time to work on this right now, maybe later during the week; i would start by creating a format mapping day to percent, making it easier to get the percent-value while creating the wanted dataset. I don't think that it is possible to create the desired form in one step. Maybe starting with a dataset that has
is easier to code.
Transposing the data afterwards is no problem.
Thanks a lot!!
I was starting with a structure like this:
%let start='01Jun2020'd;
data Distribution(drop=i);
format date date9. ;
do i = &start. to intnx('month',&start,0,'e');
Date=i ;
array Days{30} Day1-Day30;
output;
end;
run;
@andreas_lds Would really appreciate your help 🙂
Some of the values in the dataset "want" created by the following code don't match the values you have posted, unfortunately i can spend any more time to debug the code. And yes, the last data step does not read any data from work.have, all values are generated by looking up percent-value and number of orders using the formats.
data DayPercentFmt;
set have(keep=Day Percent);
length
FmtName $ 32
Type $ 1
;
retain
FmtName 'Day2Percent'
Type "i"
;
format Percent;
rename
Day = Start
Percent = Label
;
run;
proc format cntlin=DayPercentFmt;
run;
proc sort data=have out=sorted(keep= Date Orders);
by descending Date;
run;
data DayOrdersFmt;
set Sorted(drop=Date);
length
FmtName $ 32
Start 8
Type $ 1
;
retain
FmtName "Day2Orders"
Type "i"
;
Start = _n_;
rename Orders = Label;
run;
proc format cntlin=DayOrdersFmt;
run;
data want;
length
Date 8
Orders1-Orders10 8
FinishedOrders 8
_start _end _i _z _p _o 8
;
format Date date9.;
array allOrders[10] Orders1-Orders10;
do Date = '01Jun2020'd to '30Jun2020'd;
call missing(of Orders:);
_start = 1;
_end = min(10, day(Date));
do _i = _start to _end;
_z = day(Date) - _i + 1;
_p = input(cats(_z), Day2Percent.);
_o = input(cats(_i), Day2Orders.);
allOrders[_i] = int(_o * _p);
end;
FinishedOrders = sum(of Orders:);
output;
end;
run;
Thank you so much @andreas_lds
How does SAS refer to the percent and order data through the below statements? Is it becoz of proc format cntlin= ? I am sorry, I haven't used the cntlin yet.
_p = input(cats(_z), Day2Percent.);
_o = input(cats(_i), Day2Orders.);
Thanks again 🙂
@new_sas_user_4 wrote:
Thank you so much @andreas_lds
How does SAS refer to the percent and order data through the below statements? Is it becoz of proc format cntlin= ? I am sorry, I haven't used the cntlin yet.
_p = input(cats(_z), Day2Percent.);
_o = input(cats(_i), Day2Orders.);
Thanks again 🙂
With cntlin you can create a format using a dataset, so have a look at the datasets named SomethingFmt: they define which day is mapped to which percent-value (and order-data). Instead of the formats using hash-objects would have been another way to process the data.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.