Hello community,
I have a table of data of different stores buys, these buys have a start date and an end date;
so the amount bought in each operation, will last only between the start date and end date for every store,
Below is an example of the data I have:
data test; input start_date :date9. end_date :date9. Store Amount ; format start_date end_date date9.; /*format Store $CHARw13.;*/ format Amount dollar8.1; datalines; 01jan2019 09jan2019 LA 200 01jan2019 09jan2019 LB 1200 01jan2019 09jan2019 LC 500 01jan2019 09jan2019 LD 2500 01jan2019 09jan2019 LE 2300 01jan2019 09jan2019 LF 2100 01jan2019 09jan2019 LG 2900 01jan2019 09jan2019 LI 1900 01jan2019 09jan2019 LJ 3200 01jan2019 09jan2019 LK 4200 01jan2019 09jan2019 LY 5200 05jan2019 09jan2019 LO 2000 05jan2019 09jan2019 LB 120 05jan2019 09jan2019 LC 5000 05jan2019 09jan2019 LD 250 05jan2019 09jan2019 LR 230 05jan2019 09jan2019 LE 210 05jan2019 09jan2019 LP 290 05jan2019 09jan2019 LQ 190 05jan2019 09jan2019 LM 320 05jan2019 09jan2019 LA 420 05jan2019 09jan2019 LF 500 09jan2019 16jan2019 LO 200 09jan2019 16jan2019 LA 1200 09jan2019 16jan2019 LQ 500 09jan2019 16jan2019 LW 2500 09jan2019 16jan2019 LZ 2300 09jan2019 16jan2019 LD 2100 09jan2019 16jan2019 LR 2900 09jan2019 16jan2019 LT 1900 09jan2019 16jan2019 LG 3200 09jan2019 16jan2019 LL 4200 09jan2019 16jan2019 LB 5200 ; Run;
From this data, I want to calculate the daily available amount for each store,
So my data will need to look like the table below, and for each store I need to have the amount outstanding per date:
the amount outstanding for each store, is the sum of all amounts in dataset test, that have a start date before the row date, and an end date after the row date;
Date | Store1 | Store2 | Store3 | Store4 |
01-janv-19 | , | , | , | , |
02-janv-19 | , | , | , | , |
03-janv-19 | , | , | , | , |
04-janv-19 | , | , | , | , |
05-janv-19 | , | , | , | , |
06-janv-19 | , | , | , | , |
07-janv-19 | , | , | , | , |
08-janv-19 | , | , | , | , |
09-janv-19 | , | , | , | , |
10-janv-19 | , | , | , | , |
11-janv-19 | , | , | , | , |
12-janv-19 | , | , | , | , |
13-janv-19 | , | , | , | , |
14-janv-19 | , | , | , | , |
15-janv-19 | , | , | , | , |
16-janv-19 | , | , | , | , |
17-janv-19 | , | , | , | , |
18-janv-19 | , | , | , | , |
19-janv-19 | , | , | , | , |
20-janv-19 | , | , | , | , |
21-janv-19 | , | , | , | , |
22-janv-19 | , | , | , | , |
23-janv-19 | , | , | , | , |
24-janv-19 | , | , | , | , |
25-janv-19 | , | , | , | , |
26-janv-19 | , | , | , | , |
27-janv-19 | , | , | , | , |
28-janv-19 | , | , | , | , |
29-janv-19 | , | , | , | , |
Please excuse my question formatting, as I believe I have a problem formatting the store name;
Thank you in advance.
I think this makes a REPORT in the form the requested.
Note that your posted data step does not read the stores as character and throws errors of invalid data.
Placing values into a data set with different variables for each store does create an ugly extremely hard to work with structure. Basic approach, create daily values in a data step, the manipulate. I used a format for the SUM similar to your data step. If you don't actually need the decimals and currency then consider a BEST8. or BEST10. depending on range of actual values.
data test; input start_date :date9. end_date :date9. Store $ Amount ; format start_date end_date date9.; /*format Store $CHARw13.;*/ format Amount dollar8.1; datalines; 01jan2019 09jan2019 LA 200 01jan2019 09jan2019 LB 1200 01jan2019 09jan2019 LC 500 01jan2019 09jan2019 LD 2500 01jan2019 09jan2019 LE 2300 01jan2019 09jan2019 LF 2100 01jan2019 09jan2019 LG 2900 01jan2019 09jan2019 LI 1900 01jan2019 09jan2019 LJ 3200 01jan2019 09jan2019 LK 4200 01jan2019 09jan2019 LY 5200 05jan2019 09jan2019 LO 2000 05jan2019 09jan2019 LB 120 05jan2019 09jan2019 LC 5000 05jan2019 09jan2019 LD 250 05jan2019 09jan2019 LR 230 05jan2019 09jan2019 LE 210 05jan2019 09jan2019 LP 290 05jan2019 09jan2019 LQ 190 05jan2019 09jan2019 LM 320 05jan2019 09jan2019 LA 420 05jan2019 09jan2019 LF 500 09jan2019 16jan2019 LO 200 09jan2019 16jan2019 LA 1200 09jan2019 16jan2019 LQ 500 09jan2019 16jan2019 LW 2500 09jan2019 16jan2019 LZ 2300 09jan2019 16jan2019 LD 2100 09jan2019 16jan2019 LR 2900 09jan2019 16jan2019 LT 1900 09jan2019 16jan2019 LG 3200 09jan2019 16jan2019 LL 4200 09jan2019 16jan2019 LB 5200 ; Run; data need; set test; do repdate = start_date to end_date; output; end; format repdate date9.; run; proc tabulate data=need; class repdate store; var amount; table repdate='', store*amount=''*sum=''*f=dollar10.1 /box='Date' misstext=' '; run;
Can you explain more about how the starting dates and end dates are used and how this applies to the output table? Please walk us through an example of this, step-by-step.
Is the desired output a SAS data set, or a report, or something else?
Thank you for your prompt response,
My output should be a SAS data table as well,
So in explanation to the data set, for each store there is a buying operation that lasts from the start_date to the end_date;
in each buying operation we can have different stores ( not necessarily all stores will buy)
also, we can have multiple buying operations overlapping;
What I want to do is to calculate, for each store, the daily amount that have been bought, and is always withstanding;
for this I will need to use a conditional sum, by store, and by date
for example, in 01jan2019, for store1, the withstanding amount is the sum of the amount variable, for all operations that have a starting date before 01jan2019, and an end_date after 01jan2019
if I were to write this in excel for example, i would use a sumifs with multiple conditions as follow :
for each date in the want data set
=sumif(amount ; store ; store1 ; start_date ; "<="&"date" ; end_date ; ">"&"date" )
I hope this is clear,
Thank you again for your response
Thank you for your help,
The data want for store LA should look like this :
LA | |
01/01/2019 | 200 |
02/01/2019 | 200 |
03/01/2019 | 200 |
04/01/2019 | 200 |
05/01/2019 | 620 |
06/01/2019 | 620 |
07/01/2019 | 620 |
08/01/2019 | 620 |
09/01/2019 | 1200 |
10/01/2019 | 1200 |
11/01/2019 | 1200 |
12/01/2019 | 1200 |
13/01/2019 | 1200 |
14/01/2019 | 1200 |
15/01/2019 | 1200 |
16/01/2019 | 0 |
The amount shown in each date in the want data set is : the sum of amount in have, if the start_date is less or equal to date, and if the end_date is greater than date
in other words, the amount for a given date is the sum of operations that are alive at that date
an operation is alive if its end_date is after the current date, if not it's a finished operation.
for example, in the 8th January, the amount shown is the sum of 200 and 420, because the start date for these two operations is before the 8th, and their end date is after the 8th,
the next day ( the 9th january), these twp operations will die, and the amount will become 1200 ( equal to the sum of operations alive : 3rd operation)
Let's make this much simpler, and look at just one store:
data test;
input start_date :date9. end_date :date9. Store :$2. Amount;
format start_date end_date date9.;
format Amount dollar8.1;
datalines;
01jan2019 09jan2019 LA 200
05jan2019 09jan2019 LA 420
9jan2019 16jan2019 LA 1200
;
How would your intended "want" dataset look like for the dates 2019-01-01 to 2019-01-16? And please explain for at least the time from 2019-01-01 to 2019-01-06 how the numbers are calculated.
Thank you for your help,
The data want for store LA should look like this :
LA | |
01/01/2019 | 200 |
02/01/2019 | 200 |
03/01/2019 | 200 |
04/01/2019 | 200 |
05/01/2019 | 620 |
06/01/2019 | 620 |
07/01/2019 | 620 |
08/01/2019 | 620 |
09/01/2019 | 1200 |
10/01/2019 | 1200 |
11/01/2019 | 1200 |
12/01/2019 | 1200 |
13/01/2019 | 1200 |
14/01/2019 | 1200 |
15/01/2019 | 1200 |
16/01/2019 | 0 |
The amount shown in each date in the want data set is : the sum of amount in have, if the start_date is less or equal to date, and if the end_date is greater than date
in other words, the amount for a given date is the sum of operations that are alive at that date
an operation is alive if its end_date is after the current date, if not it's a finished operation.
for example, in the 8th January, the amount shown is the sum of 200 and 420, because the start date for these two operations is before the 8th, and their end date is after the 8th, the next day ( the 9th january), these twp operations will die, and the amount will become 1200 ( equal to the sum of operations alive : 3rd operation)
Since you want a SAS data set as output, do not create different columns for each Store. Create different rows for each store. This is not only easier to program but much more useful if you have additional analyses to perform on this data. (I get the feeling that it will be helpful to know what the next steps are, what additional things you plan to do with this data)
data intermediate;
set test;
do date=start_date to end_date-1 by 1;
output;
end;
run;
proc summary data=intermediate nway;
class store date;
var amount;
format date date9.;
output out=want sum=;
run;
Thank you for your response,
Unfortunately, this solution is giving me the sum of all operations for each store, and not only the outstanding living amount per date,
I'll try explaining the problem differently,
let's think of each operation as a loan, in which each store toke an amount from the bank, that goes from start_date to end_date;
the data I have is this :
Operation | Star_date | End_Date | Store | Amount |
operation 1 | 01/01/2019 | 09/01/2019 | LA | 200 |
operation 1 | 01/01/2019 | 09/01/2019 | LB | 1500 |
operation 1 | 01/01/2019 | 09/01/2019 | LZ | 1600 |
operation 1 | 01/01/2019 | 09/01/2019 | LO | 1300 |
operation 1 | 01/01/2019 | 09/01/2019 | LU | 1400 |
operation 1 | 01/01/2019 | 09/01/2019 | LF | 600 |
operation 2 | 05/01/2019 | 09/01/2019 | LH | 800 |
operation 2 | 05/01/2019 | 09/01/2019 | LC | 2200 |
operation 2 | 05/01/2019 | 09/01/2019 | LG | 2200 |
operation 2 | 05/01/2019 | 09/01/2019 | LL | 2000 |
operation 2 | 05/01/2019 | 09/01/2019 | LD | 2000 |
operation 2 | 05/01/2019 | 09/01/2019 | LK | 1300 |
600 | ||||
operation 3 | 10/01/2019 | 19/01/2019 | LB | 1500 |
operation 3 | 10/01/2019 | 19/01/2019 | LG | 2800 |
operation 3 | 10/01/2019 | 19/01/2019 | LC | 2900 |
operation 3 | 10/01/2019 | 19/01/2019 | LD | 2600 |
operation 3 | 10/01/2019 | 19/01/2019 | LF | 2700 |
operation 3 | 10/01/2019 | 19/01/2019 | LL | 1900 |
Now, I want to calculate, for each store, the outstanding loan amount per day between 01JAN2019 and 31DEC2019;
so the Data I want should look like this:
Date | LA | LB | LC |
01/01/2019 | 200,00 | 1 500,00 | - |
02/01/2019 | 200,00 | 1 500,00 | - |
03/01/2019 | 200,00 | 1 500,00 | - |
04/01/2019 | 200,00 | 1 500,00 | - |
05/01/2019 | 200,00 | 1 500,00 | 2 200,00 |
06/01/2019 | 200,00 | 1 500,00 | 2 200,00 |
07/01/2019 | 200,00 | 1 500,00 | 2 200,00 |
08/01/2019 | 200,00 | 1 500,00 | 2 200,00 |
09/01/2019 | - | - | - |
10/01/2019 | - | 1 500,00 | 2 900,00 |
11/01/2019 | - | 1 500,00 | 2 900,00 |
12/01/2019 | - | 1 500,00 | 2 900,00 |
13/01/2019 | - | 1 500,00 | 2 900,00 |
14/01/2019 | - | 1 500,00 | 2 900,00 |
15/01/2019 | - | 1 500,00 | 2 900,00 |
16/01/2019 | - | 1 500,00 | 2 900,00 |
17/01/2019 | - | 1 500,00 | 2 900,00 |
18/01/2019 | - | 1 500,00 | 2 900,00 |
19/01/2019 | - | - | - |
20/01/2019 | - | - | - |
21/01/2019 | - | - | - |
22/01/2019 | - | - | - |
I provided an excel file below with the calculations done using formulas, I hope it helps;
Thank you again.
Isn't this what my code is giving you except that the store values are in rows instead of columns?
First, expand the range to individual dates.
Then run PROC SUMMARY.
Then, create a report with PROC REPORT or a dataset with PROC TRANSPOSE (although I do not recommend the creation of a wide dataset).
data have;
infile datalines dlm="," dsd;
input Operation :$20. Start_date :ddmmyy10. End_Date :ddmmyy10. Store :$2. Amount;
format start_date end_date yymmdd10.;
datalines;
operation 1,01/01/2019,09/01/2019,LA,200
operation 1,01/01/2019,09/01/2019,LB,1500
operation 1,01/01/2019,09/01/2019,LZ,1600
operation 1,01/01/2019,09/01/2019,LO,1300
operation 1,01/01/2019,09/01/2019,LU,1400
operation 1,01/01/2019,09/01/2019,LF,600
operation 2,05/01/2019,09/01/2019,LH,800
operation 2,05/01/2019,09/01/2019,LC,2200
operation 2,05/01/2019,09/01/2019,LG,2200
operation 2,05/01/2019,09/01/2019,LL,2000
operation 2,05/01/2019,09/01/2019,LD,2000
operation 2,05/01/2019,09/01/2019,LK,1300
operation 3,10/01/2019,19/01/2019,LB,1500
operation 3,10/01/2019,19/01/2019,LG,2800
operation 3,10/01/2019,19/01/2019,LC,2900
operation 3,10/01/2019,19/01/2019,LD,2600
operation 3,10/01/2019,19/01/2019,LF,2700
operation 3,10/01/2019,19/01/2019,LL,1900
;
data long;
set have;
do Date = start_date to end_date - 1;
output;
end;
drop start_date end_date;
format Date yymmdd10.;
run;
proc summary data=long nway;
class date store;
var amount;
output out=want sum()=;
run;
proc report data=want;
column date amount,store;
define date / group;
define amount / "" sum;
define store / "" across;
run;
proc transpose data=want out=wide (drop=_name_);
by date;
id store;
var amount;
run;
I think this makes a REPORT in the form the requested.
Note that your posted data step does not read the stores as character and throws errors of invalid data.
Placing values into a data set with different variables for each store does create an ugly extremely hard to work with structure. Basic approach, create daily values in a data step, the manipulate. I used a format for the SUM similar to your data step. If you don't actually need the decimals and currency then consider a BEST8. or BEST10. depending on range of actual values.
data test; input start_date :date9. end_date :date9. Store $ Amount ; format start_date end_date date9.; /*format Store $CHARw13.;*/ format Amount dollar8.1; datalines; 01jan2019 09jan2019 LA 200 01jan2019 09jan2019 LB 1200 01jan2019 09jan2019 LC 500 01jan2019 09jan2019 LD 2500 01jan2019 09jan2019 LE 2300 01jan2019 09jan2019 LF 2100 01jan2019 09jan2019 LG 2900 01jan2019 09jan2019 LI 1900 01jan2019 09jan2019 LJ 3200 01jan2019 09jan2019 LK 4200 01jan2019 09jan2019 LY 5200 05jan2019 09jan2019 LO 2000 05jan2019 09jan2019 LB 120 05jan2019 09jan2019 LC 5000 05jan2019 09jan2019 LD 250 05jan2019 09jan2019 LR 230 05jan2019 09jan2019 LE 210 05jan2019 09jan2019 LP 290 05jan2019 09jan2019 LQ 190 05jan2019 09jan2019 LM 320 05jan2019 09jan2019 LA 420 05jan2019 09jan2019 LF 500 09jan2019 16jan2019 LO 200 09jan2019 16jan2019 LA 1200 09jan2019 16jan2019 LQ 500 09jan2019 16jan2019 LW 2500 09jan2019 16jan2019 LZ 2300 09jan2019 16jan2019 LD 2100 09jan2019 16jan2019 LR 2900 09jan2019 16jan2019 LT 1900 09jan2019 16jan2019 LG 3200 09jan2019 16jan2019 LL 4200 09jan2019 16jan2019 LB 5200 ; Run; data need; set test; do repdate = start_date to end_date; output; end; format repdate date9.; run; proc tabulate data=need; class repdate store; var amount; table repdate='', store*amount=''*sum=''*f=dollar10.1 /box='Date' misstext=' '; run;
Thank you all for your responses,
All the solution provided are 100% working.
Thank you again @PaigeMiller @Kurt_Bremser @ballardw
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.