Hi,
My problem is about calculating the cumulative sum for each id and for each date taking into account a sliding period of 15 previous days. If the cumulative sum exceeds 10k, the variable top is incremented.
The treatment is done for June only.
Here is an example of the desired result :
id | app_date | price | cum | top |
1 | 29-juin-20 | 4000 | 4 000 | . |
1 | 13-juin-20 | 5000 | 45 000 | 1 |
1 | 13-juin-20 | 6000 | 40 000 | 2 |
1 | 11-juin-20 | 7000 | 34 000 | 3 |
1 | 10-juin-20 | 8000 | 27 000 | 4 |
1 | 01-juin-20 | 9000 | 19 000 | 5 |
1 | 30-mai-20 | 10000 | 10 000 | . |
proc sort data = tab out= tab1;
by id descending app_date;
run;
data tab2;
set tab1;
%let annee=2020;
%let month=06;
by id;
retain last_date date_last_d CUM;
if first.id then do;
last_date =app_date;
date_last_dem = app_date;
CUM=0;
end;
if month(date_last_d) =&month. then do ;
diff= date_last_d -app_date;
CUM= price+ CUM;
end;
if diff>15 then do;
diff = .;
CUM =.;
last_date =app_date;
date_last_d = app_date;
end;
if last.id and CUM>10000 then top= top+1 ;
output;
last_date=app_date;
format last_date DDMMYY10.;
format date_last_d DDMMYY10.;
format CUM 14.2;
run;
I can do it for the first iteration but I cannot do it for all the lines.
Thank you,
I interpret your correction like this: For ID 101666, you want all prices to be involved in the calculation of cum in the first iteration. However, for the second iteration, the first value should not be included, even though the date range is the same as the one above. Correct?
In that case, it is a small correction. Let me know if it works for you.
data have;
input Cnt Price ID App_date :ddmmyy10.;
format App_date ddmmyy10.;
datalines;
1 2265 534 30/05/2020
2 2330 4594 27/06/2020
3 1360 723 14/05/2020
4 1393 723 14/05/2020
5 2400 101666 12/06/2020
6 2411 101666 12/06/2020
7 2400 101666 11/06/2020
8 2400 101666 11/06/2020
9 2527 101666 10/06/2020
10 2536 101666 10/06/2020
11 2458 101666 04/06/2020
12 2758 1088 30/05/2020
13 4412 1056 13/06/2020
14 1870 1255 30/06/2020
15 4198 1255 14/05/2020
;
data want(drop = c k p dt);
dcl hash h(ordered : "Y");
h.definekey("c");
h.definedata("c", "p", "dt");
h.definedone();
dcl hiter i("h");
do c = 1 by 1 until (last.ID);
set have(rename=(App_Date=dt Price=p));
by ID notsorted;
h.add();
end;
do k = 1 by 1 until (last.ID);
set have;
by ID notsorted;
cum = 0;
do while (i.next() = 0);
if App_Date - 15 <= dt <= App_Date & k <= c then cum + p;
end;
if cum > 10000 then top + 1;
else top = .;
output;
end;
h.clear();
run;
Result:
Cnt ID Price App_date cum top 1 534 2265 30/05/2020 2265 . 2 4594 2330 27/06/2020 2330 . 3 723 1360 14/05/2020 2753 . 4 723 1393 14/05/2020 1393 . 5 101666 2400 12/06/2020 17132 1 6 101666 2411 12/06/2020 14732 2 7 101666 2400 11/06/2020 12321 3 8 101666 2400 11/06/2020 9921 . 9 101666 2527 10/06/2020 7521 . 10 101666 2536 10/06/2020 4994 . 11 101666 2458 04/06/2020 2458 . 12 1088 2758 30/05/2020 2758 . 13 1056 4412 13/06/2020 4412 . 14 1255 1870 30/06/2020 1870 . 15 1255 4198 14/05/2020 4198 .
Why does cum start at 4000 but jump to 45000?
Thanks for sharing code and the expected output, but to fully understand the problem, i need to see the dataset tab1 (usable form preferred).
There is a sample of data tab1 :
Cnt | Price | ID | App_date |
1 | 2265 | 534 | 30/05/2020 |
2 | 2330 | 4594 | 27/06/2020 |
3 | 1360 | 723 | 14/05/2020 |
4 | 1393 | 723 | 14/05/2020 |
5 | 2400 | 101666 | 12/06/2020 |
6 | 2411 | 101666 | 12/06/2020 |
7 | 2400 | 101666 | 11/06/2020 |
8 | 2400 | 101666 | 11/06/2020 |
9 | 2527 | 101666 | 10/06/2020 |
10 | 2536 | 101666 | 10/06/2020 |
11 | 2458 | 101666 | 04/06/2020 |
12 | 2758 | 1088 | 30/05/2020 |
13 | 4412 | 1056 | 13/06/2020 |
14 | 1870 | 1255 | 30/06/2020 |
15 | 4198 | 1255 | 14/05/2020 |
For example, for the id 101666 the top should be equal to 3.
Iteration 1 : from cnt5 to cnt11--> CUM>10k and top=1
Iteration 2 : from cnt6 to cnt11--> CUM>10k and top=2
Iteration 3 : from cnt7 to cnt11--> CUM>10k and top=3
Note that for each iteration, i calculate the CUM over a period of 15 days.
Thanks,
How about
data have;
input Cnt Price ID App_date :ddmmyy10.;
format App_date ddmmyy10.;
datalines;
1 2265 534 30/05/2020
2 2330 4594 27/06/2020
3 1360 723 14/05/2020
4 1393 723 14/05/2020
5 2400 101666 12/06/2020
6 2411 101666 12/06/2020
7 2400 101666 11/06/2020
8 2400 101666 11/06/2020
9 2527 101666 10/06/2020
10 2536 101666 10/06/2020
11 2458 101666 04/06/2020
12 2758 1088 30/05/2020
13 4412 1056 13/06/2020
14 1870 1255 30/06/2020
15 4198 1255 14/05/2020
;
data want(drop = c p dt);
dcl hash h(ordered : "Y");
h.definekey("c");
h.definedata("p", "dt");
h.definedone();
dcl hiter i("h");
do c = 1 by 1 until (last.ID);
set have(rename=(App_Date=dt Price=p));
by ID notsorted;
h.add();
end;
do until (last.ID);
set have;
by ID notsorted;
cum = 0;
do while (i.next() = 0);
if App_Date - 15 <= dt <= App_Date then cum + p;
end;
if cum > 10000 then top + 1;
else top = .;
output;
end;
h.clear();
run;
Result:
Cnt ID Price App_date cum top 1 534 2265 30/05/2020 2265 . 2 4594 2330 27/06/2020 2330 . 3 723 1360 14/05/2020 2753 . 4 723 1393 14/05/2020 2753 . 5 101666 2400 12/06/2020 17132 1 6 101666 2411 12/06/2020 17132 2 7 101666 2400 11/06/2020 12321 3 8 101666 2400 11/06/2020 12321 4 9 101666 2527 10/06/2020 7521 . 10 101666 2536 10/06/2020 7521 . 11 101666 2458 04/06/2020 2458 . 12 1088 2758 30/05/2020 2758 . 13 1056 4412 13/06/2020 4412 . 14 1255 1870 30/06/2020 1870 . 15 1255 4198 14/05/2020 4198 .
thank you very much for your answer.
Indeed, in terms of treatment, that's exactly what it is expected.
But, for the id 101666, top must be stopped at 3.
The first iteration, CUM = 2400+2411+2400+2400+2527+2536+2458 = 17131>10000 --> top=1
The second iteration, CUM = 2411+2400+2400+2527+2536+2458 = 14732 >10000 --> top=2
The third iteration, CUM = 2400+2400+2527+2536+2458 = 12321>10000 --> top=3
The fourth one, CUM =+2400+2527+2536+2458 = 12321<10000
I interpret your correction like this: For ID 101666, you want all prices to be involved in the calculation of cum in the first iteration. However, for the second iteration, the first value should not be included, even though the date range is the same as the one above. Correct?
In that case, it is a small correction. Let me know if it works for you.
data have;
input Cnt Price ID App_date :ddmmyy10.;
format App_date ddmmyy10.;
datalines;
1 2265 534 30/05/2020
2 2330 4594 27/06/2020
3 1360 723 14/05/2020
4 1393 723 14/05/2020
5 2400 101666 12/06/2020
6 2411 101666 12/06/2020
7 2400 101666 11/06/2020
8 2400 101666 11/06/2020
9 2527 101666 10/06/2020
10 2536 101666 10/06/2020
11 2458 101666 04/06/2020
12 2758 1088 30/05/2020
13 4412 1056 13/06/2020
14 1870 1255 30/06/2020
15 4198 1255 14/05/2020
;
data want(drop = c k p dt);
dcl hash h(ordered : "Y");
h.definekey("c");
h.definedata("c", "p", "dt");
h.definedone();
dcl hiter i("h");
do c = 1 by 1 until (last.ID);
set have(rename=(App_Date=dt Price=p));
by ID notsorted;
h.add();
end;
do k = 1 by 1 until (last.ID);
set have;
by ID notsorted;
cum = 0;
do while (i.next() = 0);
if App_Date - 15 <= dt <= App_Date & k <= c then cum + p;
end;
if cum > 10000 then top + 1;
else top = .;
output;
end;
h.clear();
run;
Result:
Cnt ID Price App_date cum top 1 534 2265 30/05/2020 2265 . 2 4594 2330 27/06/2020 2330 . 3 723 1360 14/05/2020 2753 . 4 723 1393 14/05/2020 1393 . 5 101666 2400 12/06/2020 17132 1 6 101666 2411 12/06/2020 14732 2 7 101666 2400 11/06/2020 12321 3 8 101666 2400 11/06/2020 9921 . 9 101666 2527 10/06/2020 7521 . 10 101666 2536 10/06/2020 4994 . 11 101666 2458 04/06/2020 2458 . 12 1088 2758 30/05/2020 2758 . 13 1056 4412 13/06/2020 4412 . 14 1255 1870 30/06/2020 1870 . 15 1255 4198 14/05/2020 4198 .
Exactly!
Yes it works, thank you!
Have a good day!
@ZSB wrote:
thank you very much for your answer.
Indeed, in terms of treatment, that's exactly what it is expected.
But, for the id 101666, top must be stopped at 3.
The first iteration, CUM = 2400+2411+2400+2400+2527+2536+2458 = 17131>10000 --> top=1
The second iteration, CUM = 2411+2400+2400+2527+2536+2458 = 14732 >10000 --> top=2
The third iteration, CUM = 2400+2400+2527+2536+2458 = 12321>10000 --> top=3
The fourth one, CUM =+2400+2527+2536+2458 = 12321<10000
I don't see anything in your original post about "top must be stopped at 3".
My problem is about calculating the cumulative sum for each id and for each date taking into account a sliding period of 15 previous days. If the cumulative sum exceeds 10k, the variable top is incremented.
The treatment is done for June only.
Do you have a bunch of arbitrary rules like that not shared yet? Such as possibly the "top" gets reset for each calendar month (or other period)? Not providing all the requirements up front complicates things because it may be the additional information requires a different approach.
Just for an example of something else not explicitly covered, you say " exceeds 10k," What would you have the result be if a single value added resulted in a sum of much more than 10K? Such as 100 + 100000? That crosses multiple 10K boundaries.
And do you have any negative values? If you do, would they be subtracted from the cumulative total or not?
Thank you for answer!
I have already the solution for my problem.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.