BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ZSB
Fluorite | Level 6 ZSB
Fluorite | Level 6

Hi, 

My problem is about calculating the cumulative sum for each id and for each date taking into account a sliding period of 15 previous days. If the cumulative sum exceeds 10k, the variable top is incremented.

The treatment is done for June only.

 

Here is an example of the desired result : 

idapp_datepricecumtop 
129-juin-204000             4 000.
113-juin-205000           45 0001
113-juin-206000           40 0002
111-juin-207000           34 0003
110-juin-208000           27 0004
101-juin-209000           19 0005
130-mai-2010000           10 000.

proc sort data = tab out= tab1;
by id descending app_date;

run;

 

data tab2;

set tab1;

%let annee=2020;

%let month=06;

by  id;

retain last_date date_last_d CUM;

if first.id then do;

      last_date =app_date;

      date_last_dem = app_date;

      CUM=0;

end;

if month(date_last_d) =&month. then do ;

diff= date_last_d -app_date;

CUM= price+ CUM;

end;

if diff>15 then do;

      diff = .;

      CUM =.;

      last_date =app_date;

      date_last_d = app_date;

end;

if last.id and CUM>10000 then top= top+1 ;

output;

last_date=app_date;

format last_date DDMMYY10.;

format date_last_d DDMMYY10.;

format CUM 14.2;

 

run;

 

I can do it for the first iteration but I cannot do it for all the lines.

 

Thank you,

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

I interpret your correction like this: For ID 101666, you want all prices to be involved in the calculation of cum in the first iteration. However, for the second iteration, the first value should not be included, even though the date range is the same as the one above. Correct?

 

In that case, it is a small correction. Let me know if it works for you.

 

data have;
input Cnt Price ID App_date :ddmmyy10.;
format App_date ddmmyy10.;
datalines;
1  2265 534    30/05/2020
2  2330 4594   27/06/2020
3  1360 723    14/05/2020
4  1393 723    14/05/2020
5  2400 101666 12/06/2020
6  2411 101666 12/06/2020
7  2400 101666 11/06/2020
8  2400 101666 11/06/2020
9  2527 101666 10/06/2020
10 2536 101666 10/06/2020
11 2458 101666 04/06/2020
12 2758 1088   30/05/2020
13 4412 1056   13/06/2020
14 1870 1255   30/06/2020
15 4198 1255   14/05/2020
;

data want(drop = c k p dt);

   dcl hash h(ordered : "Y");
   h.definekey("c");
   h.definedata("c", "p", "dt");
   h.definedone();
   dcl hiter i("h");

   do c = 1 by 1 until (last.ID);
      set have(rename=(App_Date=dt Price=p));
      by ID notsorted;
      h.add();
   end;

   do k = 1 by 1 until (last.ID);
      set have;
      by ID notsorted;
      cum = 0;
      do while (i.next() = 0);
         if App_Date - 15 <= dt <= App_Date & k <= c then cum + p;
      end;
      if cum > 10000 then top + 1;
      else top = .;
      output;
   end;

   h.clear();
run;

 

Result:

 

Cnt  ID      Price  App_date    cum    top 
1    534     2265   30/05/2020  2265   . 
2    4594    2330   27/06/2020  2330   . 
3    723     1360   14/05/2020  2753   . 
4    723     1393   14/05/2020  1393   . 
5    101666  2400   12/06/2020  17132  1 
6    101666  2411   12/06/2020  14732  2 
7    101666  2400   11/06/2020  12321  3 
8    101666  2400   11/06/2020  9921   . 
9    101666  2527   10/06/2020  7521   . 
10   101666  2536   10/06/2020  4994   . 
11   101666  2458   04/06/2020  2458   . 
12   1088    2758   30/05/2020  2758   . 
13   1056    4412   13/06/2020  4412   . 
14   1255    1870   30/06/2020  1870   . 
15   1255    4198   14/05/2020  4198   . 

 

View solution in original post

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

Why does cum start at 4000 but jump to 45000?

andreas_lds
Jade | Level 19

Thanks for sharing code and the expected output, but to fully understand the problem, i need to see the dataset tab1 (usable form preferred).

ZSB
Fluorite | Level 6 ZSB
Fluorite | Level 6

There is a sample of data tab1 : 

 

CntPriceIDApp_date
1226553430/05/2020
22330459427/06/2020
3136072314/05/2020
4139372314/05/2020
5240010166612/06/2020
6241110166612/06/2020
7240010166611/06/2020
8240010166611/06/2020
9252710166610/06/2020
10253610166610/06/2020
11245810166604/06/2020
122758108830/05/2020
134412105613/06/2020
141870125530/06/2020
154198125514/05/2020

For example, for the id 101666 the top should be equal to 3.

Iteration 1 : from cnt5 to cnt11--> CUM>10k and top=1

Iteration 2 : from cnt6 to cnt11--> CUM>10k and top=2

Iteration 3 : from cnt7 to cnt11--> CUM>10k and top=3

Note that for each iteration, i calculate the CUM over a period of 15 days.

 

Thanks,

PeterClemmensen
Tourmaline | Level 20

How about

 

data have;
input Cnt Price ID App_date :ddmmyy10.;
format App_date ddmmyy10.;
datalines;
1  2265 534    30/05/2020
2  2330 4594   27/06/2020
3  1360 723    14/05/2020
4  1393 723    14/05/2020
5  2400 101666 12/06/2020
6  2411 101666 12/06/2020
7  2400 101666 11/06/2020
8  2400 101666 11/06/2020
9  2527 101666 10/06/2020
10 2536 101666 10/06/2020
11 2458 101666 04/06/2020
12 2758 1088   30/05/2020
13 4412 1056   13/06/2020
14 1870 1255   30/06/2020
15 4198 1255   14/05/2020
;

data want(drop = c p dt);

   dcl hash h(ordered : "Y");
   h.definekey("c");
   h.definedata("p", "dt");
   h.definedone();
   dcl hiter i("h");

   do c = 1 by 1 until (last.ID);
      set have(rename=(App_Date=dt Price=p));
      by ID notsorted;
      h.add();
   end;

   do until (last.ID);
      set have;
      by ID notsorted;
      cum = 0;
      do while (i.next() = 0);
         if App_Date - 15 <= dt <= App_Date then cum + p;
      end;
      if cum > 10000 then top + 1;
      else top = .;
      output;
   end;

   h.clear();
run;

 

Result:

 

Cnt  ID      Price  App_date    cum    top 
1    534     2265   30/05/2020  2265   . 
2    4594    2330   27/06/2020  2330   . 
3    723     1360   14/05/2020  2753   . 
4    723     1393   14/05/2020  2753   . 
5    101666  2400   12/06/2020  17132  1 
6    101666  2411   12/06/2020  17132  2 
7    101666  2400   11/06/2020  12321  3 
8    101666  2400   11/06/2020  12321  4 
9    101666  2527   10/06/2020  7521   . 
10   101666  2536   10/06/2020  7521   . 
11   101666  2458   04/06/2020  2458   . 
12   1088    2758   30/05/2020  2758   . 
13   1056    4412   13/06/2020  4412   . 
14   1255    1870   30/06/2020  1870   . 
15   1255    4198   14/05/2020  4198   . 

 

ZSB
Fluorite | Level 6 ZSB
Fluorite | Level 6

thank you very much for your answer.

Indeed, in terms of treatment, that's exactly what it is expected.

But, for the id 101666, top must be stopped at 3. 

The first iteration, CUM = 2400+2411+2400+2400+2527+2536+2458 = 17131>10000 --> top=1 

The second iteration, CUM = 2411+2400+2400+2527+2536+2458 = 14732 >10000  --> top=2

The third iteration, CUM = 2400+2400+2527+2536+2458 = 12321>10000 --> top=3

The fourth one, CUM =+2400+2527+2536+2458 = 12321<10000

 

 

 

PeterClemmensen
Tourmaline | Level 20

I interpret your correction like this: For ID 101666, you want all prices to be involved in the calculation of cum in the first iteration. However, for the second iteration, the first value should not be included, even though the date range is the same as the one above. Correct?

 

In that case, it is a small correction. Let me know if it works for you.

 

data have;
input Cnt Price ID App_date :ddmmyy10.;
format App_date ddmmyy10.;
datalines;
1  2265 534    30/05/2020
2  2330 4594   27/06/2020
3  1360 723    14/05/2020
4  1393 723    14/05/2020
5  2400 101666 12/06/2020
6  2411 101666 12/06/2020
7  2400 101666 11/06/2020
8  2400 101666 11/06/2020
9  2527 101666 10/06/2020
10 2536 101666 10/06/2020
11 2458 101666 04/06/2020
12 2758 1088   30/05/2020
13 4412 1056   13/06/2020
14 1870 1255   30/06/2020
15 4198 1255   14/05/2020
;

data want(drop = c k p dt);

   dcl hash h(ordered : "Y");
   h.definekey("c");
   h.definedata("c", "p", "dt");
   h.definedone();
   dcl hiter i("h");

   do c = 1 by 1 until (last.ID);
      set have(rename=(App_Date=dt Price=p));
      by ID notsorted;
      h.add();
   end;

   do k = 1 by 1 until (last.ID);
      set have;
      by ID notsorted;
      cum = 0;
      do while (i.next() = 0);
         if App_Date - 15 <= dt <= App_Date & k <= c then cum + p;
      end;
      if cum > 10000 then top + 1;
      else top = .;
      output;
   end;

   h.clear();
run;

 

Result:

 

Cnt  ID      Price  App_date    cum    top 
1    534     2265   30/05/2020  2265   . 
2    4594    2330   27/06/2020  2330   . 
3    723     1360   14/05/2020  2753   . 
4    723     1393   14/05/2020  1393   . 
5    101666  2400   12/06/2020  17132  1 
6    101666  2411   12/06/2020  14732  2 
7    101666  2400   11/06/2020  12321  3 
8    101666  2400   11/06/2020  9921   . 
9    101666  2527   10/06/2020  7521   . 
10   101666  2536   10/06/2020  4994   . 
11   101666  2458   04/06/2020  2458   . 
12   1088    2758   30/05/2020  2758   . 
13   1056    4412   13/06/2020  4412   . 
14   1255    1870   30/06/2020  1870   . 
15   1255    4198   14/05/2020  4198   . 

 

ZSB
Fluorite | Level 6 ZSB
Fluorite | Level 6

Exactly!

 

Yes it works, thank you!

 

Have a good day!

ballardw
Super User

@ZSB wrote:

thank you very much for your answer.

Indeed, in terms of treatment, that's exactly what it is expected.

But, for the id 101666, top must be stopped at 3. 

The first iteration, CUM = 2400+2411+2400+2400+2527+2536+2458 = 17131>10000 --> top=1 

The second iteration, CUM = 2411+2400+2400+2527+2536+2458 = 14732 >10000  --> top=2

The third iteration, CUM = 2400+2400+2527+2536+2458 = 12321>10000 --> top=3

The fourth one, CUM =+2400+2527+2536+2458 = 12321<10000

 


I don't see anything in your original post about "top must be stopped at 3".

My problem is about calculating the cumulative sum for each id and for each date taking into account a sliding period of 15 previous days. If the cumulative sum exceeds 10k, the variable top is incremented.

The treatment is done for June only.

 

Do you have a bunch of arbitrary rules like that not shared yet? Such as possibly the "top" gets reset for each calendar month (or other period)? Not providing all the requirements up front complicates things because it may be the additional information requires a different approach.

 

Just for an example of something else not explicitly covered, you say " exceeds 10k," What would you have the result be if a single value added resulted in a sum of much more than 10K? Such as 100 + 100000? That crosses multiple 10K boundaries.

 

And do you have any negative values? If you do, would they be subtracted from the cumulative total or not?

ZSB
Fluorite | Level 6 ZSB
Fluorite | Level 6

Thank you for answer!

I have already the solution for my problem.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1403 views
  • 2 likes
  • 4 in conversation