Solved: Re: Finding the maximum value in the upcoming 12 months

inid · Posted 09-24-2021 04:31 PM

Hello

I have a starting table as below where I have a rank for each ID on each month.

ID	Data	Rank
a	31/12/2020	1
a	31/01/2021	1
a	28/02/2021	1
a	31/03/2021	1
a	30/04/2021	1
a	31/05/2021	1
a	30/06/2021	1
a	31/07/2021	3
a	31/08/2021	1
a	30/09/2021	1
a	31/10/2021	1
a	30/11/2021	1
a	31/12/2021	1
b	31/12/2020	1
b	31/01/2021	1
b	31/05/2021	1
b	30/06/2021	1
b	31/07/2021	3
b	31/08/2021	1
b	30/09/2021	1
b	31/10/2021	1
b	30/11/2021	1
b	31/12/2021	1
c	31/12/2020	1
c	31/01/2021	1
c	28/02/2021	1
c	31/03/2021	1
c	30/04/2021	1
c	31/07/2021	1
c	31/08/2021	1
c	30/09/2021	1
c	31/10/2021	1
c	30/11/2021	1
c	31/12/2021	1

I would like to have add a new Value (Max Rank) having the below conditions (Either Proc SQL or SAS Base):

1. If in the next 12 months (E.g. for 31 December 2020 it should check until 31 December 2021) Max Rank is higher than 2 then show Max rank for each ID.

2. If in the next 12 months Max rank is 2 or lower and there is no missing month (so the ID should be present in all consecutive months (From 31 December 2020 to 31 December 2021) then show Max Rank for each ID.

3. If in the next 12 months Max rank is 2 or lower but there at least one missing month (From 31 December 2020 to 31 December 2021 the ID is not present in one of the months) then Max Rank should be empty for each ID.

A summary of want table is presented below:

ID	Data	Rank	Max Rank
a	31/12/2020	1	3
a	31/01/2021	1	3
a	28/02/2021	1	3
a	31/03/2021	1	3
a	30/04/2021	1	3
a	31/05/2021	1	3
a	30/06/2021	1	3
a	31/07/2021	3	3
a	31/08/2021	1	1
a	30/09/2021	1	1
a	31/10/2021	1	1
a	30/11/2021	1	1
a	31/12/2021	1	1
b	31/12/2020	1	3
b	31/01/2021	1	3
b	31/05/2021	1	3
b	30/06/2021	1	3
b	31/07/2021	3	3
b	31/08/2021	1	1
b	30/09/2021	1	1
b	31/10/2021	1	1
b	30/11/2021	1	1
b	31/12/2021	1	1
c	31/12/2020	1
c	31/01/2021	1
c	28/02/2021	1
c	31/03/2021	1
c	30/04/2021	1
c	31/07/2021	1
c	31/08/2021	1
c	30/09/2021	1
c	31/10/2021	1
c	30/11/2021	1
c	31/12/2021	1

Thank you in advance for your help and apologize if I have skipped anything.

Patrick · Posted 09-24-2021 11:13 PM

Can you please explain why in your desired result you've got a desired max rank of 1 for below highlighted months?

Given your rule 3 shouldn't there be missings because there aren't any future 12 months?:

If in the next 12 months Max rank is 2 or lower but there at least one missing month (From 31 December 2020 to 31 December 2021 the ID is not present in one of the months) then Max Rank should be empty for each ID.

The way I interpret your rules a SQL could look like below. But this logic doesn't return the result as per your post.

Please also tell us what data volumes you're dealing with. I wouldn't use below SQL for big volumes.

data have;
  infile datalines truncover;
  input ID $ Date:ddmmyy10. Rank max_expected;
  format date date9.;
  datalines;
a 31/12/2020 1 3
a 31/01/2021 1 3
a 28/02/2021 1 3
a 31/03/2021 1 3
a 30/04/2021 1 3
a 31/05/2021 1 3
a 30/06/2021 1 3
a 31/07/2021 3 3
a 31/08/2021 1 1
a 30/09/2021 1 1
a 31/10/2021 1 1
a 30/11/2021 1 1
a 31/12/2021 1 1
b 31/12/2020 1 3
b 31/01/2021 1 3
b 31/05/2021 1 3
b 30/06/2021 1 3
b 31/07/2021 3 3
b 31/08/2021 1 1
b 30/09/2021 1 1
b 31/10/2021 1 1
b 30/11/2021 1 1
b 31/12/2021 1 1
c 31/12/2020 1  
c 31/01/2021 1  
c 28/02/2021 1  
c 31/03/2021 1  
c 30/04/2021 1  
c 31/07/2021 1  
c 31/08/2021 1  
c 30/09/2021 1  
c 31/10/2021 1  
c 30/11/2021 1  
c 31/12/2021 1  
;

proc sql;
  select 
    o.*, 
    (
      select
        case
          when max_calculated>2 then max_calculated
          when n_months=13 then max_calculated
          else .
          end as max_calculated
      from
        (
          select max(Rank) as max_calculated, count(*) as n_months
          from have i
          where i.id=o.id and i.date between o.date and intnx('month',o.date,12,'e')
          group by i.id
        )
    ) as max_calculated
  from 
    have o 
  ;
quit;

View solution in original post

mkeintz · Posted 09-24-2021 11:13 PM

If rank for 31/12/2020 is 3, but is 1 for Jan 2021 through Dec of 2021, then what is the value of maxrank for 31/12/2020?

I.e., when you say "next 12 months", are you also including the current month (making a total of 13 months) when finding the maximum?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Patrick · Posted 09-24-2021 11:13 PM

Can you please explain why in your desired result you've got a desired max rank of 1 for below highlighted months?

Given your rule 3 shouldn't there be missings because there aren't any future 12 months?:

If in the next 12 months Max rank is 2 or lower but there at least one missing month (From 31 December 2020 to 31 December 2021 the ID is not present in one of the months) then Max Rank should be empty for each ID.

The way I interpret your rules a SQL could look like below. But this logic doesn't return the result as per your post.

Please also tell us what data volumes you're dealing with. I wouldn't use below SQL for big volumes.

data have;
  infile datalines truncover;
  input ID $ Date:ddmmyy10. Rank max_expected;
  format date date9.;
  datalines;
a 31/12/2020 1 3
a 31/01/2021 1 3
a 28/02/2021 1 3
a 31/03/2021 1 3
a 30/04/2021 1 3
a 31/05/2021 1 3
a 30/06/2021 1 3
a 31/07/2021 3 3
a 31/08/2021 1 1
a 30/09/2021 1 1
a 31/10/2021 1 1
a 30/11/2021 1 1
a 31/12/2021 1 1
b 31/12/2020 1 3
b 31/01/2021 1 3
b 31/05/2021 1 3
b 30/06/2021 1 3
b 31/07/2021 3 3
b 31/08/2021 1 1
b 30/09/2021 1 1
b 31/10/2021 1 1
b 30/11/2021 1 1
b 31/12/2021 1 1
c 31/12/2020 1  
c 31/01/2021 1  
c 28/02/2021 1  
c 31/03/2021 1  
c 30/04/2021 1  
c 31/07/2021 1  
c 31/08/2021 1  
c 30/09/2021 1  
c 31/10/2021 1  
c 30/11/2021 1  
c 31/12/2021 1  
;

proc sql;
  select 
    o.*, 
    (
      select
        case
          when max_calculated>2 then max_calculated
          when n_months=13 then max_calculated
          else .
          end as max_calculated
      from
        (
          select max(Rank) as max_calculated, count(*) as n_months
          from have i
          where i.id=o.id and i.date between o.date and intnx('month',o.date,12,'e')
          group by i.id
        )
    ) as max_calculated
  from 
    have o 
  ;
quit;

inid · Posted 09-25-2021 01:34 AM

Your answer is what I was looking for. You are right about the cases presented because I truncated the data from a larger one that's why your results do not match.

mkeintz · Posted 09-24-2021 11:47 PM

Consider reading each ID group twice. The first time to populate an array of 13 ranks (for each month from Dec 2020 through Dec 2021). Count the number of ranks, and find an initial maximum.

During the second reading of the ID group, if the initial max is >2 or the array started out completely populated, then you know that you can progressively calculate "future" maxima from the array. Otherwise the max_rank will be missing:

data want (drop=_:); 

   array _rank_history {0:12}; /*For Dec 2020 (array index zero) through Dec 2021 (array index 12)*/

   do until (last.id);
     set have;
     by id;
     _i=intck('month','31dec2020'd,date);  /*Get month index for current date */
     _rank_history{_i}=rank;                /*Save current rank in the array   */
     _initial_count=sum(_initial_count,1);
   end;
   _initial_max=max(of _rank_history{*});


  do until (last.id);
    set have;
    by id;
    if _initial_count=13 or _initial_max>2 then max_rank=max(of _rank_history{*});
    _i=intck('month','31dec2020'd,date);
    _rank_history{_i}=.;      /*Remove current month for future max considerations */
    output;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Finding the maximum value in the upcoming 12 months

Re: Finding the maximum value in the upcoming 12 months

Re: Finding the maximum value in the upcoming 12 months

Re: Finding the maximum value in the upcoming 12 months

Re: Finding the maximum value in the upcoming 12 months

Re: Finding the maximum value in the upcoming 12 months

Click image to register for webinar

Classroom Training Available!