BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
inid
Obsidian | Level 7

Hello 

 
I have a starting table as below where I have a rank for each ID on each month. 
IDDataRank 
a31/12/20201
a31/01/20211
a28/02/20211
a31/03/20211
a30/04/20211
a31/05/20211
a30/06/20211
a31/07/20213
a31/08/20211
a30/09/20211
a31/10/20211
a30/11/20211
a31/12/20211
b31/12/20201
b31/01/20211
b31/05/20211
b30/06/20211
b31/07/20213
b31/08/20211
b30/09/20211
b31/10/20211
b30/11/20211
b31/12/20211
c31/12/20201
c31/01/20211
c28/02/20211
c31/03/20211
c30/04/20211
c31/07/20211
c31/08/20211
c30/09/20211
c31/10/20211
c30/11/20211
c31/12/20211

 

I would like to have add a new Value (Max Rank) having the below conditions (Either Proc SQL or SAS Base):

1. If in the next 12 months (E.g. for 31 December 2020 it should check until 31 December 2021) Max Rank is higher than 2 then show Max rank for each ID. 

2. If in the next 12 months Max rank is 2 or lower  and there is no missing month (so the ID should be present in all consecutive months (From 31 December 2020 to 31 December 2021) then show Max Rank for each ID. 

3. If in the next 12 months Max rank is 2 or lower but there at least one missing month (From 31 December 2020 to 31 December 2021 the ID is not present in one of the months) then Max Rank should be empty for each ID. 

A summary of want table is presented below:

 

IDDataRank Max Rank
a31/12/202013
a31/01/202113
a28/02/202113
a31/03/202113
a30/04/202113
a31/05/202113
a30/06/202113
a31/07/202133
a31/08/202111
a30/09/202111
a31/10/202111
a30/11/202111
a31/12/202111
b31/12/202013
b31/01/202113
b31/05/202113
b30/06/202113
b31/07/202133
b31/08/202111
b30/09/202111
b31/10/202111
b30/11/202111
b31/12/202111
c31/12/20201 
c31/01/20211 
c28/02/20211 
c31/03/20211 
c30/04/20211 
c31/07/20211 
c31/08/20211 
c30/09/20211 
c31/10/20211 
c30/11/20211 
c31/12/20211 

 

Thank you in advance for your help and apologize if I have skipped anything. 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Can you please explain why in your desired result you've got a desired max rank of 1 for below highlighted months?

 

Patrick_0-1632539440802.png

 

Given your rule 3 shouldn't there be missings because there aren't any future 12 months?: 

If in the next 12 months Max rank is 2 or lower but there at least one missing month (From 31 December 2020 to 31 December 2021 the ID is not present in one of the months) then Max Rank should be empty for each ID. 

 

The way I interpret your rules a SQL could look like below. But this logic doesn't return the result as per your post.

Please also tell us what data volumes you're dealing with. I wouldn't use below SQL for big volumes.

data have;
  infile datalines truncover;
  input ID $ Date:ddmmyy10. Rank max_expected;
  format date date9.;
  datalines;
a 31/12/2020 1 3
a 31/01/2021 1 3
a 28/02/2021 1 3
a 31/03/2021 1 3
a 30/04/2021 1 3
a 31/05/2021 1 3
a 30/06/2021 1 3
a 31/07/2021 3 3
a 31/08/2021 1 1
a 30/09/2021 1 1
a 31/10/2021 1 1
a 30/11/2021 1 1
a 31/12/2021 1 1
b 31/12/2020 1 3
b 31/01/2021 1 3
b 31/05/2021 1 3
b 30/06/2021 1 3
b 31/07/2021 3 3
b 31/08/2021 1 1
b 30/09/2021 1 1
b 31/10/2021 1 1
b 30/11/2021 1 1
b 31/12/2021 1 1
c 31/12/2020 1  
c 31/01/2021 1  
c 28/02/2021 1  
c 31/03/2021 1  
c 30/04/2021 1  
c 31/07/2021 1  
c 31/08/2021 1  
c 30/09/2021 1  
c 31/10/2021 1  
c 30/11/2021 1  
c 31/12/2021 1  
;

proc sql;
  select 
    o.*, 
    (
      select
        case
          when max_calculated>2 then max_calculated
          when n_months=13 then max_calculated
          else .
          end as max_calculated
      from
        (
          select max(Rank) as max_calculated, count(*) as n_months
          from have i
          where i.id=o.id and i.date between o.date and intnx('month',o.date,12,'e')
          group by i.id
        )
    ) as max_calculated
  from 
    have o 
  ;
quit;

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

If rank for 31/12/2020 is 3, but is 1 for Jan 2021 through Dec of 2021, then what is the value of maxrank for 31/12/2020?

 

I.e., when you say "next 12 months", are you also including the current month (making a total of 13 months) when finding the maximum?

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Can you please explain why in your desired result you've got a desired max rank of 1 for below highlighted months?

 

Patrick_0-1632539440802.png

 

Given your rule 3 shouldn't there be missings because there aren't any future 12 months?: 

If in the next 12 months Max rank is 2 or lower but there at least one missing month (From 31 December 2020 to 31 December 2021 the ID is not present in one of the months) then Max Rank should be empty for each ID. 

 

The way I interpret your rules a SQL could look like below. But this logic doesn't return the result as per your post.

Please also tell us what data volumes you're dealing with. I wouldn't use below SQL for big volumes.

data have;
  infile datalines truncover;
  input ID $ Date:ddmmyy10. Rank max_expected;
  format date date9.;
  datalines;
a 31/12/2020 1 3
a 31/01/2021 1 3
a 28/02/2021 1 3
a 31/03/2021 1 3
a 30/04/2021 1 3
a 31/05/2021 1 3
a 30/06/2021 1 3
a 31/07/2021 3 3
a 31/08/2021 1 1
a 30/09/2021 1 1
a 31/10/2021 1 1
a 30/11/2021 1 1
a 31/12/2021 1 1
b 31/12/2020 1 3
b 31/01/2021 1 3
b 31/05/2021 1 3
b 30/06/2021 1 3
b 31/07/2021 3 3
b 31/08/2021 1 1
b 30/09/2021 1 1
b 31/10/2021 1 1
b 30/11/2021 1 1
b 31/12/2021 1 1
c 31/12/2020 1  
c 31/01/2021 1  
c 28/02/2021 1  
c 31/03/2021 1  
c 30/04/2021 1  
c 31/07/2021 1  
c 31/08/2021 1  
c 30/09/2021 1  
c 31/10/2021 1  
c 30/11/2021 1  
c 31/12/2021 1  
;

proc sql;
  select 
    o.*, 
    (
      select
        case
          when max_calculated>2 then max_calculated
          when n_months=13 then max_calculated
          else .
          end as max_calculated
      from
        (
          select max(Rank) as max_calculated, count(*) as n_months
          from have i
          where i.id=o.id and i.date between o.date and intnx('month',o.date,12,'e')
          group by i.id
        )
    ) as max_calculated
  from 
    have o 
  ;
quit;
inid
Obsidian | Level 7
Your answer is what I was looking for. You are right about the cases presented because I truncated the data from a larger one that's why your results do not match.
mkeintz
PROC Star

Consider reading each ID group twice.  The first time to populate an array of 13 ranks (for each month from Dec 2020 through Dec 2021).  Count the number of ranks, and find an initial maximum.

 

During the second reading of the ID group, if the initial max is >2 or the array started out completely populated, then you know that you can progressively calculate "future" maxima from the array.  Otherwise the max_rank will be missing:

 

data want (drop=_:); 

   array _rank_history {0:12}; /*For Dec 2020 (array index zero) through Dec 2021 (array index 12)*/

   do until (last.id);
     set have;
     by id;
     _i=intck('month','31dec2020'd,date);  /*Get month index for current date */
     _rank_history{_i}=rank;                /*Save current rank in the array   */
     _initial_count=sum(_initial_count,1);
   end;
   _initial_max=max(of _rank_history{*});


  do until (last.id);
    set have;
    by id;
    if _initial_count=13 or _initial_max>2 then max_rank=max(of _rank_history{*});
    _i=intck('month','31dec2020'd,date);
    _rank_history{_i}=.;      /*Remove current month for future max considerations */
    output;
  end;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 753 views
  • 2 likes
  • 3 in conversation