Hello
ID | Data | Rank |
a | 31/12/2020 | 1 |
a | 31/01/2021 | 1 |
a | 28/02/2021 | 1 |
a | 31/03/2021 | 1 |
a | 30/04/2021 | 1 |
a | 31/05/2021 | 1 |
a | 30/06/2021 | 1 |
a | 31/07/2021 | 3 |
a | 31/08/2021 | 1 |
a | 30/09/2021 | 1 |
a | 31/10/2021 | 1 |
a | 30/11/2021 | 1 |
a | 31/12/2021 | 1 |
b | 31/12/2020 | 1 |
b | 31/01/2021 | 1 |
b | 31/05/2021 | 1 |
b | 30/06/2021 | 1 |
b | 31/07/2021 | 3 |
b | 31/08/2021 | 1 |
b | 30/09/2021 | 1 |
b | 31/10/2021 | 1 |
b | 30/11/2021 | 1 |
b | 31/12/2021 | 1 |
c | 31/12/2020 | 1 |
c | 31/01/2021 | 1 |
c | 28/02/2021 | 1 |
c | 31/03/2021 | 1 |
c | 30/04/2021 | 1 |
c | 31/07/2021 | 1 |
c | 31/08/2021 | 1 |
c | 30/09/2021 | 1 |
c | 31/10/2021 | 1 |
c | 30/11/2021 | 1 |
c | 31/12/2021 | 1 |
I would like to have add a new Value (Max Rank) having the below conditions (Either Proc SQL or SAS Base):
1. If in the next 12 months (E.g. for 31 December 2020 it should check until 31 December 2021) Max Rank is higher than 2 then show Max rank for each ID.
2. If in the next 12 months Max rank is 2 or lower and there is no missing month (so the ID should be present in all consecutive months (From 31 December 2020 to 31 December 2021) then show Max Rank for each ID.
3. If in the next 12 months Max rank is 2 or lower but there at least one missing month (From 31 December 2020 to 31 December 2021 the ID is not present in one of the months) then Max Rank should be empty for each ID.
A summary of want table is presented below:
ID | Data | Rank | Max Rank |
a | 31/12/2020 | 1 | 3 |
a | 31/01/2021 | 1 | 3 |
a | 28/02/2021 | 1 | 3 |
a | 31/03/2021 | 1 | 3 |
a | 30/04/2021 | 1 | 3 |
a | 31/05/2021 | 1 | 3 |
a | 30/06/2021 | 1 | 3 |
a | 31/07/2021 | 3 | 3 |
a | 31/08/2021 | 1 | 1 |
a | 30/09/2021 | 1 | 1 |
a | 31/10/2021 | 1 | 1 |
a | 30/11/2021 | 1 | 1 |
a | 31/12/2021 | 1 | 1 |
b | 31/12/2020 | 1 | 3 |
b | 31/01/2021 | 1 | 3 |
b | 31/05/2021 | 1 | 3 |
b | 30/06/2021 | 1 | 3 |
b | 31/07/2021 | 3 | 3 |
b | 31/08/2021 | 1 | 1 |
b | 30/09/2021 | 1 | 1 |
b | 31/10/2021 | 1 | 1 |
b | 30/11/2021 | 1 | 1 |
b | 31/12/2021 | 1 | 1 |
c | 31/12/2020 | 1 | |
c | 31/01/2021 | 1 | |
c | 28/02/2021 | 1 | |
c | 31/03/2021 | 1 | |
c | 30/04/2021 | 1 | |
c | 31/07/2021 | 1 | |
c | 31/08/2021 | 1 | |
c | 30/09/2021 | 1 | |
c | 31/10/2021 | 1 | |
c | 30/11/2021 | 1 | |
c | 31/12/2021 | 1 |
Thank you in advance for your help and apologize if I have skipped anything.
Can you please explain why in your desired result you've got a desired max rank of 1 for below highlighted months?
Given your rule 3 shouldn't there be missings because there aren't any future 12 months?:
If in the next 12 months Max rank is 2 or lower but there at least one missing month (From 31 December 2020 to 31 December 2021 the ID is not present in one of the months) then Max Rank should be empty for each ID.
The way I interpret your rules a SQL could look like below. But this logic doesn't return the result as per your post.
Please also tell us what data volumes you're dealing with. I wouldn't use below SQL for big volumes.
data have;
infile datalines truncover;
input ID $ Date:ddmmyy10. Rank max_expected;
format date date9.;
datalines;
a 31/12/2020 1 3
a 31/01/2021 1 3
a 28/02/2021 1 3
a 31/03/2021 1 3
a 30/04/2021 1 3
a 31/05/2021 1 3
a 30/06/2021 1 3
a 31/07/2021 3 3
a 31/08/2021 1 1
a 30/09/2021 1 1
a 31/10/2021 1 1
a 30/11/2021 1 1
a 31/12/2021 1 1
b 31/12/2020 1 3
b 31/01/2021 1 3
b 31/05/2021 1 3
b 30/06/2021 1 3
b 31/07/2021 3 3
b 31/08/2021 1 1
b 30/09/2021 1 1
b 31/10/2021 1 1
b 30/11/2021 1 1
b 31/12/2021 1 1
c 31/12/2020 1
c 31/01/2021 1
c 28/02/2021 1
c 31/03/2021 1
c 30/04/2021 1
c 31/07/2021 1
c 31/08/2021 1
c 30/09/2021 1
c 31/10/2021 1
c 30/11/2021 1
c 31/12/2021 1
;
proc sql;
select
o.*,
(
select
case
when max_calculated>2 then max_calculated
when n_months=13 then max_calculated
else .
end as max_calculated
from
(
select max(Rank) as max_calculated, count(*) as n_months
from have i
where i.id=o.id and i.date between o.date and intnx('month',o.date,12,'e')
group by i.id
)
) as max_calculated
from
have o
;
quit;
If rank for 31/12/2020 is 3, but is 1 for Jan 2021 through Dec of 2021, then what is the value of maxrank for 31/12/2020?
I.e., when you say "next 12 months", are you also including the current month (making a total of 13 months) when finding the maximum?
Can you please explain why in your desired result you've got a desired max rank of 1 for below highlighted months?
Given your rule 3 shouldn't there be missings because there aren't any future 12 months?:
If in the next 12 months Max rank is 2 or lower but there at least one missing month (From 31 December 2020 to 31 December 2021 the ID is not present in one of the months) then Max Rank should be empty for each ID.
The way I interpret your rules a SQL could look like below. But this logic doesn't return the result as per your post.
Please also tell us what data volumes you're dealing with. I wouldn't use below SQL for big volumes.
data have;
infile datalines truncover;
input ID $ Date:ddmmyy10. Rank max_expected;
format date date9.;
datalines;
a 31/12/2020 1 3
a 31/01/2021 1 3
a 28/02/2021 1 3
a 31/03/2021 1 3
a 30/04/2021 1 3
a 31/05/2021 1 3
a 30/06/2021 1 3
a 31/07/2021 3 3
a 31/08/2021 1 1
a 30/09/2021 1 1
a 31/10/2021 1 1
a 30/11/2021 1 1
a 31/12/2021 1 1
b 31/12/2020 1 3
b 31/01/2021 1 3
b 31/05/2021 1 3
b 30/06/2021 1 3
b 31/07/2021 3 3
b 31/08/2021 1 1
b 30/09/2021 1 1
b 31/10/2021 1 1
b 30/11/2021 1 1
b 31/12/2021 1 1
c 31/12/2020 1
c 31/01/2021 1
c 28/02/2021 1
c 31/03/2021 1
c 30/04/2021 1
c 31/07/2021 1
c 31/08/2021 1
c 30/09/2021 1
c 31/10/2021 1
c 30/11/2021 1
c 31/12/2021 1
;
proc sql;
select
o.*,
(
select
case
when max_calculated>2 then max_calculated
when n_months=13 then max_calculated
else .
end as max_calculated
from
(
select max(Rank) as max_calculated, count(*) as n_months
from have i
where i.id=o.id and i.date between o.date and intnx('month',o.date,12,'e')
group by i.id
)
) as max_calculated
from
have o
;
quit;
Consider reading each ID group twice. The first time to populate an array of 13 ranks (for each month from Dec 2020 through Dec 2021). Count the number of ranks, and find an initial maximum.
During the second reading of the ID group, if the initial max is >2 or the array started out completely populated, then you know that you can progressively calculate "future" maxima from the array. Otherwise the max_rank will be missing:
data want (drop=_:);
array _rank_history {0:12}; /*For Dec 2020 (array index zero) through Dec 2021 (array index 12)*/
do until (last.id);
set have;
by id;
_i=intck('month','31dec2020'd,date); /*Get month index for current date */
_rank_history{_i}=rank; /*Save current rank in the array */
_initial_count=sum(_initial_count,1);
end;
_initial_max=max(of _rank_history{*});
do until (last.id);
set have;
by id;
if _initial_count=13 or _initial_max>2 then max_rank=max(of _rank_history{*});
_i=intck('month','31dec2020'd,date);
_rank_history{_i}=.; /*Remove current month for future max considerations */
output;
end;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.