Hi. I got some good suggestions for doing a cumulative sum of VALUE for the previous 3 months using a LAG strategy.
As usual, I find the data set is incomplete/missing date rows.
To do a rolling 3 month sum I need a complete set of DATE(MY) and NAMES with at least 0 as a VALUE (I think?).
What is the best way to do this? I can create an excel list of all month/years if that helps.
Or is there a better strategy? Thanks!
HAVE: WANT: (a row for every month)
NAME DATE (MY) VALUE NAME DATE(MY) VALUE CUM_SUM (prev 3Mo)
Name1 11/2018 4 Name1 11/2018 4 4
Name1 01/2019 3 Name1 12/2018 0 4
Name1 04//2019 2 Name1 01/2019 3 4
Name2 10/2018 1 Name1 02/2019 0 7
Name 2 01/2019 3 .... (for all month/years to present)
Name2 10/2018 1 1
Name2 11/2018 0 1
...etc.
Here is a similar method, but that also includes the cum. From your example, I'm not sure how you want to calculate the cum, so the code might have to be adjusted to meet your needs.
data have;
informat NAME $20.;
informat DATE mmddyy10.;
format DATE mmyys7.;
input NAME $ DATE $ VALUE;
cards;
Name1 11/01/2018 4
Name1 01/01/2019 3
Name1 04/01/2019 2
Name2 10/01/2018 1
Name2 01/01/2019 3
;
data want (drop=_: Next:);
set have (rename=(DATE=_DATE));
by NAME;
set have ( firstobs = 2 keep = DATE rename = (DATE = Next_DATE) )
have ( obs = 1 drop = _all_ );
Next_DATE = ifn( last.NAME, _DATE+1, Next_DATE );
if first.name then cum=value;
else cum+value;
format DATE mmyys7.;
do DATE=_DATE to Next_DATE-1;
if DATE eq _DATE or day(date) eq 1 then do;
if DATE ne _DATE then value=0;
output;
end;
end;
run;
Art, CEO, AnalystFinder.com
what have you tried?
How to create a data-step version of your data
https://communities.sas.com/t5/help/faqpage/faq-category-id/posting#posting
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
How to Add attachments
https://communities.sas.com/t5/Community-Memo/Adding-attachments-to-your-communities-posts/ba-p/4647...
...nothing that I think is close. I am looking for code ideas. I am a relatively new SAS user.
Hello @crawfe
Help us with this information
1. Is your date, a SAS numeric date variable
2. Or is it a character date i.e a text in the form mm/yyyy?
Hi. They are Date variables. mmyys7.
Here is a similar method, but that also includes the cum. From your example, I'm not sure how you want to calculate the cum, so the code might have to be adjusted to meet your needs.
data have;
informat NAME $20.;
informat DATE mmddyy10.;
format DATE mmyys7.;
input NAME $ DATE $ VALUE;
cards;
Name1 11/01/2018 4
Name1 01/01/2019 3
Name1 04/01/2019 2
Name2 10/01/2018 1
Name2 01/01/2019 3
;
data want (drop=_: Next:);
set have (rename=(DATE=_DATE));
by NAME;
set have ( firstobs = 2 keep = DATE rename = (DATE = Next_DATE) )
have ( obs = 1 drop = _all_ );
Next_DATE = ifn( last.NAME, _DATE+1, Next_DATE );
if first.name then cum=value;
else cum+value;
format DATE mmyys7.;
do DATE=_DATE to Next_DATE-1;
if DATE eq _DATE or day(date) eq 1 then do;
if DATE ne _DATE then value=0;
output;
end;
end;
run;
Art, CEO, AnalystFinder.com
Hi @crawfe here is an illustration with the assumption that your date variable is a SAS numeric date variable formatted in the form mm/yyyy. So the informat read those nonstandard date and reverted them to the 1st of the month, albeit again formatting as month year monyy. All this process does is increment by 1 month from current to next-1. HTH
data HAVE;
input NAME $ DATE : anydtdte10. VALUE ;
format date mmyys7.;
cards;
Name1 11/2018 4
Name1 01/2019 3
Name1 04//2019 2
Name2 10/2018 1
Name2 01/2019 3
;
data want;
merge have have(firstobs=2 drop=value rename=(name=_name date=_date));
if name=_name then do while(date<_date);
output;
date=intnx('month',date,1);
value=0;
end;
else output;
drop _:;
run;
Rather than using a look-ahead logic, you can try the "paint brush" key-indexing method where consecutive "months" are stored in a temp array:
data have ;
input (name datemy) ($) value ;
cards ;
Name1 11/2018 4
Name1 01/2019 3
Name1 04/2019 2
Name2 10/2018 1
Name2 01/2019 3
run ;
data want (drop = _:) ;
retain _0 "01jan1582"d _i "mon" ;
array val [99999] _temporary_ ;
do _lo = 99999 by 0 until (last.name) ;
set have ;
by name ;
_d = input (datemy, anydtdte.) ;
_lo = _lo min _d ;
_hi = _hi max _d ;
val [intck (_i, _0, _d)] = value ;
end ;
do _m = intck (_i, _0, _lo) to intck (_i, _0, _hi) ;
datemy = put (intnx (_i, _0, _m), mmyys7.) ;
value = sum (val[_m], 0) ;
cum_sum = sum (cum_sum, value) ;
val[_m] = . ;
output ;
end ;
run ;
Notes:
1. 99999 is an arbitrarily "big" number (good enough to accommodate 8000+ years worth of months since the beginning of SAS time).
2. RETAIN is used just to make the program look prettier than one peppered with hard coded constants.
3. _lo = 99999 by 0 intializes _lo to 99999 every time the DoW loop starts for the next BY group.
Kind regards
Paul D.
Special handling needs to be applied to dates that represent future events.
Taking medications would be one. We don't want to predict that a client receives meds until 99999 as prescribed when forecasting company profits, because that is just not realistic. While a person on SSI may receive their benefits until their death and they may take a prescribed medication until that date, a business might try to practice their profits from those meds by setting a cut-off date value for the persons life expectancy or date-of- predicted death.
We don't pay our bills 6 years in advance.
@VDD:
But of course ... all true.
However, here 99999 doesn't represent any "future event" - it's merely a subterfuge to allocate an array of sufficient size, basically a technical "infinity" to avoid pre-reading the data set to find the maximum date in it - or, alternatively, to using a hash table that doesn't need pre-sizing at the expense of making the code more (and perhaps unnecessarily) complex. Though apparently a kludge, such an approach is well justified in a number of cases to avoid extra processing.
Kind regards
Paul D.
Thanks everyone! I can work with this. I appreciate you sharing your knowledge, through both your code and your "code strategy".
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.