BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
crawfe
Quartz | Level 8

Hi. I got some good suggestions for doing a cumulative sum of VALUE for the previous 3 months using a LAG strategy.

As usual, I find the data set is incomplete/missing date rows.

 

To do a rolling 3 month sum I need a complete set of DATE(MY) and NAMES  with at least 0 as a VALUE (I think?).

What is the best way to do this? I can create an excel list of all month/years if that helps. 

Or is there a better strategy? Thanks!

 

HAVE:                                                               WANT:             (a row for every month)

NAME               DATE (MY)          VALUE          NAME            DATE(MY)      VALUE          CUM_SUM (prev 3Mo)

Name1             11/2018                   4                  Name1            11/2018            4                         4

Name1              01/2019                  3                  Name1            12/2018            0                         4

Name1              04//2019                 2                  Name1             01/2019           3                         4

Name2              10/2018                  1                  Name1             02/2019           0                         7

Name 2              01/2019                 3                   .... (for all month/years to present)

                                                                             Name2              10/2018         1                          1

                                                                             Name2              11/2018          0                         1

                                                                                ...etc.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Here is a similar method, but that also includes the cum. From your example, I'm not sure how you want to calculate the cum, so the code might have to be adjusted to meet your needs.

data have;
  informat NAME $20.;
  informat DATE mmddyy10.;
  format DATE mmyys7.;
  input NAME $ DATE $ VALUE;
  cards;
Name1              11/01/2018                 4
Name1              01/01/2019                 3
Name1              04/01/2019                 2
Name2              10/01/2018                 1
Name2              01/01/2019                 3
;

data want (drop=_: Next:);
  set have (rename=(DATE=_DATE));
  by NAME;
  set have ( firstobs = 2 keep = DATE rename = (DATE = Next_DATE) )
      have (      obs = 1 drop = _all_ );
  Next_DATE = ifn(  last.NAME, _DATE+1, Next_DATE );
  if first.name then cum=value;
  else cum+value;
  format DATE mmyys7.;
  do DATE=_DATE to Next_DATE-1;
    if DATE eq _DATE or day(date) eq 1 then do;
      if DATE ne _DATE then value=0;
      output;
    end;
  end;
run;

Art, CEO, AnalystFinder.com

 

 

 

 

View solution in original post

10 REPLIES 10
crawfe
Quartz | Level 8

...nothing that I think is close. I am looking for code ideas. I am a relatively new SAS user.

novinosrin
Tourmaline | Level 20

Hello @crawfe 

 

Help us with this information

 

1. Is your date, a SAS numeric date variable 

2. Or is it a character date i.e a text in the form mm/yyyy?

 

 

 

crawfe
Quartz | Level 8

Hi. They are Date variables. mmyys7.

art297
Opal | Level 21

Here is a similar method, but that also includes the cum. From your example, I'm not sure how you want to calculate the cum, so the code might have to be adjusted to meet your needs.

data have;
  informat NAME $20.;
  informat DATE mmddyy10.;
  format DATE mmyys7.;
  input NAME $ DATE $ VALUE;
  cards;
Name1              11/01/2018                 4
Name1              01/01/2019                 3
Name1              04/01/2019                 2
Name2              10/01/2018                 1
Name2              01/01/2019                 3
;

data want (drop=_: Next:);
  set have (rename=(DATE=_DATE));
  by NAME;
  set have ( firstobs = 2 keep = DATE rename = (DATE = Next_DATE) )
      have (      obs = 1 drop = _all_ );
  Next_DATE = ifn(  last.NAME, _DATE+1, Next_DATE );
  if first.name then cum=value;
  else cum+value;
  format DATE mmyys7.;
  do DATE=_DATE to Next_DATE-1;
    if DATE eq _DATE or day(date) eq 1 then do;
      if DATE ne _DATE then value=0;
      output;
    end;
  end;
run;

Art, CEO, AnalystFinder.com

 

 

 

 

novinosrin
Tourmaline | Level 20

Hi @crawfe   here is an illustration with the assumption that your date variable is a SAS numeric date variable formatted in the form mm/yyyy. So the informat read those nonstandard date and reverted them to the 1st of the month, albeit again formatting as month year monyy. All this process does is increment by 1 month from current to next-1. HTH

 

 




data HAVE;                                                              
input NAME       $        DATE : anydtdte10.         VALUE    ;
format date mmyys7.;
cards;
Name1             11/2018                   4                 
Name1              01/2019                  3                
Name1              04//2019                 2                 
Name2              10/2018                  1             
Name2              01/2019                 3  
;


data want;
merge have have(firstobs=2 drop=value rename=(name=_name date=_date));
if name=_name then do while(date<_date);
output;
date=intnx('month',date,1);
value=0;
end;
else output;
drop _:;
run;
hashman
Ammonite | Level 13

@crawfe:

Rather than using a look-ahead logic, you can try the "paint brush" key-indexing method where consecutive "months" are stored in a temp array: 

data have ;                                           
  input (name datemy) ($) value ;                     
  cards ;                                             
Name1  11/2018  4                                     
Name1  01/2019  3                                     
Name1  04/2019  2                                     
Name2  10/2018  1                                     
Name2  01/2019  3                                     
run ;                                                 
                                                      
data want (drop = _:) ;                               
  retain _0 "01jan1582"d _i "mon" ;         
  array val [99999] _temporary_ ;                     
  do _lo = 99999 by 0 until (last.name) ;                              
    set have ;                                        
    by name ;                                         
    _d = input (datemy, anydtdte.) ;                  
    _lo = _lo min _d  ;                               
    _hi = _hi max _d  ;                               
    val [intck (_i, _0, _d)] = value ;                
  end ;                                               
  do _m = intck (_i, _0, _lo) to intck (_i, _0, _hi) ;
    datemy = put (intnx (_i, _0, _m), mmyys7.) ;      
    value = sum (val[_m], 0) ;                        
    cum_sum = sum (cum_sum, value) ;                  
    val[_m] = . ;                                     
    output ;                                          
  end ;                                               
run ;                                                 

Notes:

1. 99999 is an arbitrarily "big" number (good enough to accommodate 8000+ years worth of months since the beginning of SAS time). 

2. RETAIN is used just to make the program look prettier than one peppered with hard coded constants. 

3. _lo = 99999 by 0 intializes _lo to 99999 every time the DoW loop starts for the next BY group.

 

Kind regards

Paul D.

VDD
Ammonite | Level 13 VDD
Ammonite | Level 13

 

Special handling needs to be applied to dates that represent future events.

Taking medications would be one.  We don't want to predict that a client receives meds until 99999 as prescribed when forecasting company profits, because that is just not realistic.  While a person on SSI may receive their benefits until their death and they may take a prescribed medication until that date, a business might try to practice their profits from those meds by setting a cut-off date value for the persons life expectancy or date-of- predicted death.

 

We don't pay our bills 6 years in advance.

 

 

 

hashman
Ammonite | Level 13

@VDD:

But of course ... all true.

However, here 99999 doesn't represent any "future event" - it's merely a subterfuge to allocate an array of sufficient size, basically a technical "infinity" to avoid pre-reading the data set to find the maximum date in it - or, alternatively, to using a hash table that doesn't need pre-sizing at the expense of making the code more (and perhaps unnecessarily) complex. Though apparently a kludge, such an approach is well justified in a number of cases to avoid extra processing.

  

Kind regards

Paul D.

 

crawfe
Quartz | Level 8

Thanks everyone! I can work with this. I appreciate you sharing your knowledge, through both your code and your "code strategy".

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 8027 views
  • 1 like
  • 5 in conversation