BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mssun
Calcite | Level 5

Hello,

 

Example of data:

 

 

ID  TOTAL_VALUE   START_DATE   END_DATE

1   1000          15-11-04     17-09-30

2   3000          16-11-29     17-12-15

3   5000          15-12-08     16-01-11

 

(all dates are in SAS date)

 

I would like to get pro rata transaction value for each year involved. For example, for ID1, the transaction starts in Nov 2015 and ends in Sep 2017. It has 2 months in 2015, 12 months in 2016, and 9 months in 2017. So for 2015, the assigned value = 2/23 * 1000, for 2016, 12/23 * 1000, for 2017, 9/23 * 1000. 

 

My desired output:

 

ID  YEAR    VALUE   

1   2015    86.96

1   2016    521.73

1   2017    391.30

2   2016    230.77

2   2017    2769.23

3   2015    2500.00

3   2016    2500.00

 

Calculation in terms of number of days, or round down to number of months as I have shown in the example are both acceptable. 

 

I'm very new to SAS, and I can only think of extracting years from start_date and end_date and then create an array of years out of it. But I can't figure out how to do the prorata part. Any thoughts would help. Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Unless I misunderstand the OP request, that is way too much code.  I believe a single loop will do:

 

data have;
  input ID  TOTAL_VALUE   (START_DATE   END_DATE) (:yymmdd10.);
  format START_DATE   END_DATE yymmdd10.;
cards;
1   1000          15-11-04     17-09-30
2   3000          16-11-29     17-12-15
3   5000          15-12-08     16-01-11
;
data want (drop=_:);
  set have;
  _ndays=1+end_date-start_date;
  do year=year(start_date) to year(end_date);
    value=  total_value*(1+ min(end_date,mdy(12,31,year))-max(start_date,mdy(1,1,year)) )/_ndays;
    output;
  end;
run;

 

The above is in days.  If you want months (where each month in a range counts as a whole month):

 

data want2 (drop=_:);
  set have;
  _nmonths=1+intck('month',start_date,end_date);

  do year=year(start_date) to year(end_date);
    value= total_value
         *(1+intck('month',max(start_date,mdy(1,1,year)),min(end_date,mdy(12,31,year))))
         /_nmonths;
    output;
  end;
run;

 

Edited additional note: this problem fits nicely in sas base programming forum, where it would get the widest audience.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20

I'm afraid my result doesn;t seem to match your ID2, 

Can you correct the logic and see where i am wrong

For ID2, my logic computes 

 

2016  2/14*3000

2017 12/14*3000

 

data have;
input ID  TOTAL_VALUE   (START_DATE   END_DATE) (:yymmdd10.);
format START_DATE   END_DATE yymmdd10.;
cards;
1   1000          15-11-04     17-09-30
2   3000          16-11-29     17-12-15
3   5000          15-12-08     16-01-11
;
data want;
if _n_=1 then do;
 dcl hash H (ordered: "A", multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ('k','month') ;
   h.definedone () ;
end;
set have;
_month=0;
do k=year(start_date) to year(end_date);
if k ne year(end_date) then do;
month=intck('month',start_date,intnx('year',start_date,0,'e'))+1;
h.add();
_month+month;
start_date=intnx('year',start_date,1,'b');
end;
else if k=year(end_date) then do; month=intck('month',mdy(1,1,k),end_date)+1;
h.add();
_month+month;
end;
end;
do while(h.do_over(key:id) eq 0);
value=month/_month*TOTAL_VALUE;
output;
end;
keep id k value;
run;

 

SAS Output

Obs ID k value1234567
1201586.96
12016521.74
12017391.30
22016428.57
220172571.43
320152500.00
320162500.00
mkeintz
PROC Star

Unless I misunderstand the OP request, that is way too much code.  I believe a single loop will do:

 

data have;
  input ID  TOTAL_VALUE   (START_DATE   END_DATE) (:yymmdd10.);
  format START_DATE   END_DATE yymmdd10.;
cards;
1   1000          15-11-04     17-09-30
2   3000          16-11-29     17-12-15
3   5000          15-12-08     16-01-11
;
data want (drop=_:);
  set have;
  _ndays=1+end_date-start_date;
  do year=year(start_date) to year(end_date);
    value=  total_value*(1+ min(end_date,mdy(12,31,year))-max(start_date,mdy(1,1,year)) )/_ndays;
    output;
  end;
run;

 

The above is in days.  If you want months (where each month in a range counts as a whole month):

 

data want2 (drop=_:);
  set have;
  _nmonths=1+intck('month',start_date,end_date);

  do year=year(start_date) to year(end_date);
    value= total_value
         *(1+intck('month',max(start_date,mdy(1,1,year)),min(end_date,mdy(12,31,year))))
         /_nmonths;
    output;
  end;
run;

 

Edited additional note: this problem fits nicely in sas base programming forum, where it would get the widest audience.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mssun
Calcite | Level 5
Thank you! Your solution is elegant and works like a charm, especially in handling single-year situation.
I'll also take a look at sas base programming forum. Thanks!
novinosrin
Tourmaline | Level 20

@mkeintz Thank you sir for helping me learn so much and your points have always been amazing, not just this thread. Paying attention to your points, I cleaned my code a little bit, though yours is classy

 

data have;
input ID  TOTAL_VALUE   (START_DATE   END_DATE) (:yymmdd10.);
format START_DATE   END_DATE yymmdd10.;
cards;
1   1000          15-11-04     17-09-30
2   3000          16-11-29     17-12-15
3   5000          15-12-08     16-01-11
;
data want;
set have;
_month=intck('month',start_date,end_Date)+1; 
do year=year(start_date) to year(end_date);
if year=year(start_date) then months=intck('month',start_date,intnx('year',start_date,0,'e'))+1; 
else if year=year(end_date) then months=intck('month',intnx('year',end_date,0,'b'),end_date)+1; 
else months=12;
value=months/_month*TOTAL_VALUE;
output;
end;
keep id year value;
run;

 or

 

data want;
set have;
_month=intck('month',start_date,end_Date)+1; 
do year=year(start_date) to year(end_date);
if year=year(start_date) then months=12-month(start_date)+1; 
else if year=year(end_date) then months=month(end_date); 
else months=12;
value=months/_month*TOTAL_VALUE;
output;
end;
keep id year value;
run;

 

mkeintz
PROC Star

@novinosrin

 

Your code makes the logic clearer to the viewer than my dense single line.

 

But it assumes that all intervals are truly multi-year.  If the interval starts and ends in the same year and is less than 12 months, you'll need to make some changes.  Otherwise, the title of this discussion better be 100% correct.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mssun
Calcite | Level 5
Hi! Your solution works for multi-year situation, but doesn't work for duration less than one year. Sorry that I didn't make it clear in description.
I took a second thought on number of months - for example, if end date is Oct 1, the entire Oct would be taken into account in your code, but I excluded Oct when I eyeballed the data. That's why our results differ. Your logic would work perfectly if we both counted number of days.
Thank you for the answer!
Ksharp
Super User
data have;
input ID  TOTAL_VALUE   (START_DATE   END_DATE) (:yymmdd10.);
format START_DATE   END_DATE yymmdd10.;
cards;
1   1000          15-11-04     17-09-30
2   3000          16-11-29     17-12-15
3   5000          15-12-08     16-01-11
;
data temp;
 set have;
 range=end_date-start_date;
 do date=start_date to end_date;
  year=year(date);
  output;
 end;
 format date yymmdd10.;
run;
proc summary data=temp;
by id total_value range year;
output out=temp1;
run;
data want;
 set temp1;
 value=total_value*_freq_/range;
 keep id year value;
run;
mssun
Calcite | Level 5
Hi! I think your solution works too, but mkeintz answered earlier and his/her solution runs faster so I accepted that one. Thank you for answering!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2149 views
  • 1 like
  • 4 in conversation