BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
twildone
Pyrite | Level 9

Hi Tom. I took the data and re-created the table in excel to get the values and formulas to derive the calculated fields. I needed to create helper fields such as NumberDays1 and BegAmount1 to get the final results. In the code below, the grouping is necessary so that the calculated NumberDays1 stops decreasing when their is a change in NumberDays as this distiguishes the different froup within ID. Thanks. 

 

 
data have;
    format ID ShowAmount NumberDays NumberDaysMonth NumberDaysBeg NumberDaysEnd BegAmount best12.;
    infile datalines4 dlm='7F'x missover dsd;
    input ID ShowAmount NumberDays NumberDaysMonth NumberDaysBeg NumberDaysEnd BegAmount : best32.;
datalines4;
152 . 36 28 0 28 3650
152 . 36 8 0 8 36500
152 . 85 14 0 14 4100
152 . 85 31 0 31 4100
152 700 85 30 10 20 4100
152 . 85 10 0 10 41000
152 . 71 19 0 19 4650
152 450 71 29 28 1 4650
152 . 71 23 0 23 4650
152 . 85 25 0 25 4650
152 . 85 36 0 31 4650
152 . 85 29 0 29 4650
;;;;
data have;
	set have;
		if ShowAmount = .
			then ShowAmount = 0;
run;

data want;
	set have;
by ID NumberDays notsorted;
if first.ID then do;
BegAmount1 = BegAmount;
EndAmount = (BegAmount + ShowAmount);
NumberDays1 = NumberDays;
MonthlyAmount = (((NumberDaysBeg/NumberDays1)*BegAmount1) + ((NumberDaysEnd/NumberDays1)*EndAmount));
EndAmount1 = (EndAmount - MonthlyAmount);
end;
if not first.ID then do;
BegAmount1 = BegAmount - lag(BegAmount) + lag(EndAmount1);
EndAmount = (BegAmount - lag(BegAmount) + ShowAmount + lag(EndAmount1));
NumberDays1 = lag(NumberDays1) - lag(NumberDaysMonth);
MonthlyAmount = (((NumberDaysBeg/NumberDays1)*BegAmount1) + ((NumberDaysEnd/NumberDays1)*EndAmount));
EndAount1 = (EndAmount - MonthlyAmount);
end;
run;

 

 Want:

ID	ShowAmount	NumberDays	NumberDays1	NumberDaysMonth	NumberDaysBeg	NumberDaysEnd	BegAmount	BegAmount1	EndAmount	MonthlyAmount	EndAmount1
152	0	36	36	28	0	28	3650	3650	3650	2839	811
152	0	36	8	8	0	8	3650	811	811	811	0
152	0	85	85	14	0	14	4100	450	450	74	376
152	0	85	71	31	0	31	4100	376	376	164	212
152	700	85	40	30	10	20	4100	212	912	509	403
152	0	85	10	10	0	10	4100	403	403	403	0
152	0	71	71	19	0	19	4650	550	550	147	403
152	450	71	52	29	28	1	4650	403	853	233	620
152	0	71	23	23	0	23	4650	620	620	620	0
152	0	85	85	25	0	25	4650	0	0	0	0
152	0	85	60	31	0	31	4650	0	0	0	0
152	0	85	29	29	0	29	4650	0	0	0	0

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If every other value is missing the that means the values the variable had when you called it also was alternating between missing and non-missing.

Here is simple example of how LAG() works.

71    data test;
72      set sashelp.class (obs=4);
73      lag_name=lag(name);
74      put (_n_ name lag_name) (=);
75    run;

_N_=1 Name=Alfred lag_name=
_N_=2 Name=Alice lag_name=Alfred
_N_=3 Name=Barbara lag_name=Alice
_N_=4 Name=Carol lag_name=Barbara

On first iteration (_N_=1) the value of ALFRED is saved and missing is returned.

Then on second (_N_=2) the value of ALICE is saves and ALFRED is returned.

etc.

View solution in original post

17 REPLIES 17
Quentin
Super User

I'm not understanding what you're trying to do, but I suspect part of the problem is that you are reading a variable TOTAL from have, and also trying to create a variable TOTAL in the datastep and use it as an accumulator (retaining across observations and incrementing it). That won't work, because every time you read a record from HAVE, it will over-write your retained value. You're probably better off creating a new variable TOTAL2 to use as an accumulator, to avoid the collision.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
twildone
Pyrite | Level 9
Hi Quentin....thanks for your help. Your suggestion was very helpful and the missing entries were causing a problem when summing data. I noticed some missing information on the posted data.....my apology for the confusion. I am trying to edit the original post but can't seem the find the wheel or anywhere that would allow me to edit the post....how can I edit my original post....thanks
Tom
Super User Tom
Super User

To edit look for the vertical ... under your user icon next to the top of the message.  Clicking on that icon opens the menu that has Edit.

twildone
Pyrite | Level 9
Thanks Tom....the vertical.... doesn't display next to the message but by clicking next o the message a menu opens up.....not sure why the vertical....doesn't display but I was able to edit the original post...thanks.
Tom
Super User Tom
Super User

I don't follow the logic.  Perhaps I am just not able to figure out what is going on because I am distracted by not understanding the data structure.

Are you trying to use BY ID NUMBERDAYS to mean that the first two rows are both in the same group somehow? A group that lasts 38 days. 28 for the first observation an 8 for the second.  How does that impact the logic of the calculations? Or does it? You have a BY statement but you aren't referencing and of the FIRST. or LAST. variables that SAS will generate because of the BY statement.

 

Also why do you have BY ID when there is only one value of ID in the example data?  Does your real data have multiple values of ID?

 

twildone
Pyrite | Level 9

Hi Tom....Yes you are right. The first two rows are in the same group that last 36 days (28 + 8). To explain the logic, for the first row in this group, the BegAmount is 3650. Since there is no new revenue for ShowAmount, that would mean that the EndAmount  is 3650 as well. The calculation of MonthlyAmount based on the BegAmount and EndAmount is 2839 and is calculated as a weighted average of the BegAmount and EndAmount values. This would mean that the new EndAmount after the calculation of the MonthlyAmount is actually 811 (3650 - 2839) which is EndAmount1.
For the second row, BegAmount becomes EndAmount1 which is 811 as this is a carryover from the previous row. Since their is no new revenue from Show Amount to add to obtain the EndAmount, the EndAmount = 811. Once again, the MonthlyAmount is calculated as 811. This should return a result of EndAmount1 = 0 (811 - 811), that is, BegAmount minus Monthly Amount.

The EndAmount is needed to calculate the MonthlyAmount before EndAmount1 is calculated and carried over to the next row.

Yes the real data has multiple ID's is why I have a by statement. Hopefully this explains the logic.

Tom
Super User Tom
Super User

What is SHOWAMOUNT?  Is that a deduction from the account? The balance?  Why does it appear on one of the middle months in a group instead of at the start or end of a group of months?

twildone
Pyrite | Level 9

Hi Tom.....ShowAmount is actually revenue received during that particular Month. For example, if you look at the row where ShowAmount = 700. For this particular row, the NumberDaysBeg =10 and NumberDaysEnd =20, that would mean that ShowAmount of 700 was received 10 days into the month which would suggest that it was available for last 20 days of the 30 day month. Because the BegAmount for that row is 0, then calculation for the MonthlyAmount is 467 (((10/30)*(0))+((20/30)*(700))).

 

The BegAmount will always be the EndAmount1 from the previous row as the Beginning Balance in the current row will always be the Ending Balance from the previous row.

 

The ShowAmount can be both positive or negative due to revenue or cost. By adding the ShowAmount to the BegAmount to get the EndAmount and then calculate the MonthlyAmount, that should reflect the actual amount of revenue to allocate for that particular month.

 

The reason why ShowAmounts appear in the middle of the group is because it was actually received in that month. For example, for the first row, the BegAmount is 3650. That means that 3650 was received before it was actually earned. For that particular row or month, services was delivered for 28 days out of a total of 36 days to complete to services to be delivered. It is possible to receive or even refund funds anytime  during the time period to deliver the services. The allocation of funds to calculate MonthlyAmount is based on when funds have been received and earned.

Quentin
Super User

I suggest you focus on one variable at a time.  Your code is still a bit confusing.  You have a BY statement, but it is doing nothing, because you're not using first. and last. variables.  It sounds like you're trying to accumulate something (making a running total of some sort), but nothing is being retained.  

 

If you focus on your first variable, EndAmount, your code is essentially:

 

data want;
  set have;
  EndAmount = (BegAmount+ShowAmount);	
run;

So EndAmount is always being calculated by adding together two values you read from HAVE.  Which isn't what you want, is it?  

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
twildone
Pyrite | Level 9

EndAmount is calculated by adding together two values (BegAmount & ShowAmount)that is read in from HAVE for the first row then EndAmount = previous row's EndAmount plus the current row's ShowAmount. The reason for calculating EndAmount1 was to get the new End Balance to carry it forward to the next row as the BegAmount so that EndAmount would continue to be calculated as BegAmount & ShowAmount for that same row as well. I am wondering if doing the calculations for the first row then a loop for the remaining rows as recalculations of the variables are needed.

Tom
Super User Tom
Super User

So you have this data:

 

data have;
  input id days daysmonth beg add ;
  add=sum(add,0);
cards;
152 36 28 3650 .
152 36  8 3650 .
152 85 14 4100 .
152 85 31 4100 .
152 85 30 4100 700
152 85 10 4100 .
152 71 19 4650 .
152 71 29 4650 450
152 71 23 4650 .
152 85 25 4650 .
152 85 36 4650 .
152 85 29 4650 .
;

Sounds like for the first group of two months you have decided that the balance should go down from 3,650 at the start of the first month to 0 at the end of the second month and you want to divide that into two pieces that is proportional to the number of days that month has out of the whole period.

 

But then on the start of the second group the value has jumped back up to 4,100.  Where did that come from? Why is that not reflected in your expected output?

twildone
Pyrite | Level 9

Hi Tom,

 

The 4100 and the rest of the values in the BegAmount are a running total for the balance at the start of each project to account for revenue received between projects. I revised the original post and re-create the expected table in excel and incorporated the same formulas into the data step....Thanks

Tom
Super User Tom
Super User

Do not put LAG() function calls inside of conditional execution.  That messes up the list of values that it "lags".  LAG() does not know anything about observations. It just returns out the previous value you passed in.  If you must use LAG() and only want to used the lagged value sometimes then make sure to execute the LAG() function everytime and then reference the generated value conditionally.

 

You can also use RETAIN to keep values into the next observation rather than trying to use lag to look backwards. (Only use RETAIN for variables that are being created in this data step.  Don't try to retain variables that are coming in from the input dataset because reading the next observation will overwrite the value.)

 

It can also help to OUTPUT before you are done calculating the things you want to RETAIN.

 

 

twildone
Pyrite | Level 9

Hi Tom.....thanks for suggestion about the lag.....that was very helpful. I broke the data step into two data steps. The first to get the first row and starting entries which works fine. The second data step, I defined the lag variables outside the conditional execution and I am getting results for every second row. Is using lag's causing this? Thanks.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 8666 views
  • 5 likes
  • 3 in conversation