Hello ,
I am trying to calculate the total amount that we have received from a customer at different points of time. But, I am not sure if there is an easy way to do it.
Let me share my data ;
ID | Period | Amount |
123 | Jan-17 | 10 |
123 | Feb-17 | 20 |
123 | Mar-17 | 30 |
123 | Apr-17 | 40 |
123 | May-17 | 50 |
123 | Jun-17 | 60 |
123 | Jul-17 | 70 |
123 | Aug-17 | 80 |
For example I want to calculate how much the customer (ID=123) has paid since Feb-17 until Aug-17, so that will gives me this
Checkpoint | ID | Period | Amount |
Feb-17 | 123 | Feb-17 | 20 |
Feb-17 | 123 | Mar-17 | 30 |
Feb-17 | 123 | Apr-17 | 40 |
Feb-17 | 123 | May-17 | 50 |
Feb-17 | 123 | Jun-17 | 60 |
Feb-17 | 123 | Jul-17 | 70 |
Feb-17 | 123 | Aug-17 | 80 |
And I would like to repeat that for all these months ; Hence my final data set will look like that
CheckPoint | ID | Period | Amount |
Feb-17 | 123 | Feb-17 | 20 |
Feb-17 | 123 | Mar-17 | 30 |
Feb-17 | 123 | Apr-17 | 40 |
Feb-17 | 123 | May-17 | 50 |
Feb-17 | 123 | Jun-17 | 60 |
Feb-17 | 123 | Jul-17 | 70 |
Feb-17 | 123 | Aug-17 | 80 |
Mar-17 | 123 | Mar-17 | 30 |
Mar-17 | 123 | Apr-17 | 40 |
Mar-17 | 123 | May-17 | 50 |
Mar-17 | 123 | Jun-17 | 60 |
Mar-17 | 123 | Jul-17 | 70 |
Mar-17 | 123 | Aug-17 | 80 |
Apr-17 | 123 | Apr-17 | 40 |
Apr-17 | 123 | May-17 | 50 |
Apr-17 | 123 | Jun-17 | 60 |
Apr-17 | 123 | Jul-17 | 70 |
Apr-17 | 123 | Aug-17 | 80 |
May-17 | 123 | May-17 | 50 |
May-17 | 123 | Jun-17 | 60 |
May-17 | 123 | Jul-17 | 70 |
May-17 | 123 | Aug-17 | 80 |
Jun-17 | 123 | Jun-17 | 60 |
Jun-17 | 123 | Jul-17 | 70 |
Jun-17 | 123 | Aug-17 | 80 |
Jul-17 | 123 | Jul-17 | 70 |
Jul-17 | 123 | Aug-17 | 80 |
Aug-17 | 123 | Aug-17 | 80 |
The way I have done it is ;
Data feb17;
set have(keep= id period);
where period >=Feb-17;
checkpoint=Feb-17;
run;
Data t1;
merge feb17( in=a)
have (in=b);
by id period;
if a;
run;
But this code takes ages to run and creates a huge final data set.
I was wondering if there is another way to do it ?
Thank you in advance
@Adam_ welcome to the SAS community 🙂 Is Period a numeric or character variable?
period is numeric. It is actually a date formatted as yymmn6.
But if that is a problem I can easily convert it to a character.
You can use proc summary as in:
proc summary data=have(where=(period > '01FEB2017'd );
class id period;
format period yymmn6.; /* or yymms7. */
var amount;
output out=want sum=;
run;
Hello Shmuel ,
Thanks for your reply.
That won't work for my case. Because I won't be able to see how much the customer has paid each month.
And I will I have to repeat it , several times and then to merge all the outputs
But this code takes ages to run and creates a huge final data set.
Well, that's not surprising if the real data set (not the example you show) is large.
I'm really wondering why you need the final output in the form you show it in, what do you plan to do with it in that final output form? I'mg guessing that there might be another way to get what you want without creating such a huge data set.
Hello Paige,
Indeed , my data set contains more than 7m of records.
I don't need my final output to have this output.
I have this data set
Account | Period | payment |
123 | Jan-17 | 10 |
123 | Feb-17 | 20 |
123 | Mar-17 | 30 |
123 | Apr-17 | 40 |
123 | May-17 | 50 |
123 | Jun-17 | 60 |
123 | Jul-17 | 70 |
123 | Aug-17 | 80 |
and what I really want to do is
Checkpoint Customer Month1 Month2 Month3
Feb-17 123 20 30 40
Mar-17 123 30 40 50
Where month1 - monthN is the difference between the checkpoint and the period
I wasn;t sure how to do it ; So I have created that huge data set and then I have used the proc means (Class Statement has the Checkpoint)
I was wondering if I can use arrays to calculate this?
any ideas?
@Adam_ wrote:
Hello Paige,
Indeed , my data set contains more than 7m of records.
I don't need my final output to have this output.
I have this data set
Account Period payment 123 Jan-17 10 123 Feb-17 20 123 Mar-17 30 123 Apr-17 40 123 May-17 50 123 Jun-17 60 123 Jul-17 70 123 Aug-17 80
and what I really want to do is
Checkpoint Customer Month1 Month2 Month3
Feb-17 123 20 30 40
Mar-17 123 30 40 50
Where month1 - monthN is the difference between the checkpoint and the period
I wasn;t sure how to do it ; So I have created that huge data set and then I have used the proc means (Class Statement has the Checkpoint)
This still doesn't answer WHY you need the data in a different form, and it doesn't answer what analysis you intend to do with it in this form that you can't do in its original form. I ask because there may be other ways to get to the desired analysis without doing this transformation of the data.
I don't think ARRAYs work given your original data set.
@Adam_ wrote:
For any given month, let's say Jan - 00, to track all the monthly payments (by using the data set that i have shown to you) that I have received from that point until today. And I have to repeat that for all months from Jan-00 until today.
Well, either I'm lost, or your explanation is lacking some key detail, and maybe it's best to stop here, but I'll try one more time.
What benefit do you gain by performing this rearrangement? What additional analyses are going to happen once you have something to work with?
And why can't you track the monthly payments with the data set in its original form? What do you even mean by "track" (because my understanding of "track" is that I can "track" the payments with the data in its original form)?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.