Hi,
I am trying to calculate cumulative total using proc sql. I know how to do in a data step but i am trying to understand sas better and wanted to try using proc sql
data is this. I want a cumulative column for the monthly_pmt variable by ID. I am just testing this code for one ID right now
ID | ENT_SEQ_NO | ACTV_DATE | Monthly_pmt | month | count |
759966427 | 988522250 | 07JUL2019:00:00:00 | 150 | 7 | 1 |
759966427 | 988522251 | 02AUG2019:00:00:00 | 150 | 8 | 2 |
759966427 | 988522252 | 06SEP2019:00:00:00 | 150 | 9 | 3 |
759966427 | 988522253 | 05OCT2019:00:00:00 | 100 | 10 | 4 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 |
I wrote this but i am getting duplication
PROC SQL;
CREATE TABLE CUMULATIVE AS
SELECT a.*, SUM(b.MONTHLY_PMT) AS CUML
FROM TEST3 A INNER JOIN TEST3 B
ON A.ID= B.ID and b.count <= a.count
GROUP BY A.ID,a.ent_seq_no,a.actv_date
ORDER BY A.ID,a.ent_seq_no,a.actv_date
;
QUIT;
result is:
ID | ENT_SEQ_NO | ACTV_DATE | Monthly_pmt | month | count | CUML |
759966427 | 988522250 | 07JUL2019:00:00:00 | 150 | 7 | 1 | 150 |
759966427 | 988522251 | 02AUG2019:00:00:00 | 150 | 8 | 2 | 300 |
759966427 | 988522251 | 02AUG2019:00:00:00 | 150 | 8 | 2 | 300 |
759966427 | 988522252 | 06SEP2019:00:00:00 | 150 | 9 | 3 | 450 |
759966427 | 988522252 | 06SEP2019:00:00:00 | 150 | 9 | 3 | 450 |
759966427 | 988522252 | 06SEP2019:00:00:00 | 150 | 9 | 3 | 450 |
759966427 | 988522253 | 05OCT2019:00:00:00 | 100 | 10 | 4 | 550 |
759966427 | 988522253 | 05OCT2019:00:00:00 | 100 | 10 | 4 | 550 |
759966427 | 988522253 | 05OCT2019:00:00:00 | 100 | 10 | 4 | 550 |
759966427 | 988522253 | 05OCT2019:00:00:00 | 100 | 10 | 4 | 550 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 | 950 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 | 950 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 | 950 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 | 950 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 | 950 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
I got around it by modifying to ask for distinct a.* in the above query but is this the only way? why is it doing this?
HI @TheNovice It's simple. Your GROUP BY is not complete.
You could include A.* in group by to not allow PROC SQL to REMERGE
So,
GROUP BY A.ID,a.ent_seq_no,a.actv_date,a.monthly_pmt,a.month,a.count
should solve the problem
Full Version:
PROC SQL;
CREATE TABLE CUMULATIVE AS
SELECT a.*, SUM(b.MONTHLY_PMT) AS CUML
FROM have A INNER JOIN have B
ON A.ID= B.ID and b.count <= a.count
GROUP BY A.ID,a.ent_seq_no,a.actv_date,a.monthly_pmt,a.month,a.count
ORDER BY A.ID,a.ent_seq_no,a.actv_date
;
QUIT;
HI @TheNovice It's simple. Your GROUP BY is not complete.
You could include A.* in group by to not allow PROC SQL to REMERGE
So,
GROUP BY A.ID,a.ent_seq_no,a.actv_date,a.monthly_pmt,a.month,a.count
should solve the problem
Full Version:
PROC SQL;
CREATE TABLE CUMULATIVE AS
SELECT a.*, SUM(b.MONTHLY_PMT) AS CUML
FROM have A INNER JOIN have B
ON A.ID= B.ID and b.count <= a.count
GROUP BY A.ID,a.ent_seq_no,a.actv_date,a.monthly_pmt,a.month,a.count
ORDER BY A.ID,a.ent_seq_no,a.actv_date
;
QUIT;
yes, that makes so much sense.... thank you so much. it worked!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.