Hi,
I am trying to calculate cumulative total using proc sql. I know how to do in a data step but i am trying to understand sas better and wanted to try using proc sql
data is this. I want a cumulative column for the monthly_pmt variable by ID. I am just testing this code for one ID right now
ID | ENT_SEQ_NO | ACTV_DATE | Monthly_pmt | month | count |
759966427 | 988522250 | 07JUL2019:00:00:00 | 150 | 7 | 1 |
759966427 | 988522251 | 02AUG2019:00:00:00 | 150 | 8 | 2 |
759966427 | 988522252 | 06SEP2019:00:00:00 | 150 | 9 | 3 |
759966427 | 988522253 | 05OCT2019:00:00:00 | 100 | 10 | 4 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 |
I wrote this but i am getting duplication
PROC SQL;
CREATE TABLE CUMULATIVE AS
SELECT a.*, SUM(b.MONTHLY_PMT) AS CUML
FROM TEST3 A INNER JOIN TEST3 B
ON A.ID= B.ID and b.count <= a.count
GROUP BY A.ID,a.ent_seq_no,a.actv_date
ORDER BY A.ID,a.ent_seq_no,a.actv_date
;
QUIT;
result is:
ID | ENT_SEQ_NO | ACTV_DATE | Monthly_pmt | month | count | CUML |
759966427 | 988522250 | 07JUL2019:00:00:00 | 150 | 7 | 1 | 150 |
759966427 | 988522251 | 02AUG2019:00:00:00 | 150 | 8 | 2 | 300 |
759966427 | 988522251 | 02AUG2019:00:00:00 | 150 | 8 | 2 | 300 |
759966427 | 988522252 | 06SEP2019:00:00:00 | 150 | 9 | 3 | 450 |
759966427 | 988522252 | 06SEP2019:00:00:00 | 150 | 9 | 3 | 450 |
759966427 | 988522252 | 06SEP2019:00:00:00 | 150 | 9 | 3 | 450 |
759966427 | 988522253 | 05OCT2019:00:00:00 | 100 | 10 | 4 | 550 |
759966427 | 988522253 | 05OCT2019:00:00:00 | 100 | 10 | 4 | 550 |
759966427 | 988522253 | 05OCT2019:00:00:00 | 100 | 10 | 4 | 550 |
759966427 | 988522253 | 05OCT2019:00:00:00 | 100 | 10 | 4 | 550 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 | 950 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 | 950 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 | 950 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 | 950 |
759966427 | 988522254 | 08NOV2019:00:00:00 | 400 | 11 | 5 | 950 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
759966427 | 988522256 | 21DEC2019:00:00:00 | 200 | 12 | 6 | 1150 |
I got around it by modifying to ask for distinct a.* in the above query but is this the only way? why is it doing this?
HI @TheNovice It's simple. Your GROUP BY is not complete.
You could include A.* in group by to not allow PROC SQL to REMERGE
So,
GROUP BY A.ID,a.ent_seq_no,a.actv_date,a.monthly_pmt,a.month,a.count
should solve the problem
Full Version:
PROC SQL;
CREATE TABLE CUMULATIVE AS
SELECT a.*, SUM(b.MONTHLY_PMT) AS CUML
FROM have A INNER JOIN have B
ON A.ID= B.ID and b.count <= a.count
GROUP BY A.ID,a.ent_seq_no,a.actv_date,a.monthly_pmt,a.month,a.count
ORDER BY A.ID,a.ent_seq_no,a.actv_date
;
QUIT;
HI @TheNovice It's simple. Your GROUP BY is not complete.
You could include A.* in group by to not allow PROC SQL to REMERGE
So,
GROUP BY A.ID,a.ent_seq_no,a.actv_date,a.monthly_pmt,a.month,a.count
should solve the problem
Full Version:
PROC SQL;
CREATE TABLE CUMULATIVE AS
SELECT a.*, SUM(b.MONTHLY_PMT) AS CUML
FROM have A INNER JOIN have B
ON A.ID= B.ID and b.count <= a.count
GROUP BY A.ID,a.ent_seq_no,a.actv_date,a.monthly_pmt,a.month,a.count
ORDER BY A.ID,a.ent_seq_no,a.actv_date
;
QUIT;
yes, that makes so much sense.... thank you so much. it worked!
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: