Hi,
I'm trying to sum a field using a number of conditions and put the resultant figure into a date array. The data I have looks like this:
GL_No | CostCentre_Dtbs | Deal_ID | date | Billing_Period | InvoiceAmount1 |
xxxxx | CC_1 | AA11111 | 11/01/2018 | ANNUAL_November | 6000.00 |
xxxxx | CC_1 | AA11111 | 12/01/2019 | ANNUAL_November | 6000.00 |
xxxxx | CC_1 | BB11111 | 01/01/2018 | QUARTERLY_M-J-S-D | 1663.57 |
xxxxx | CC_1 | BB11111 | 02/01/2018 | QUARTERLY_M-J-S-D | -560.57 |
xxxxx | CC_3 | BB11111 | 03/01/2018 | QUARTERLY_M-J-S-D | 253.15 |
xxxxx | CC_3 | CC11111 | 05/01/2018 | QUARTERLY_M-J-S-D | 560.54 |
xxxxx | CC_3 | CC11111 | 06/01/2018 | QUARTERLY_M-J-S-D | 271.23 |
xxxxx | CC_3 | CC11111 | 07/01/2018 | QUARTERLY_M-J-S-D | 280.27 |
xxxxx | CC_3 | CC11111 | 08/01/2018 | QUARTERLY_M-J-S-D | 3271.23 |
xxxxx | CC_3 | CC11111 | 09/01/2018 | ANNUAL_February | 280.27 |
xxxxx | CC_3 | CC11111 | 10/01/2018 | ANNUAL_March | 280.27 |
xxxxx | CC_3 | CC11111 | 11/01/2018 | ANNUAL_April | 271.23 |
xxxxx | CC_3 | CC11111 | 12/01/2018 | ANNUAL_April | 280.27 |
If Deal_ID, GL_No, CostCentre_DTBS and Billing_Period are the same (in that order) I want to sum InvoiceAmount1 across these criteria and place the resultant figure in a date array as follows:
GL_No | CostCentre_Dtbs | Deal_ID | date | Billing_Period | __2018_1 | __2018_2 | __2018_3 | __2018_4 | __2018_5 | __2018_6 | __2018_7 | __2018_8 | __2018_9 | __2018_10 | __2018_11 | __2018_12 |
xxxxx | CC_1 | AA11111 | 11/01/2018 | ANNUAL_November | 6000.00 | |||||||||||
xxxxx | CC_1 | AA11111 | 12/01/2019 | ANNUAL_December | 6000.00 | |||||||||||
xxxxx | CC_1 | BB11111 | 01/01/2018 | QUARTERLY_M-J-S-D | 1663.57 | -560.57 | ||||||||||
xxxxx | CC_3 | BB11111 | 03/01/2018 | QUARTERLY_M-J-S-D | 253.15 | |||||||||||
xxxxx | CC_3 | CC11111 | 05/01/2018 | QUARTERLY_M-J-S-D | 560.54 | 271.23 | 280.27 | 3271.23 | ||||||||
xxxxx | CC_3 | CC11111 | 09/01/2018 | ANNUAL_September | 280.27 | 280.27 | ||||||||||
xxxxx | CC_3 | CC11111 | 11/01/2018 | ANNUAL_November | 271.23 | 280.27 |
The data itself runs to 2020 but I've kept it to 2018 here. I've written the following script however it doesn't seem to reset the values in the array as it should.
proc sort data=TrendForPlanv2; by Deal_ID Date GL_No CostCentre_DTBS Billing_Period; run;
data Want;
set Have;
array _2018 {12} _temporary_;
array _2019 {12} _temporary_;
array _2020 {12} _temporary_;
array __2018 {12} __2018_1-__2018_12;
array __2019 {12} __2019_1-__2019_12;
array __2020 {12} __2020_1-__2020_12;
do i=1 to 12;
if Month_No=i and Year=2018 then _2018{i}=InvoiceAmount1;
if Month_No=i and Year=2019 then _2019{i}=InvoiceAmount1;
if Month_No=i and Year=2020 then _2020{i}=InvoiceAmount1;
end;
retain
__2018_1 __2018_2 __2018_3 __2018_4 __2018_5 __2018_6 __2018_7 __2018_8 __2018_9 __2018_10 __2018_11 __2018_12
__2019_1 __2019_2 __2019_3 __2019_4 __2019_5 __2019_6 __2019_7 __2019_8 __2019_9 __2019_10 __2019_11 __2019_12
__2020_1 __2020_2 __2020_3 __2020_4 __2020_5 __2020_6 __2020_7 __2020_8 __2020_9 __2020_10 __2020_11 __2020_12;
by Deal_ID Date GL_No CostCentre_DTBS Billing_Period;
do i=1 to 12;
if first.Billing_Period then __2018{i} = 0;
__2018{i} + _2018{i};
end;
do i=1 to 12;
if first.Billing_Period then __2019{i} = 0;
__2019{i} + _2019{i};
end;
do i=1 to 12;
if first.Billing_Period then __2020{i} = 0;
__2020{i} + _2020{i};
end;
run;
I've attached a sample of data. Can anyone help?
Thanks,
Gary
There are cosmetic differences between this output and what was originally requested, all of these cosmetic differences can be fixed if they are important.
ods excel file="myfilename.xlsx";
proc report data=have;
columns gl_no CostCentre_Dtbs deal_id billing_period date,invoiceamount1;
define gl_no/group;
define CostCentre_Dtbs/group;
define deal_id/group;
define billing_period/group;
define date/across format=yymmd7.;
define invoiceamount1/sum ' ';
run;
ods excel close;
Summing by month is very easy in PROC SUMMARY, and very difficult if you absolutely have to have columns named __2018_1, __2018_2, etc.
Do you need these columns as a report? In this case it's very simple in PROC REPORT.
So PROC REPORT gets you exactly what you want, relatively easily, without any arrays and without having difficult coding you are showing. (And extra bonus, if you have to run the report again next month or next year, you don't have to change the code!)
If you can provide (a portion of) the data as SAS data step code, I can provide the PROC REPORT code needed.
Important concept: If for any reason, you think you need to put data into a variable name (data such as month, year, state, country, etc.) stop! Don't do it! This is almost never a good idea! SAS has already done the hard work to code this into procedures such as PROC REPORT and PROC TRANSPOSE and many others, and so you don't have to create this difficult code.
/* There must be a typo at 2nd observation for variable date and Billing_Period in initial dataset. */
/* Please check. */
/* Based on my assumption , below is my code. */
data have;
infile datalines dlm="09"x;
input GL_No $ CostCentre_Dtbs $ Deal_ID $ date:mmddyy10. Billing_Period:$15. InvoiceAmount1:9.2;
format date mmddyy10.;
datalines;
xxxxx CC_1 AA11111 11/01/2018 ANNUAL_November 6000.00
xxxxx CC_1 AA11111 12/01/2018 ANNUAL_December 6000.00
xxxxx CC_1 BB11111 01/01/2018 QUARTERLY_M-J-S-D 1663.57
xxxxx CC_1 BB11111 02/01/2018 QUARTERLY_M-J-S-D -560.57
xxxxx CC_3 BB11111 03/01/2018 QUARTERLY_M-J-S-D 253.15
xxxxx CC_3 CC11111 05/01/2018 QUARTERLY_M-J-S-D 560.54
xxxxx CC_3 CC11111 06/01/2018 QUARTERLY_M-J-S-D 271.23
xxxxx CC_3 CC11111 07/01/2018 QUARTERLY_M-J-S-D 280.27
xxxxx CC_3 CC11111 08/01/2018 QUARTERLY_M-J-S-D 3271.23
xxxxx CC_3 CC11111 09/01/2018 ANNUAL_February 280.27
xxxxx CC_3 CC11111 10/01/2018 ANNUAL_March 280.27
xxxxx CC_3 CC11111 11/01/2018 ANNUAL_April 271.23
xxxxx CC_3 CC11111 12/01/2018 ANNUAL_April 280.27
;
run;
/* generate date labels as dataset */
data date_str;
startdate=input("01/2018", anydtdte.);
do i=0 to 11;
interval=intnx('month', startdate, i);
_interval=put(interval, mmyyd.);
output;
end;
run;
/* generate statement macro variables */
proc sql noprint;
select distinct cat("MON", put(i+1, z2.), "=", "'", _interval, "'")
into:loop_datestr separated by " " from date_str;
quit;
/* debug in case */
/* options mprint; */
data want;
do until (last.billing_period);
set have;
by CostCentre_Dtbs Deal_ID Billing_Period notsorted;
if first.billing_period then
do;
_date=date;
end;
label &loop_datestr;
array M[12] mon01-mon12;
startdate=input("01/01/2018", mmddyy10.);
do i=0 to 11;
interval=intnx('month', startdate, i);
if interval=date then
do;
M[i+1]=InvoiceAmount1;
end;
end;
if last.Billing_Period then
do;
output;
end;
format _date mmddyy10.;
drop date startdate i interval InvoiceAmount1;
rename _date=date;
end;
run;
/* example of using proc transpose */
data have;
infile datalines dlm="09"x;
input GL_No $ CostCentre_Dtbs $ Deal_ID $ date:mmddyy10. Billing_Period:$15. InvoiceAmount1:9.2;
format date mmddyy10.;
datalines;
xxxxx CC_1 AA11111 11/01/2018 ANNUAL_November 6000.00
xxxxx CC_1 AA11111 12/01/2018 ANNUAL_December 6000.00
xxxxx CC_1 BB11111 01/01/2018 QUARTERLY_M-J-S-D 1663.57
xxxxx CC_1 BB11111 02/01/2018 QUARTERLY_M-J-S-D -560.57
xxxxx CC_3 BB11111 03/01/2018 QUARTERLY_M-J-S-D 253.15
xxxxx CC_3 CC11111 05/01/2018 QUARTERLY_M-J-S-D 560.54
xxxxx CC_3 CC11111 06/01/2018 QUARTERLY_M-J-S-D 271.23
xxxxx CC_3 CC11111 07/01/2018 QUARTERLY_M-J-S-D 280.27
xxxxx CC_3 CC11111 08/01/2018 QUARTERLY_M-J-S-D 3271.23
xxxxx CC_3 CC11111 09/01/2018 ANNUAL_February 280.27
xxxxx CC_3 CC11111 10/01/2018 ANNUAL_March 280.27
xxxxx CC_3 CC11111 11/01/2018 ANNUAL_April 271.23
xxxxx CC_3 CC11111 12/01/2018 ANNUAL_April 280.27
;
run;
data have;
set have;
_date=put(date, mmyyd.);
run;
proc sort data=have out=_have;
by _date;
run;
proc transpose data=_have prefix=_ out=want(drop=_NAME_);
by GL_NO CostCentre_Dtbs Deal_ID Billing_Period date notsorted;
var InvoiceAmount1;
id _date;
run;
Nice job, @hhinohar
However, I still content that's an awful lot of programming compared to PROC REPORT.
Indeed!
I'm looking forward to Proc report sample!
There are cosmetic differences between this output and what was originally requested, all of these cosmetic differences can be fixed if they are important.
ods excel file="myfilename.xlsx";
proc report data=have;
columns gl_no CostCentre_Dtbs deal_id billing_period date,invoiceamount1;
define gl_no/group;
define CostCentre_Dtbs/group;
define deal_id/group;
define billing_period/group;
define date/across format=yymmd7.;
define invoiceamount1/sum ' ';
run;
ods excel close;
Modification to sort by Deal_ID and GL_NO
columns deal_id gl_no CostCentre_Dtbs billing_period date,invoiceamount1;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.