I have a table
Group | Revenue |
A | 100 |
A | 50 |
B | 200 |
B | 100 |
I want to sum aggregation by proc sql, like:
Group | Total Revenue |
A | 150 |
B | 300 |
proc sql;
select
Group, sum(Revenue) as Total_Revenue
from mytable group by Group
;quit;
However, it produces duplicates records:
Group | Total Revenue |
A | 150 |
A | 150 |
B | 300 |
B | 300 |
Of course, there are couple of solution like NOMERGE SQL (not always work) or SORT NODUPKEY. But I wonder any efficient and direct way, something like:
proc sql;
select
first(Group), # Take the first name record in the Group
sum(Revenue) as Total_Revenue
from mytable group by Group
;quit;
Either your GROUP variable values are not exact because of leading spaces or invisible characters, but this should work as expected. What do you get if you run the following on the same data set. Please show the log from your original code and this.
proc freq data=mytable;
table group;
weight revenue;
run;
Note that I cannot replicate your issue either:
data have;
infile cards dlm=',';
input Group $ Revenue;
cards;
A, 100
A, 50
B, 200
B, 100
;;;;
proc sql;
create table want as
select
Group, sum(Revenue) as Total_Revenue
from have group by Group
;quit;
proc print data=want;run;
This generates the table shown in your original post.
Log:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 data have; 70 infile cards dlm=','; 71 input Group $Revenue; 72 cards; NOTE: The data set WORK.HAVE has 4 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds user cpu time 0.00 seconds system cpu time 0.00 seconds memory 524.12k OS Memory 24740.00k Timestamp 05/31/2021 05:23:57 PM Step Count 45 Switch Count 2 Page Faults 0 Page Reclaims 126 Page Swaps 0 Voluntary Context Switches 10 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 264 77 ;;;; 78 79 proc sql; 80 create table want as 81 select 82 Group, sum(Revenue) as Total_Revenue 83 from have group by Group 84 ; NOTE: Table WORK.WANT created, with 2 rows and 2 columns. 84 ! quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.00 seconds user cpu time 0.01 seconds system cpu time 0.00 seconds memory 5433.81k OS Memory 29864.00k Timestamp 05/31/2021 05:23:57 PM Step Count 46 Switch Count 2 Page Faults 0 Page Reclaims 219 Page Swaps 0 Voluntary Context Switches 10 Involuntary Context Switches 0 Block Input Operations 0 Block Output Operations 272
@vietlinh12hoa wrote:
I have a table
Group Revenue A 100 A 50 B 200 B 100
I want to sum aggregation by proc sql, like:
Group Total Revenue A 150 B 300
proc sql; select Group, sum(Revenue) as Total_Revenue from mytable group by Group ;quit;
However, it produces duplicates records:
Group Total Revenue A 150 A 150 B 300 B 300
Of course, there are couple of solution like NOMERGE SQL (not always work) or SORT NODUPKEY. But I wonder any efficient and direct way, something like:
proc sql; select first(Group), # Take the first name record in the Group sum(Revenue) as Total_Revenue from mytable group by Group ;quit;
Ok Thanks. I check and so it apparently had the space in Group. But anyway, any chance I can use first name (or last name) observation of Group (in case all Group values are not perfectly matched)?
Just for fun, you can have both situations generated in one data step 😉
data have;
infile cards dlm=',';
input Group $ Revenue;
cards;
A, 100
A, 50
B, 200
B, 100
;;;;
run;
data want1 want2;
do _N_=1 by 1 until(last.group);
set have;
by group;
Total_Revenue + Revenue;
end;
do _N_ = 1 to _N_;
set have;
output want1;
end;
output want2;
Total_Revenue=.;
run;
proc print data = want1;
run;
proc print data = want2;
run;
Bart
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.