I have a table
| Group | Revenue | 
| A | 100 | 
| A | 50 | 
| B | 200 | 
| B | 100 | 
I want to sum aggregation by proc sql, like:
| Group | Total Revenue | 
| A | 150 | 
| B | 300 | 
proc sql;
        select 
             Group, sum(Revenue) as Total_Revenue
        from mytable group by Group
;quit;However, it produces duplicates records:
| Group | Total Revenue | 
| A | 150 | 
| A | 150 | 
| B | 300 | 
| B | 300 | 
Of course, there are couple of solution like NOMERGE SQL (not always work) or SORT NODUPKEY. But I wonder any efficient and direct way, something like:
proc sql;
        select 
             first(Group), # Take the first name record in the Group 
             sum(Revenue) as Total_Revenue
        from mytable group by Group
;quit;
Either your GROUP variable values are not exact because of leading spaces or invisible characters, but this should work as expected. What do you get if you run the following on the same data set. Please show the log from your original code and this.
proc freq data=mytable;
table group;
weight revenue;
run;Note that I cannot replicate your issue either:
data have;
infile cards dlm=',';
input Group $	Revenue;
cards;
A,	100
A,	50
B,	200
B,	100
;;;;
proc sql;
     create table want as
        select 
             Group, sum(Revenue) as Total_Revenue
        from have group by Group
;quit;
proc print data=want;run;This generates the table shown in your original post.
Log:
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 68         
 69         data have;
 70         infile cards dlm=',';
 71         input Group $Revenue;
 72         cards;
 
 NOTE: The data set WORK.HAVE has 4 observations and 2 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              524.12k
       OS Memory           24740.00k
       Timestamp           05/31/2021 05:23:57 PM
       Step Count                        45  Switch Count  2
       Page Faults                       0
       Page Reclaims                     126
       Page Swaps                        0
       Voluntary Context Switches        10
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           264
       
 77         ;;;;
 
 78         
 79         proc sql;
 80              create table want as
 81                 select
 82                      Group, sum(Revenue) as Total_Revenue
 83                 from have group by Group
 84         ;
 NOTE: Table WORK.WANT created, with 2 rows and 2 columns.
 
 84       !  quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.00 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5433.81k
       OS Memory           29864.00k
       Timestamp           05/31/2021 05:23:57 PM
       Step Count                        46  Switch Count  2
       Page Faults                       0
       Page Reclaims                     219
       Page Swaps                        0
       Voluntary Context Switches        10
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           272
@vietlinh12hoa wrote:
I have a table
Group Revenue A 100 A 50 B 200 B 100 
I want to sum aggregation by proc sql, like:
Group Total Revenue A 150 B 300 
proc sql; select Group, sum(Revenue) as Total_Revenue from mytable group by Group ;quit;However, it produces duplicates records:
Group Total Revenue A 150 A 150 B 300 B 300 
Of course, there are couple of solution like NOMERGE SQL (not always work) or SORT NODUPKEY. But I wonder any efficient and direct way, something like:
proc sql; select first(Group), # Take the first name record in the Group sum(Revenue) as Total_Revenue from mytable group by Group ;quit;
Ok Thanks. I check and so it apparently had the space in Group. But anyway, any chance I can use first name (or last name) observation of Group (in case all Group values are not perfectly matched)?
Just for fun, you can have both situations generated in one data step 😉
data have;
infile cards dlm=',';
input Group $	Revenue;
cards;
A,	100
A,	50
B,	200
B,	100
;;;;
run;
data want1 want2;
  do _N_=1 by 1 until(last.group);
    set have;
    by group;
    Total_Revenue + Revenue;
  end;
  do _N_ = 1 to _N_;
    set have;
    output want1;
  end;
  output want2;
  Total_Revenue=.;
run;
proc print data = want1;
run;
proc print data = want2;
run;
Bart
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
