Re: Aggregation Group by provide duplicated records

vietlinh12hoa · Posted 05-31-2021 01:16 PM

I have a table

Group	Revenue
A	100
A	50
B	200
B	100

I want to sum aggregation by proc sql, like:

Group	Total Revenue
A	150
B	300

proc sql;
        select 
             Group, sum(Revenue) as Total_Revenue
        from mytable group by Group
;quit;

However, it produces duplicates records:

Group	Total Revenue
A	150
A	150
B	300
B	300

Of course, there are couple of solution like NOMERGE SQL (not always work) or SORT NODUPKEY. But I wonder any efficient and direct way, something like:

proc sql;
        select 
             first(Group), # Take the first name record in the Group 
             sum(Revenue) as Total_Revenue
        from mytable group by Group
;quit;

Reeza · Posted 05-31-2021 01:24 PM

Either your GROUP variable values are not exact because of leading spaces or invisible characters, but this should work as expected. What do you get if you run the following on the same data set. Please show the log from your original code and this.

proc freq data=mytable;
table group;
weight revenue;
run;

Note that I cannot replicate your issue either:

data have;
infile cards dlm=',';
input Group $	Revenue;
cards;
A,	100
A,	50
B,	200
B,	100
;;;;

proc sql;
     create table want as
        select 
             Group, sum(Revenue) as Total_Revenue
        from have group by Group
;quit;

proc print data=want;run;

This generates the table shown in your original post.

Log:

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 68         
 69         data have;
 70         infile cards dlm=',';
 71         input Group $Revenue;
 72         cards;
 
 NOTE: The data set WORK.HAVE has 4 observations and 2 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              524.12k
       OS Memory           24740.00k
       Timestamp           05/31/2021 05:23:57 PM
       Step Count                        45  Switch Count  2
       Page Faults                       0
       Page Reclaims                     126
       Page Swaps                        0
       Voluntary Context Switches        10
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           264
       
 77         ;;;;
 
 78         
 79         proc sql;
 80              create table want as
 81                 select
 82                      Group, sum(Revenue) as Total_Revenue
 83                 from have group by Group
 84         ;
 NOTE: Table WORK.WANT created, with 2 rows and 2 columns.
 
 84       !  quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.00 seconds
       user cpu time       0.01 seconds
       system cpu time     0.00 seconds
       memory              5433.81k
       OS Memory           29864.00k
       Timestamp           05/31/2021 05:23:57 PM
       Step Count                        46  Switch Count  2
       Page Faults                       0
       Page Reclaims                     219
       Page Swaps                        0
       Voluntary Context Switches        10
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           272

Spoiler

@vietlinh12hoa wrote:

I have a table

Group Revenue

A 100

A 50

B 200

B 100

I want to sum aggregation by proc sql, like:

Group Total Revenue

A 150

B 300
proc sql;
        select 
             Group, sum(Revenue) as Total_Revenue
        from mytable group by Group
;quit;
However, it produces duplicates records:

Group Total Revenue

A 150

A 150

B 300

B 300

Of course, there are couple of solution like NOMERGE SQL (not always work) or SORT NODUPKEY. But I wonder any efficient and direct way, something like:
proc sql;
        select 
             first(Group), # Take the first name record in the Group 
             sum(Revenue) as Total_Revenue
        from mytable group by Group
;quit;

wrote: I have a table Group Revenue A 100 A 50 B 200 B 100 I want to sum aggregation by proc sql, like: Group Total Revenue A 150 B 300 proc sql; select Group, sum(Revenue) as Total_Revenue from mytable group by Group ;quit; However, it produces duplicates records: Group Total Revenue A 150 A 150 B 300 B 300 Of course, there are couple of solution like NOMERGE SQL (not always work) or SORT NODUPKEY. But I wonder any efficient and direct way, something like: proc sql; select first(Group), # Take the first name record in the Group sum(Revenue) as Total_Revenue from mytable group by Group ;quit;

vietlinh12hoa · Posted 05-31-2021 01:47 PM

Ok Thanks. I check and so it apparently had the space in Group. But anyway, any chance I can use first name (or last name) observation of Group (in case all Group values are not perfectly matched)?

Reeza · Posted 05-31-2021 01:52 PM

The differences would cause them to be considered different groups, so first/last would not work. FIRST/LAST are data step concepts that do not overlap with SQL anyways.

You can try to use COMPRESS() or TRIM() within a query.

proc sql;
create table want as
select
compress(Group) as group, sum(Revenue) as Total_Revenue
from have group by compress(Group)
;quit;

Reeza · Posted 05-31-2021 01:26 PM

FYI this would happen if you also had a "Select *" or another column/variable in the query that isn't in your GROUP BY or in an aggregation statement. But that means your query wouldn't match what you've shown us.

yabwon · Posted 05-31-2021 03:00 PM

Just for fun, you can have both situations generated in one data step 😉

data have;
infile cards dlm=',';
input Group $	Revenue;
cards;
A,	100
A,	50
B,	200
B,	100
;;;;
run;

data want1 want2;
  do _N_=1 by 1 until(last.group);
    set have;
    by group;
    Total_Revenue + Revenue;
  end;

  do _N_ = 1 to _N_;
    set have;
    output want1;
  end;
  output want2;
  Total_Revenue=.;
run;

proc print data = want1;
run;
proc print data = want2;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Ksharp · Posted 06-01-2021 08:57 AM

It should not be, there is some clause you didn't posted.

proc sql;
select
DISTINCT Group, sum(Revenue) as Total_Revenue
from mytable group by Group
;quit;

Aggregation Group by provide duplicated records