topic Re: GROUP BY keeping duplicates in SAS Programming

GROUP BY keeping duplicates

geds133 — Tue, 25 Jun 2019 10:42:31 GMT

I am trying to group the towns and Area_segment so that I have a number of customers for each area segment and town. However there are duplicate results in my table which means the group by has not grouped the desired variables. The code is a s follows:

proc sql;

---------------------------------------------------------------------------------------------
CREATE TABLE WORK.Qopenbranch AS

---------------------------------------------------------------------------------------------
SELECT
case when areasegment = ' ' then 'Unknown' else areasegment end as Area_Segment,
COUNT( calculated Area_Segment) AS nocustomers,
town

---------------------------------------------------------------------------------------------
FROM WORK.QProductList AS a LEFT JOIN Cn.Branch AS b

---------------------------------------------------------------------------------------------
ON
a.branchcode = b.branchcode

---------------------------------------------------------------------------------------------
GROUP BY
town, Area_Segment;

---------------------------------------------------------------------------------------------
quit;

Could someone explain?

Thanks

Re: GROUP BY keeping duplicates

Kurt_Bremser — Tue, 25 Jun 2019 09:48:58 GMT

Please supply example data for your datasets. Use data steps with datalines, so we can easily recreate your datasets for testing.

Re: GROUP BY keeping duplicates

geds133 — Tue, 25 Jun 2019 10:33:50 GMT

I cannot provide example data as it is confidential. Apologies, I am new to SAS so am unsure how this should be laid out.

Re: GROUP BY keeping duplicates

Kurt_Bremser — Tue, 25 Jun 2019 10:49:02 GMT

Take small subsets of your data, just enough to illustrate the issue.

Anonymize the necessary columns, then post here.

SAS code is data driven, without knowing the data it's next to impossible to diagnose anything.

Re: GROUP BY keeping duplicates

FreelanceReinh — Tue, 25 Jun 2019 13:18:23 GMT

@geds133 wrote:

However there are duplicate results in my table which means the group by has not grouped the desired variables.

Maybe the "duplicates" only look like duplicates, but in fact there are minor differences between the (character) GROUP BY variable values, e.g. due to leading blanks or invisible characters. (With numeric grouping variables tiny rounding errors could cause the differences.)

Example:

proc print data=test;
run;

Result:

Obs        x    state      town

 1      -0.1     NY      New York
 2      22.2     NY      New York
 3      22.8     NY      New York

proc sql;
select state, town, sum(x) as total
from test
group by state, town;
quit;

Result:

state  town                     total
-------------------------------------
NY     New York                 -0.1
NY     New York                  22.8
NY     New York                  22.2

No aggregation has occurred, but aren't there duplicate state-town combinations?

Solution:

data test;
length x 8 state $3 town $20;
input x 4. town $char20.;
if _n_=2 then substr(town,4,1)='A0'x;
state='NY'||byte(x+10);
cards;
-0.1 New York
22.2New York
22.8New York
;

proc print data=test;
format state $hex6. town $hex18.;
run;