Group by - wrong results?

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Group by - wrong results?

Hello.

Why first request return 2 rows? I'am waiting 1 row. Give any links, please.

data dbl;
dummy = 'X'; output;
dummy = 'X '; output;
proc sql;

select count(*) as cnt
from (
   select
      strip(dummy) as s_dummy,
       count(*) as cnt1
   from dbl
   group by
      strip(dummy)
) t1;

select count(*) as cnt
from (
   select
      strip(dummy) as s_dummy,
       count(*) as cnt1
   from dbl
   group by
      1
) t2;

select count(*) as cnt
from (
   select distinct
      strip(dummy) as s_dummy
   from dbl
) t3;

quit;
------
RESULTS:

    cnt
    2
   
    cnt
    1
   
    cnt
    1


Accepted Solutions
Solution
‎12-24-2014 02:44 AM
Grand Advisor
Posts: 9,596

Re: Group by - wrong results?

Patrick,

  /* version 2 */

  select

    dummy,

    count(*) as cnt1

  from dbl

    group by

      strip(dummy)

  ;

I think they are two different variables , that makes query is still listing query, not summary query, If we need the summary query indeed, they should be the same variable :

select

    dummy,

    count(*) as cnt1

  from dbl

    group by

      dummy

OR

select

      strip(dummy) as s_dummy,

       count(*) as cnt1

   from dbl

   group by calculated s_dummy

if  there are some new variable appear either in GROUP or SELECT ,that would turn query into listing query not summary query . Even they have the same form like OP display:

select

      strip(dummy) as s_dummy,

       count(*) as cnt1

   from dbl

   group by

      strip(dummy)


SAS still take them as two different variables , even though they are the same. I guess. Smiley Happy



Xia Keshan

View solution in original post


All Replies
Trusted Advisor
Posts: 1,203

Re: Group by - wrong results?

Hi,


In group by clause refer s_dummy variable to fix this.


proc sql;

select count(*) as cnt
from (
   select
      strip(dummy) as s_dummy,
       count(*) as cnt1
   from dbl
   group by
      s_dummy
) t1;

New Contributor
Posts: 3

Re: Group by - wrong results?

Thank you. Of course I can use your variant or my second variant.

But it's interesting how first variant work - why data not grouping? And can I use expressions in "group by" as I see in documentation?

Grand Advisor
Posts: 9,596

Re: Group by - wrong results?

Because you add strip(dummy) as s_dummy to create a new variable , that make the query is listing or detail query ,not summary query , so you got two X in table .

or write it as :

data dbl;

dummy = 'X'; output;

dummy = 'X '; output;

run;

proc sql;

select count(*) as cnt

from (

   select

      strip(dummy) as s_dummy,

       count(*) as cnt1

   from dbl

   group by calculated s_dummy )

;

quit;

Xia Keshan

Respected Advisor
Posts: 3,841

Re: Group by - wrong results?

Version 2 below returns two lines as result instead of one line. It appears that in this case the SQL parser needs to back-merge the result to the original data "NOTE: The query requires remerging summary statistics back with the original data.".

I couldn't find a SAS Note which clearly explains this behavior. Not sure if this can be called a "feature" or not. May be someone else can shed some more light on this.

data dbl;

  dummy = 'X'; output; output;

run;

proc sql feedback _method _tree;

  /* version 1 */

  select

    dummy,

    count(*) as cnt1

  from dbl

    group by

      dummy

  ;

  /* version 2 */

  select

    dummy,

    count(*) as cnt1

  from dbl

    group by

      strip(dummy)

  ;

  /* version 3 */

  select DISTINCT

    dummy,

    count(*) as cnt1

  from dbl

    group by

      strip(dummy)

  ;

  /* version 4 */

  select

    strip(dummy) as dummy2,

    count(*) as cnt1

  from dbl

    group by

      dummy2

  ;

quit;

Solution
‎12-24-2014 02:44 AM
Grand Advisor
Posts: 9,596

Re: Group by - wrong results?

Patrick,

  /* version 2 */

  select

    dummy,

    count(*) as cnt1

  from dbl

    group by

      strip(dummy)

  ;

I think they are two different variables , that makes query is still listing query, not summary query, If we need the summary query indeed, they should be the same variable :

select

    dummy,

    count(*) as cnt1

  from dbl

    group by

      dummy

OR

select

      strip(dummy) as s_dummy,

       count(*) as cnt1

   from dbl

   group by calculated s_dummy

if  there are some new variable appear either in GROUP or SELECT ,that would turn query into listing query not summary query . Even they have the same form like OP display:

select

      strip(dummy) as s_dummy,

       count(*) as cnt1

   from dbl

   group by

      strip(dummy)


SAS still take them as two different variables , even though they are the same. I guess. Smiley Happy



Xia Keshan

Grand Advisor
Posts: 9,596

Re: Group by - wrong results?

Sorry ,Oleg

I found a very interesting thing , as long as you use a function in GROUP , sas will take it as a new variable and make your query be listing not summary. Therefore, due to this reason , you'd better not use any function in GROUP to avoid to generate listing query .

data dbl;
dummy = 'X'; output;
dummy = 'X '; output;
run;
proc sql;

   select
      dummy ,
       count(*) as cnt1
   from dbl
   group by  strip(dummy)
;
quit;

OUTPUT:
X   2
X   2




proc sql;

   select
      strip(dummy) ,
       count(*) as cnt1
   from dbl
   group by  dummy
;
quit;

OUTPUT:
X    2

It is too many fun .

proc sql;

   select   
       count(*) as cnt1
   from dbl
   group by  strip(dummy)
;
quit;

OUTPUT:
 2

Obviously , no matter what variable you add into SELECT ,

select dummy,count(*) as cnt1  from dbl group by strip(dummy)

or

select strip(dummy),count(*) as cnt1  from dbl group by strip(dummy)

sas will turn it into listing query not summary query any more .

but if you don't add any variable in SELECT, it will be summary query .Very funny.

Xia Keshan

Message was edited by: xia keshan

New Contributor
Posts: 3

Re: Group by - wrong results?

Thank's for your time, xia keshan! Your thoughts very helpfull!

It's intresting for me now what about ansi sql? For example, in oracle this quiery

   select rtrim(dummy) s_dummy, count(*)

   from (

      select 'X' dummy from dual union all

      select 'X ' dummy from dual

      ) t1

   group by rtrim(dummy)

   ;

return only one row - not like in SAS proc sql. I think it's importent and maybe it's sould be noted on PROC SQL and the ANSI Standard.

Or maybe Oracle working unlike ANSI?

Of course, it's another question.

Thank you very match!

Grand Advisor
Posts: 9,596

Re: Group by - wrong results?

proc sql also return one row . That is real weird .

It is obviously SAS take different action to a table and a sub-query .

For a table , SAS will take SELECT and GROUP variables as two different variable ,although they are the same ,

But for a sub-query, SAS take them as the same variable  . Isn't it weird ?

data dual;
x = 'X'; output;
x = 'X'; output;
run;
proc sql;
 select trim(dummy) as s_dummy, count(*) as cnt1
   from (
      select 'X'  as dummy from dual union all
      select 'X'  as dummy from dual
      ) 
   group by trim(dummy)
   ;
   quit;

OUTPUT:
X    4









data dbl;
dummy = 'X'; output;
dummy = 'X '; output;
run;
proc sql;

   select
      trim(dummy) as s_dummy, count(*) as cnt1
   from dbl
   group by  trim(dummy)
;
quit;

OUTPUT:
X    2
X    2

Xia Keshan

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 585 views
  • 1 like
  • 4 in conversation