BookmarkSubscribeRSS Feed
Jeg123
Calcite | Level 5

Writing some SQL code which gives results that does not make sense. Is this a bug or is it suppose to be like this? I have used several SQL engines and none gives this results. Two examples below:

 

SAS:

 

proc SQL;

create table test (
a varchar(8),
b varchar(8));

insert into test (a, b) values ('XX_ABC01', 'ABCDXX01');
insert into test (a, b) values ('XX_ABC01', 'ABCDXX01');
insert into test (a, b) values ('XX_ABC02', 'ABCDXX02');
insert into test (a, b) values ('XY_DCY02', 'ABCDXX02');

;quit;

 

proc SQL;

select
substr(a, 1, 6) as a,
substr(b, 1, 6) as b,
count(*) as n
from test
group by
substr(a, 1, 6),
substr(b, 1, 6)

;quit;

 

result:

 

 
a b n
XX_ABCABCDXX3
XX_ABCABCDXX3
XY_DCYABCDXX1

 

SQL SERVER (See SQL fiddle for working example)

 

create table test (
a varchar(8),
b varchar(8));

insert into test (a, b) values ('XX_ABC01', 'ABCDXX01');
insert into test (a, b) values ('XX_ABC01', 'ABCDXX01');
insert into test (a, b) values ('XX_ABC02', 'ABCDXX02');
insert into test (a, b) values ('XY_DCY02', 'ABCDXX02');

 

select
substring(a, 1, 6) as a,
substring(b, 1, 6) as b,
count(*) as n
from test
group by
substring(a, 1, 6),
substring(b, 1, 6) ;

 

RESULT (correct in my opinion)

 

XX_ABCABCDXX3a b n
XX_ABCABCDXX3
XY_DCYABCDXX
1
   

 

4 REPLIES 4
Kurt_Bremser
Super User

Since substr(a,1,6) (used in the group by) is a different object than new variable a from the select, SAS sees the need to remerge the summary statistics (see the NOTE in the log).

Use this instead:

proc SQL;
create table test (
  a char(8),
  b char(8)
);
insert into test (a, b) values ('XX_ABC01', 'ABCDXX01');
insert into test (a, b) values ('XX_ABC01', 'ABCDXX01');
insert into test (a, b) values ('XX_ABC02', 'ABCDXX02');
insert into test (a, b) values ('XY_DCY02', 'ABCDXX02');
quit;

proc SQL;
select
  substr(a, 1, 6) as a,
  substr(b, 1, 6) as b,
  count(*) as n
from test
group by calculated a, calculated b
;
quit;

Note that SAS has no datatype varchar, only char.

ChrisNZ
Tourmaline | Level 20
GROUP BY 1,2

works too.

 

@Kurt_Bremser It is the same object though. SAS is wrong here no?

Kurt_Bremser
Super User

Logically, it's the same, but technically it's just the result of a function (in the group by of the OP). 1,2 adresses the newly created variables, not the expressions, and is therefore equivalent to my usage of the calculated variables.

The SQL compiler is obviously not intelligent enough to check if the expressions in the group by are the same as those in the select. Maxim 31 in action.

ChrisNZ
Tourmaline | Level 20

LOL #31

All other SQL parsers manage the feat, so while maybe not a defect per se (and while I much prefer the calculated or 1,2 syntaxes anyway) I'd suggest this is at least a gap or a weakness.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1102 views
  • 2 likes
  • 3 in conversation