BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kas
Fluorite | Level 6 kas
Fluorite | Level 6

I as surprised to find that I could use a column alias in my where statement. 

Example

create table xxx as

select one, two, three

from dset

where one =1

;

create table yyy as

select one as flag, two, three

from dset

where flag = 1;

Do these statements both yield these same result?  If soon, is this documented somewhere.  I have found these documentation for using caculated values in a where statements but not for using these column alias in these where statements.

1 ACCEPTED SOLUTION

Accepted Solutions
LewisC_sas
SAS Employee

The first few paragraphs below  are mostly general comments about column aliasing in the SQL language. The remaining paragraphs are specific to PROC SQL.

Effectively, a column alias can be considered to be a temporary renaming of the expression that is being referenced. The duration of this renaming lasts for the lifetime of the query's execution. For example, in the code below, the qty column has been renamed to Quantity and the expression cost+100 has been renamed to ListPrice.

     title 'columns with aliases';

     select qty as Quantity, cost, cost+100 as ListPrice

       from calc;

The template for an SQL query can be specified, at a high abstraction level, as follows below. The parenthesized numbers at the right of each query part represents its position in the (conceptual) execution order of the parts of an SQL query. More will be said about this below. Please note that planners/optimizers, such as the one that is present in PROC SQL, have the freedom to alter internal execution orders (as long as any alterations do NOT change the meaning of a/the query).

      select <select list>         (5)

        from <from clause>         (1)

       where <where clause>        (2)

    group by <group by clause>     (3)

      having <having clause>       (4)

    order by <order by clause>     (6)

   

Of course, the only mandatory parts of an SQL query are the SELECT and FROM, The other four parts may be optional, depending on what someone is trying to accomplish with the query.  Technically, the ORDER BY part is not part of the query. Rather, it is part of a cursor. However, for simplicity, we can pretend that it is part of the query.

  

From the ISO/ANSI SQL Standards perspective, the conceptual order of execution of the six parts, if all are present, is this sequence: the FROM part is executed first, the WHERE part is executed second, the GROUP BY part is executed third, the HAVING part is executed fourth, the SELECT part is executed fifth, and the ORDER BY part is executed last.

The ISO/ANSI SQL Standards do not require that the value that is associated with any column alias be present until ORDER BY-time. More specifically, there is no guarantee that a column alias will work as expected if it appears on the clauses that are parts of execution steps 2, 3, and 4, namely, the WHERE clause, GROUP BY clause, and HAVING clause. A column alias is not guaranteed to have been resolved until the beginning of execution step 6 (the start of execution for the ORDER BY part).

The major ramificaton of this is that, per the ISO/ANSI SQL Standards rules regarding column aliases, there is no guarantee, whatsoever, that a SQL processor will resolve a column alias in time for it to be referenced in a <where clause>, <group by clause>, or <having clause>. However, per the Standards, any column alias must be available for reference by the time that the <order by clause> is processed. Hence, for maximum portability of SQL code, it is best to AVOID writing code that refers to column aliases in either a <where clause>, <group by clause>, or <having clause>.

In PROC SQL, if a column alias is just a rename of a table column, then either the table name or column name can be used in the <where clause>, <having clause>, or <order by clause>. For example, the first alias in each SELECT part is just a rename of a table column whereas

the second alias refers to a calculated epression. Both the first and second SQL statements below will output the expected results in PROC SQL.

     /*-- preferred since no column alias is referenced in the <where clause> --*/

     /*-- PORTABLE to other SQL processors --*/

     select qty as Quantity, cost, cost+100 as ListPrice

       from calc

      where qty > 5;

     

     /*-- this will work in PROC SQL, but may not work with some other SQL processors --*/

     select qty as Quantity, cost, cost+100 as ListPrice

       from calc

      where Quantity > 5;

     

The development of what we know today as PROC SQL began in the 1984-85 timeframe. Some of its scoping and aliasing rules predate those of the first ANSI SQL Standard (1986) and the first ISO SQL Standard (1987). One of the early extensions that was made to PROC SQL was the development of the CALCULATED keyword. Basically, this keyword enables PROC SQL users to reference column aliases that are associated with calculated expressions. The column alias that is being referenced by the CALCULATED keyword can be on either the <where clause>, <group by clause>, <having clause>, or <order by clause>. Make note, though, that the use of the CALCULATED keyword is redundant if is used on the <order by clause> to refer to a column alias because that alias will have been resolved by the time that the execution of the ORDER BY part of a query begins.

Now, for an example that uses the CALCULATED keyword. Suppose we wish to subset the rows by the value that is associated with the second alias (i.e., ListPrice). Then, it can be accomplished in PROC SQL by this code.

     /*-- note the use of the CALCULATED keyword --*/

     select qty as Quantity, cost, cost+100 as ListPrice

       from calc

      where CALCULATED ListPrice > 1500;

     

Unfortunately, the above code is not portable. The ISO/ANSI SQL Standards-approved way of accomplishing this is as follows.

     /*-- note that the ListPrice column alias does not appear in the <where clause> --*/

     select qty as Quantity, cost, cost+100 as ListPrice

       from calc

      where cost+100 > 1500;

I hope that the above comments help to clarify matters somewhat.

View solution in original post

14 REPLIES 14
MikeZdeb
Rhodochrosite | Level 12

hi ... no different than what happens a data step ...

data males;

set sashelp.class (keep=sex name age rename=(sex=gender));

where gender eq 'M';

run;

proc sql;

create table males as

select sex as gender, name age

from sashelp.class

where gender eq 'M';

quit;


Pritish
Quartz | Level 8

Yes they will yeild the same result. These alias will further help you when you are using any aggregate functions in your query and ordering your dataset based on that new variable.

proc sql;

create table yourdata as

select location, sum(salary) as Total_Salary

from employee

group by location

order by Total_Salary Desc;

quit;

kas
Fluorite | Level 6 kas
Fluorite | Level 6

Thank you both for you responses; I have a couple of additional questions/comments for anyone who might know the answer.

Perhaps it is not clear, but this does not follow ANSI standards for SQL and so I thought SAS would have this documented.  I have not been able to find any yet and this is my question - where is it documented?  Someone who has experience writing SQL for Oracle, would find the interchangeability of the alias to original column peculiar.

It is different than a data step because in your example (for MikeZdeb), you cannot interchange sex/gender in your where statement.  The variable has been renamed upon reading and can no longer be referenced by the old value (i.e. sex no longer exists).  In my example, either column (source/alias) can be identified in the where clause.

art297
Opal | Level 21

I'm not sure what specifically you want to see the documentation for.  The two sql statements you originally posted will end up with the same records, but one will have the variable's flag, two and three, and the other will have the variable's one, two and three.  The records selected will be the same.

As for other things you might be confused about, you can use dataset options, in SAS, in both SQL and the datastep.  If you use them in a datastep, their priority is alphabetical.  I.e., drop statements are operated on first, keep second, and rename third.

I've never run Oracle SQL, but I would have to think that it would do the same thing in oracle.

Joeldw
Calcite | Level 5

The SAS documentation for PROC SQL seems to describe this behavior of an alias. "After you assign an alias to a column, you can use the alias to refer to that column in other clauses."

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473678.htm

art297
Opal | Level 21

Again, while I've never used Oracle SQL, the Oracle SQL statement appears to provide conflicting instructions:

c_alias Specify an alias for the column expression. Oracle Database will use this alias in the column heading of the result set. The AS keyword is optional. The alias effectively renames the select list item for the duration of the query. The alias can be used in the order_by_clause but not other clauses in the query.

http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_10002.htm#SQLRF01702

kas
Fluorite | Level 6 kas
Fluorite | Level 6

Interesting input from everyone. Thanks all for your responses.

art297 - and I think the Oracle SQL definitions are clearer and that SAS had more conflict in usage.

Joeldw - yes, that documentation is helpful but not precise.  In practice, after a column is assignged an alias, the alias can be used or the original column name.  This is actually pretty cool (but not documented to my knowledge).

The data step processing makes sense.  The rename executes as the data is read and the original variable name is no longer valid.  In SQL, either can be used; I see that as more in conflict than Oracle.

Anyways, I think SAS would better serve its customer base by documenting the 'undocumented features' that we often encounter.

Peter_C
Rhodochrosite | Level 12

see http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002294533.htm for doc on the use of "calculated" in SAS sql.

As I understand, the syntax was updated and no longer requires that prefix

LewisC_sas
SAS Employee

The first few paragraphs below  are mostly general comments about column aliasing in the SQL language. The remaining paragraphs are specific to PROC SQL.

Effectively, a column alias can be considered to be a temporary renaming of the expression that is being referenced. The duration of this renaming lasts for the lifetime of the query's execution. For example, in the code below, the qty column has been renamed to Quantity and the expression cost+100 has been renamed to ListPrice.

     title 'columns with aliases';

     select qty as Quantity, cost, cost+100 as ListPrice

       from calc;

The template for an SQL query can be specified, at a high abstraction level, as follows below. The parenthesized numbers at the right of each query part represents its position in the (conceptual) execution order of the parts of an SQL query. More will be said about this below. Please note that planners/optimizers, such as the one that is present in PROC SQL, have the freedom to alter internal execution orders (as long as any alterations do NOT change the meaning of a/the query).

      select <select list>         (5)

        from <from clause>         (1)

       where <where clause>        (2)

    group by <group by clause>     (3)

      having <having clause>       (4)

    order by <order by clause>     (6)

   

Of course, the only mandatory parts of an SQL query are the SELECT and FROM, The other four parts may be optional, depending on what someone is trying to accomplish with the query.  Technically, the ORDER BY part is not part of the query. Rather, it is part of a cursor. However, for simplicity, we can pretend that it is part of the query.

  

From the ISO/ANSI SQL Standards perspective, the conceptual order of execution of the six parts, if all are present, is this sequence: the FROM part is executed first, the WHERE part is executed second, the GROUP BY part is executed third, the HAVING part is executed fourth, the SELECT part is executed fifth, and the ORDER BY part is executed last.

The ISO/ANSI SQL Standards do not require that the value that is associated with any column alias be present until ORDER BY-time. More specifically, there is no guarantee that a column alias will work as expected if it appears on the clauses that are parts of execution steps 2, 3, and 4, namely, the WHERE clause, GROUP BY clause, and HAVING clause. A column alias is not guaranteed to have been resolved until the beginning of execution step 6 (the start of execution for the ORDER BY part).

The major ramificaton of this is that, per the ISO/ANSI SQL Standards rules regarding column aliases, there is no guarantee, whatsoever, that a SQL processor will resolve a column alias in time for it to be referenced in a <where clause>, <group by clause>, or <having clause>. However, per the Standards, any column alias must be available for reference by the time that the <order by clause> is processed. Hence, for maximum portability of SQL code, it is best to AVOID writing code that refers to column aliases in either a <where clause>, <group by clause>, or <having clause>.

In PROC SQL, if a column alias is just a rename of a table column, then either the table name or column name can be used in the <where clause>, <having clause>, or <order by clause>. For example, the first alias in each SELECT part is just a rename of a table column whereas

the second alias refers to a calculated epression. Both the first and second SQL statements below will output the expected results in PROC SQL.

     /*-- preferred since no column alias is referenced in the <where clause> --*/

     /*-- PORTABLE to other SQL processors --*/

     select qty as Quantity, cost, cost+100 as ListPrice

       from calc

      where qty > 5;

     

     /*-- this will work in PROC SQL, but may not work with some other SQL processors --*/

     select qty as Quantity, cost, cost+100 as ListPrice

       from calc

      where Quantity > 5;

     

The development of what we know today as PROC SQL began in the 1984-85 timeframe. Some of its scoping and aliasing rules predate those of the first ANSI SQL Standard (1986) and the first ISO SQL Standard (1987). One of the early extensions that was made to PROC SQL was the development of the CALCULATED keyword. Basically, this keyword enables PROC SQL users to reference column aliases that are associated with calculated expressions. The column alias that is being referenced by the CALCULATED keyword can be on either the <where clause>, <group by clause>, <having clause>, or <order by clause>. Make note, though, that the use of the CALCULATED keyword is redundant if is used on the <order by clause> to refer to a column alias because that alias will have been resolved by the time that the execution of the ORDER BY part of a query begins.

Now, for an example that uses the CALCULATED keyword. Suppose we wish to subset the rows by the value that is associated with the second alias (i.e., ListPrice). Then, it can be accomplished in PROC SQL by this code.

     /*-- note the use of the CALCULATED keyword --*/

     select qty as Quantity, cost, cost+100 as ListPrice

       from calc

      where CALCULATED ListPrice > 1500;

     

Unfortunately, the above code is not portable. The ISO/ANSI SQL Standards-approved way of accomplishing this is as follows.

     /*-- note that the ListPrice column alias does not appear in the <where clause> --*/

     select qty as Quantity, cost, cost+100 as ListPrice

       from calc

      where cost+100 > 1500;

I hope that the above comments help to clarify matters somewhat.

FriedEgg
SAS Employee

That was a good read, thank you Lewis for the insightful post!

LewisC_sas
SAS Employee

You are welcome. Thanks for your kind comments.

kas
Fluorite | Level 6 kas
Fluorite | Level 6

Yes, that is what I was looking for. 

I did not realize proc sql pre-dated the ANSI standards; interesting. 

This is good information.  Is it in the documentation? 

LewisC_sas
SAS Employee

As far as I could determine, this information was not in the current documentation. However, we plan to address this in a future documentation release.

On Tuesday of this week, I talked with the main documentation specialist for PROC SQL about this issue. I am glad to report that from followup correspondence with her during the past two days, the area of the PROC SQL documentation that is relevant to this issue will be modified to reflect the new information that I posted on November 30, 2011 at 4:14 PM. Early this afternoon, the PROC SQL  documentation specialist informed me that the revised documentation is not expected to be available to our customers until the latter part of Summer 2012.

Peter_C
Rhodochrosite | Level 12

echoing thanks from others,

it is good to see the documentation update is coming soon ( or very soon - considering the long lifetime of PROC SQL is even greater than ansi sql - half a year seems quick )

peter

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 10728 views
  • 2 likes
  • 8 in conversation