In a DATA step, the program data vector does not allow two variables to have the same name. SQL is different. The namespace for a query can have multiple instances of the same column name.
Usually, the SQL processor will generate an error message if there is an ambiguous reference to a column name. There are still situations, however, where it is quite possible to get into trouble.
Here is an example:
proc sql; create table one as select name, height, round(weight) as Size from sashelp.class where sex='M'; title 'Boys Taller Than 66 Inches'; create table two as select name, round(height) as Size from one where size > 66; select * from two;
The result is
Boys Taller Than 66 Inches
Name Size ------------------ Alfred 69 Henry 64 James 57 Jeffrey 63 John 59 Philip 72 Robert 65 Ronald 67 Thomas 58 William 67
There are lots of values of 66 or less. Why? It has to do with the order in which the parts of the query are processed. The WHERE filtering is done first. In fact, it is passed up to the data engine, which sees only the columns in the header of data set ONE. So it is not aware of the ambiguity and works with the SIZE values it finds in ONE, which of course reflect weights, not heights.
One way to fix this is to change the WHERE clause and specify (by means of the keyword CALCULATED) that SIZE refers to the new column derived from heights, as in:
create table three as select name, round(height) as Size from one where calculated size > 66; select * from three; quit;
That tells the processor to use the new SIZE columns and forces the WHERE filtering to be done after that column is derived. It produces, as expected:
Boys Taller Than 66 Inches
Name Size ------------------ Alfred 69 Philip 72 Ronald 67 William 67
While multiple like-named columns are tolerated in SQL's workspace, they cannot be stored in SAS data sets. So if query results are feeding a CREATE TABLE or CREATE VIEW statement, only the first one will be saved, and warnings will be issued for subsequent columns with duplicative names. Of course unambiguous names can be specified to avoid this problem.
This article was originally posted by Howard Schreier on sasCommunity.org.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.