Help using Base SAS procedures

A problem with controlling the length with PROC SQL

Reply
Super Contributor
Posts: 297

A problem with controlling the length with PROC SQL

Hello everyone,

I wonder why it gives me problems to control the length of a variable in a dataset constructed with proc sql.

It seems that when the "union all" is used then the length of a numerical variable can not be specified. Here is my example:

data data1;

  length a 3;

  a=1;

run;

data data2;

  length a 3;

  a=2;

run;

proc sql;

  create table one_and_two as

  select a length=3 from data1

  union all

  select a length=3 from data2;

quit;

In the "one_and_two" dataset the length of the variable "a" is now 8! I would have expected it should be 3.

If I do the same as above with a character variable, then there is no problem in controling the length. Also, If there is no union then  the length  can also be controlled. Any explanation for that?

Super User
Posts: 7,758

Re: A problem with controlling the length with PROC SQL

Posted in reply to JacobSimonsen

Interesting. Another point on my (steadliy growing longer) list of "SQL is evil".

Edit: Found a working method.

proc sql;

  create table one_and_two as select a length=3 from

  (select a from data1

  union all

  select a from data2);

quit;

But

data one_and_two;

set data1 data2;

run;

is much simpler and works also.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Trusted Advisor
Posts: 3,211

Re: A problem with controlling the length with PROC SQL

Posted in reply to KurtBremser

interesting yes checked and behavior as described. SQL is evil agree on that.
Even more as most DBA's do not know the difference between numeric and character still living in the Hollerith card age.

Explanation on that.

- Doing measurements you have real numbers that are in generic approach of floating type (8 bytes).

- Classifications ordinal/nominal are of text types (length 1-*)

  Telling female =1 male=2 would say the mean of sex = 1.5. A classic example of nonsense statistics. In this case 1 and 2 are still characters that have limited valid range of values.

  That are constraints.   There are 10 types of people in this world, those who understand binary and those who don'thttp://www.urbandictionary.com/define.php?term=there%20are%2010%20types%20of%20people%20in%20this%20...

For the length question: is not very sensible trying to limit numeric lengths as of effects loss of precision.

It looks as the union is using the default length setting for all SAS numerics, some kind of protection for beginners or not implemented the shortening.

The datastep is faster on this and gives more control

Would you prefer a more strict datatype approach. There is new language "DS2" and another SQL implementation FedSql

---->-- ja karman --<-----
Super User
Posts: 10,018

Re: A problem with controlling the length with PROC SQL

Posted in reply to JacobSimonsen

That is really weird .report it to sas support . see how they say it .

Here is a workaround way . build a table construct before full the data.

data data1;

  length a 4;

  a=1;

run;

data data2;

  length a 4;

  a=2;

run;

proc sql;

create table one_and_two like data1;

insert into  one_and_two

  select a  from data1

  union all corr

  select a  from data2;

quit;

Xia Keshan

Super User
Super User
Posts: 7,942

Re: A problem with controlling the length with PROC SQL

Hi,

Yes, I think what is happening is this:

step 1 - the engine sees a union all

2 - it creates a table in the background examining the variables in table 1

3 - the table is assigned as num - default length

4 - the data is inserted

5 - then the next set of data

(not complete)

KurtBremser's solutions works by creating the table explicitly with the create table statement, and then the temporary table behind the scenes would then be inserted into that.  Same with yours KSharp.  Table creation first then insert the data.  You can fool it with inputs as well:

proc sql _method _tree;

  create table one_and_two as

  select input(put(a,best.),3.) length=3 from data1

  union all

  select a from data2;

quit;

So from an SQL point of view, think about what you want the final table to look like, order, lengths, variables etc.  then add the data into it.

Super Contributor
Posts: 297

Re: A problem with controlling the length with PROC SQL

Posted in reply to JacobSimonsen

As shown above it is not difficult to find a work around, and the question is therefore not how a workaround should be done. It is simple, and the dataset approach is like the easiest way.

Rather, the question is why SQL do something different from what expected. I am happy that you experts also find it strange, so it is not just me who have overlooked something trivial.

Thank you for your answers:smileyblush:

Ask a Question
Discussion stats
  • 5 replies
  • 460 views
  • 0 likes
  • 5 in conversation