BookmarkSubscribeRSS Feed
JacobSimonsen
Barite | Level 11

Hello everyone,

I wonder why it gives me problems to control the length of a variable in a dataset constructed with proc sql.

It seems that when the "union all" is used then the length of a numerical variable can not be specified. Here is my example:

data data1;

  length a 3;

  a=1;

run;

data data2;

  length a 3;

  a=2;

run;

proc sql;

  create table one_and_two as

  select a length=3 from data1

  union all

  select a length=3 from data2;

quit;

In the "one_and_two" dataset the length of the variable "a" is now 8! I would have expected it should be 3.

If I do the same as above with a character variable, then there is no problem in controling the length. Also, If there is no union then  the length  can also be controlled. Any explanation for that?

5 REPLIES 5
Kurt_Bremser
Super User

Interesting. Another point on my (steadliy growing longer) list of "SQL is evil".

Edit: Found a working method.

proc sql;

  create table one_and_two as select a length=3 from

  (select a from data1

  union all

  select a from data2);

quit;

But

data one_and_two;

set data1 data2;

run;

is much simpler and works also.

jakarman
Barite | Level 11

interesting yes checked and behavior as described. SQL is evil agree on that.
Even more as most DBA's do not know the difference between numeric and character still living in the Hollerith card age.

Explanation on that.

- Doing measurements you have real numbers that are in generic approach of floating type (8 bytes).

- Classifications ordinal/nominal are of text types (length 1-*)

  Telling female =1 male=2 would say the mean of sex = 1.5. A classic example of nonsense statistics. In this case 1 and 2 are still characters that have limited valid range of values.

  That are constraints.   There are 10 types of people in this world, those who understand binary and those who don'thttp://www.urbandictionary.com/define.php?term=there%20are%2010%20types%20of%20people%20in%20this%20...

For the length question: is not very sensible trying to limit numeric lengths as of effects loss of precision.

It looks as the union is using the default length setting for all SAS numerics, some kind of protection for beginners or not implemented the shortening.

The datastep is faster on this and gives more control

Would you prefer a more strict datatype approach. There is new language "DS2" and another SQL implementation FedSql

---->-- ja karman --<-----
Ksharp
Super User

That is really weird .report it to sas support . see how they say it .

Here is a workaround way . build a table construct before full the data.

data data1;

  length a 4;

  a=1;

run;

data data2;

  length a 4;

  a=2;

run;

proc sql;

create table one_and_two like data1;

insert into  one_and_two

  select a  from data1

  union all corr

  select a  from data2;

quit;

Xia Keshan

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Yes, I think what is happening is this:

step 1 - the engine sees a union all

2 - it creates a table in the background examining the variables in table 1

3 - the table is assigned as num - default length

4 - the data is inserted

5 - then the next set of data

(not complete)

KurtBremser's solutions works by creating the table explicitly with the create table statement, and then the temporary table behind the scenes would then be inserted into that.  Same with yours KSharp.  Table creation first then insert the data.  You can fool it with inputs as well:

proc sql _method _tree;

  create table one_and_two as

  select input(put(a,best.),3.) length=3 from data1

  union all

  select a from data2;

quit;

So from an SQL point of view, think about what you want the final table to look like, order, lengths, variables etc.  then add the data into it.

JacobSimonsen
Barite | Level 11

As shown above it is not difficult to find a work around, and the question is therefore not how a workaround should be done. It is simple, and the dataset approach is like the easiest way.

Rather, the question is why SQL do something different from what expected. I am happy that you experts also find it strange, so it is not just me who have overlooked something trivial.

Thank you for your answers:smileyblush:

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2032 views
  • 0 likes
  • 5 in conversation