BookmarkSubscribeRSS Feed
0 Likes

I don't know why the INPUT function behaves differently in the DATA step compared to PROC SQL, but it does. I know I can work around it, but I don't think we should need to create a fudge.

 

I think PROC SQL should be changed to handle the INPUT function in the same way as the DATA step.

 

The following code hopefully shows what I mean. The expression in the DATA step performs perfectly, but PROC SQL does not like the same expression and throws and ERROR.

 

**************************************************************;
*Create a Numeric Informat.                                   ;
**************************************************************;
Proc Format Library=WORK;
 InValue PartNumber 1      =8526961 
                    2,5,18 =6926656 
                    3      =3181908 
                    Other  =-9999999
                    ;
Run;

**************************************************************;
*Create the list of items ordered.                            ;
**************************************************************;
Data ORDERS;
 Input  OrderID
        ItemOrdered 
        ;

DATALINES;
555 3
847 1
;
Run;

**************************************************************;
*Use a DATA step to test out our Informat.                    ;
**************************************************************;
Data ORDERS_DATASTEP;
 Set ORDERS;

 Part_To_Pick=Input(ItemOrdered, PartNumber.);
Run;

**************************************************************;
*Use PROC SQL to test out our Informat.                       ;
**************************************************************;
Proc SQL;
 Create Table ORDERS_PROCSQL As
 Select OrderID,
        ItemOrdered,
        Input(ItemOrdered, PartNumber.) As Part_To_Pick
 From   ORDERS
;
Quit;

When the PROC SQL step runs, we get the message:

ERROR: INPUT function requires a character argument.

To get the expression with the INPUT function to work in PROC SQL it could be coerced into a character value with a nested PUT function, but I think PROC SQL should be amended so that we do not have to do this extra step.

Input(Put(ItemOrdered, 12.), PartNumber.) As Part_To_Pick

Cheers,

Downunder Dave
New Zealand

 

 

 

 

11 Comments
Kurt_Bremser
Super User

This is perfectly OK behavior.

Look at the log of the data step:

 101        Data ORDERS_DATASTEP;
 102         Set ORDERS;
 103        
 104         Part_To_Pick=Input(ItemOrdered, PartNumber.);
 105        Run;
 
 NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
       104:21   

The data step does an automatic type conversion for you, SQL does not. This is not "different behavior of the INPUT function", this is "different behavior of PROC SQL vs. DATA step".

The data step just covers up your sloppy programming (see Maxims 25 & 31). The numeric value should be explicitly converted to a string with an explicit format before using it as argument of the INPUT function.

 

In the given example, I take it that you want to set explicit part numbers, something that is best stored as character, so you can do that with a format and the PUT function.

DaveShea
Lapis Lazuli | Level 10

Hi Kurt,

Well, I guess we will have to agree to disagree.

I simply want consistency between the DATA step and PROC SQL, either they both spit an ERROR or they both issue a NOTE.

In my contrived example for this post, my client stores their Part Numbers as numeric values, and that is their business how they store their data.

Thank you for your kind words though.

.

Downunder Dave

 

 

mkeintz
PROC Star

First, it's just too late. Too many production programs are out there that are (knowingly or unknowingly) dependent on this behavior - especially the programs using the data step.

 

Second, I don't agree with the use of "consistency" in this proposal.  This interpretation of consistency would rely on not attending to explicit notes in the log about treatment of data types - a fundamental concept in SAS.

 

But ... you have an out.  Take a look at the VARINITCHK option .  I looks like you can set it to ERROR, which I believe will stop the data step from doing the automatic conversion using INPUT.   Not true, misinterpreted my test program.

Tom
Super User
Super User

This has nothing to do with the INPUT() function.  The issue is that SQL syntax does not support the automatic conversion (casting) of numbers to character strings (and reverse) that SAS syntax supports.  PROC SQL is an attempt to implement ANSI SQL syntax.  So when they built PROC SQL to implement SQL standard they decided support the SQL standard rather than to deviate from the standard to support this SAS language specific feature.

 

To convert numbers to numbers using formats and/or informats you will need a two step process. Formats convert values to text and informats convert text for values.  So text is always involved.  The data step you posted is doing one of steps implicitly.  The SQL syntax is requiring that you do both steps explicitly. 

DaveShea
Lapis Lazuli | Level 10

Hi All,

 

Thank you for your input.

 

I don't believe we are too late to change anything in SAS. I thought that was the purpose of the SAS Ballot.

 

My intention is, and remains, to get PROC SQL to behave the same way as the DATA step when faced with an expression in the SELECT statement containing an INPUT function such as:

 

Input(ItemOrdered, PartNumber.) As Part_To_Pick

 

I am not proposing a change that would break other people's PROC SQL code, simply make it more flexible. When I am trying to explain to new or non-SAS users why it is necessary to use a nested PUT function in this context, eyes glaze over very quickly. I would be happy that PROC SQL issues the same SASLOG note that the DATA step issues, and I'd even accept an option on the PROC statement such as TREAT_INPUT_FUNCTION_LIKE_DATA_STEP=YES (maybe a little wordy 🤔).

 

As to compliance with ANSI SQL, the current PROC SQL documentation (https://tinyurl.com/uhhe54d7) makes it clear that "PROC SQL follows most of the guidelines set by the American National Standards Institute (ANSI) in its implementation of SQL. However, it is not fully compliant with the current ANSI standard for SQL."

 

Other posters have made it clear that PROC SQL is not quite ANSI, some examples being:

  • I cannot use FORMAT=xxxxx in my Oracle SQL because that is a SAS only concept.
  • I cannot over-write an existing table in Oracle during a CREATE TABLE, as I can in PROC SQL. The SASLOG shows neither NOTE nor WARNING when I do this.
  • I can even use an option on PROC SQL to over-write a table that I am currently reading from without raising a WARNING in the SASLOG (UNDO_POLICY=NONE)

PROC SQL clearly already has deviations from ANSI SQL standard that do not seem to have raised anyone's ire to date.

 

I think I will leave it at that, and see if others decide to give it a "Like" or not.

 

ngā mihi

Downunder Dave

ChrisNZ
Tourmaline | Level 20

You needn't use the put() function, the simpler cat() function suffices most times.

DaveShea
Lapis Lazuli | Level 10

Chris,

I can see that by substituting the nested PUT function for a nested CAT function, I have made the Proc SQL expression slightly simpler,thanks for that idea.

Input(PUT(ItemOrdered, 12.), PartNumber.) As Part_To_Pick
now becomes
Input(CAT(ItemOrdered), PartNumber.) As Part_To_Pick

And oddly, if I used the same expression in the DATA step, it suddenly goes all coy about issuing NOTEs regarding to Numeric to Character conversions, presumably because the CAT function has done that conversion, but silently. I am not really sure that is a great move either.

DaveShea_0-1637982851502.png

 

Ho hum, thank you for your contribution.

 

ngā mihi

Downunder Dave

ChrisNZ
Tourmaline | Level 20

And oddly, if I used the same expression in the DATA step

Nothing odd. the cat functions apply the best12. format when encountering numeric values.

 

>  I am not really sure that is a great move either.

I'd argue it is good: It (hopefully) shows the user has thought out the process and has taken care of the conversion, instead of the note showing that the user is sloppy and the result may not be as intended.

 

Tom
Super User
Super User

I doubt SAS will implement your idea, mainly because it would probably be very difficult.  I suspect that PROC SQL does not know as much about the variables it will be operating on in advance as the DATA step does so it might not know that you have the wrong type until it is actually running the code.  Plus they have to worry about how to pass the code thru to the external database if you are using SAS/Access engines for any of the libraries being used, which would make the implication that much more complicated.

 

Might be nice to have the reverse however. A user settable option to cause implicit numeric->character and/or character->numeric conversions in a data step to generate an error or warning instead of a note.

 

PS The CAT..() series of functions do an implicit conversion without any notification by design, check the documentation.  And unlike the implicit conversion done by the DATA step they do not use the BEST12 format but something more like the BEST32. format.

ChrisNZ
Tourmaline | Level 20

@Tom You're right, it's not best12. Or at least not always.

It seems SAS missed an opportunity to do this right once and for all. Why oh why?

data T; 
 N=123456789012345; 
 C=cat(N);            F='CAT'; output;
 C=put(N,best32. -l); F='B32'; output;
 C=put(N,best12. -l); F='B12'; output;
 
 N=1234567890.12345;
 C=cat(N);            F='CAT'; output;
 C=put(N,best32. -l); F='B32'; output;
 C=put(N,best12. -l); F='B12'; output;
 
 format N best20. C $20.;
run;
N C F
123456789012345 123456789012345 CAT
123456789012345 123456789012345 B32
123456789012345 1.2345679E14 B12
1234567890.12345 1234567890.1 CAT
1234567890.12345 1234567890.12345 B32
1234567890.12345 1234567890.1 B12