DATA Step, Macro, Functions and more

SQL creates bogus warning message

Reply
Super Contributor
Posts: 387

SQL creates bogus warning message

Hi,

I need to get the distinct values of formatted date periods for further processing in a macro.

Here is some test code:

data dates;
  start="01JAN2011"d;
  do i=0 to 23;
    monthStart=intnx("month",start,i,"B");
    monthEnd=intnx("month",start,i,"E");
    monyy=monthStart;
    yyqtr=monthStart;
    year=monthStart;
    output;
  end;
  keep monthStart monthEnd monyy yyqtr year;
  format monthStart monthEnd date9. monyy monyy. yyqtr yyq. year year.;
run;

proc sql noprint;
  select distinct put(year,year.) into Smiley Tongueeriods separated by " " from dates;
  %put period=&periods;

  select distinct put(yyqtr,yyq.) into Smiley Tongueeriods separated by " " from dates;
  %put period=&periods;

  * bogus error message ;
  select distinct put(monyy,monyy.) into Smiley Tongueeriods separated by " " from dates order by monthStart;
  %put period=&periods;

  * this works but is klunky ;
  select distinct monthStart, put(monyy,monyy.) into :dummy, Smiley Tongueeriods separated by " " from dates order by monthStart;
  %put period=&periods;
quit;

For the final query, I want the monyy. dates to be processed in chronological rather than alphabetical order.

Is the test code above the best approach to avoid this bogus error message in SQL:

WARNING: The query as specified involves ordering by an item that doesn't appear in its SELECT clause. Since you are ordering the output of a SELECT DISTINCT it may appear that some duplicates have not been eliminated.

Is there any reason I should care that I'm ordering by an item that doesn't appear in its SELECT clause?  I just want SAS to "do the right thing" and not issue warning statements unless it's a real issue.

I found this Usage Note from 1993:  http://support.sas.com/techsup/unotes/V6/6/6925.html.

Little birdies:  why would this issue, marked as High Priority, be fixed for some operating systems, but still be an issue 8 years later on Windows?

Thanks,

Scott

Super User
Posts: 10,023

Re: SQL creates bogus warning message

Posted in reply to ScottBass

There is an option nowarn to suppress the annoying Warning Message.

proc sql noprint nowarn;

Ksharp

And why not directly use monyy as order by variable.

  select distinct monyy format=monyy. into Smiley Tongueeriods separated by " " from  dates order by monyy;

Super Contributor
Posts: 387

SQL creates bogus warning message

Hi KSharp,

Thanks.  Anyone know where this is documented?  I searched both the 9.2 and 9.3 SQL Procedure doc, and don't see the NOWARN option anywhere.

I want to order by monthStart because I want the periods ordered by chronological order, i.e JAN2011 FEB2011 MAR2011, not APR11 APR12 AUG11 AUG12 etc.

Thanks,

Scott

PROC Star
Posts: 7,468

SQL creates bogus warning message

Posted in reply to ScottBass

Scott,  You can't find it because it is one of those undocumented features!  In short, don't use it in production code as it could disappear with a future release.

Valued Guide
Posts: 2,177

SQL creates bogus warning message

Posted in reply to ScottBass

Scott

proc sql can output a numeric in a date format sorted in the internal order, (just what you wanted) with syntax only a littl;e adapted from your code.

Try instead of this:

select distinct put(monyy,monyy.) into Smiley Tongueeriods separated by " " from dates order by monthStart;

%put period=&periods;

this slightly changed syntax:

select distinct  monthStart format=monyy. into Smiley Tongueeriods separated by " " from dates order by monthStart;

%put period=&periods;

Does that provide what you wanted?

Super Contributor
Posts: 387

Re: SQL creates bogus warning message

Posted in reply to ScottBass

Hi All,

Thanks for the answers so far.  Much appreciated.

Perhaps a bit more detail will help put this in context.

I'm building a large dataset combining several dimension tables, deriving date crossings, and adding LOTS of measures.

The SQL join on the full datasets doesn't perform well, as it's doing a sort/merge join.  I'm writing code to convert this to a multi-table hash object join.  I think I've got a cool approach using surrogate keys.

The problem is, the hash object join on the full tables runs out of memory.  So, the approach we're looking to try is to segment the datasets (i.e. by year/qtr/month, depending on memory consumption), do the hash object join to a temp dataset, then build the final dataset using PROC APPEND of temp to target.  We're hoping that, even with the multiple passes on the datasests, that the hash object join(s) will perform better than the SQL join.  We'll use SPDE datasets indexed on the time dimensions, which will make the hash object loads run faster.

To filter the source tables, I have two possibilities:  1) store the time dimensions as formatted physical values, eg. monthStart format=year., or 2) as the formatted result, eg. put(monthStart,year.) = "2011" (or "2011Q1", or "2011NOV").

In the case of #1, the code to filter the dataset (say by QTR) would be " where put(monthStart,yyq.)='2011Q1' ", whereas in the case of #2, the code to filter the dataset would be " where qtr='2011Q1' ".

I believe #2 is going to perform much better, and I'll need to store the formatted string in the working dataset. (Note:  this approach is different than my original post).

OK, now back to the orginal question :-)...

If I store the data as in #1 and do:

select distinct monthStart format=year. into : periods separated by " " from dates order by monthStart;

select distinct monthStart format=yyq. into : periods separated by " " from dates order by monthStart;

select distinct monthStart format=yymon. into : periods separated by " " from dates order by monthStart;

I don't get distinct values for year or qtr, since monthStart is actually different - only the format groups them into "buckets".

If I do:

select distinct put(monthStart,year.) into : periods separated by " " from dates order by monthStart;

select distinct put(monthStart,qtr.) into : periods separated by " " from dates order by monthStart;

select distinct put(monthStart,yymon.) into : periods separated by " " from dates order by monthStart;

I again don't get distinct values for year or qtr due to the order by statement.

If I do:

select distinct put(monthStart,year.) into : periods separated by " " from dates;

select distinct put(monthStart,qtr.) into : periods separated by " " from dates;

select distinct put(monthStart,yymon.) into : periods separated by " " from dates order by monthStart;

I get the output I want, but only because "distinct" causes the data to be returned in sorted order, and only because I'm "lucky" that year (character) and qtr (character) sort in chronological order.

If I store the data as in #2, and do:

select distinct year into : periods separated by " " from dates;

select distinct qtr into : periods separated by " " from dates;

select distinct month into : periods separated by " " from dates order by monthStart;

I again get the output I want for the same reasons as above.

And in most of the examples above, I get the warning message about sorting on a variable not in the select statement.

What I wish PROC SQL did is not have the order by statement affect the distinct values returned. 

For example:  select distinct monthStart format=year. into : periods separated by " " from dates order by monthStart;

I wish SQL would use the order by statement to order the results by their physical value, then use the formatted value to group into "buckets", then apply the distinct option to dedup the formatted values.

If there is an system option or SQL option to do this I don't know what it is.

Finally, I go back to the TS Usage Note (the only one I could find on the subject).  This was identified as a high priority issue in 1993.  From the UNOTE, it appears that it was fixed for VM/ESA, OS/390, and Open VMS VAX.

Questions:

1) If I try the above on those operating systems, do I get the same SQL warning?

2) If not, then why hasn't the same fix been ported to other operating systems?

3) Should it be a warning at all?  Why should it be a warning if I want to sort the results of a select (distict) by a variable not in the select clause?  If I add the order by variable to the select and store the results in a dummy macro variable, I get the same results as before (for &period), but the warning goes away.  Seems like a red herring error message to me.

4) Why would SAS create a NOWARN option but then not document it?  Either there's situations where that option is useful (and perhaps indicative of poor initial design), or not.  If so, then document and support it!

5) If SAS did agree with my wish above, does existing code prevent them from fixing this issue in a future release?

Regards,

Scott

Valued Guide
Posts: 2,177

SQL creates bogus warning message

Posted in reply to ScottBass

Scott

surely it is not a good idea to code like

     where put(monthStart,yyq.)='2011Q1'

when  it demands that a PUT() function be applied, especially as there is an equivalent without that function

     where monthStart between '1-jan-2011'd and "31-Mar-2011"d

These date constants can be generated as easily as 2011Q1.

The alternative filtering  for months and years are as easy.

Super User
Super User
Posts: 7,039

Re: SQL creates bogus warning message

Posted in reply to ScottBass

Scott -

  I always just push those variables used in the ORDER BY clause into a dummy macro variable as in your last example.  For SAS to know that you meant to order the rows by variables not included in the output you need to give it some additional information.  Adding the extra variables into the SELECT and INTO clauses seems like a reasonable way to give it that information.

  The other work around is make a view that does the ordering and select from the view, but I am not sure if that would be any clearer.

Tom

Ask a Question
Discussion stats
  • 7 replies
  • 1727 views
  • 3 likes
  • 5 in conversation