Hi All, Thanks for the answers so far. Much appreciated. Perhaps a bit more detail will help put this in context. I'm building a large dataset combining several dimension tables, deriving date crossings, and adding LOTS of measures. The SQL join on the full datasets doesn't perform well, as it's doing a sort/merge join. I'm writing code to convert this to a multi-table hash object join. I think I've got a cool approach using surrogate keys. The problem is, the hash object join on the full tables runs out of memory. So, the approach we're looking to try is to segment the datasets (i.e. by year/qtr/month, depending on memory consumption), do the hash object join to a temp dataset, then build the final dataset using PROC APPEND of temp to target. We're hoping that, even with the multiple passes on the datasests, that the hash object join(s) will perform better than the SQL join. We'll use SPDE datasets indexed on the time dimensions, which will make the hash object loads run faster. To filter the source tables, I have two possibilities: 1) store the time dimensions as formatted physical values, eg. monthStart format=year., or 2) as the formatted result, eg. put(monthStart,year.) = "2011" (or "2011Q1", or "2011NOV"). In the case of #1, the code to filter the dataset (say by QTR) would be " where put(monthStart,yyq.)='2011Q1' ", whereas in the case of #2, the code to filter the dataset would be " where qtr='2011Q1' ". I believe #2 is going to perform much better, and I'll need to store the formatted string in the working dataset. (Note: this approach is different than my original post). OK, now back to the orginal question :-)... If I store the data as in #1 and do: select distinct monthStart format=year. into : periods separated by " " from dates order by monthStart; select distinct monthStart format=yyq. into : periods separated by " " from dates order by monthStart; select distinct monthStart format=yymon. into : periods separated by " " from dates order by monthStart; I don't get distinct values for year or qtr, since monthStart is actually different - only the format groups them into "buckets". If I do: select distinct put(monthStart,year.) into : periods separated by " " from dates order by monthStart; select distinct put(monthStart,qtr.) into : periods separated by " " from dates order by monthStart; select distinct put(monthStart,yymon.) into : periods separated by " " from dates order by monthStart; I again don't get distinct values for year or qtr due to the order by statement. If I do: select distinct put(monthStart,year.) into : periods separated by " " from dates; select distinct put(monthStart,qtr.) into : periods separated by " " from dates; select distinct put(monthStart,yymon.) into : periods separated by " " from dates order by monthStart; I get the output I want, but only because "distinct" causes the data to be returned in sorted order, and only because I'm "lucky" that year (character) and qtr (character) sort in chronological order. If I store the data as in #2, and do: select distinct year into : periods separated by " " from dates; select distinct qtr into : periods separated by " " from dates; select distinct month into : periods separated by " " from dates order by monthStart; I again get the output I want for the same reasons as above. And in most of the examples above, I get the warning message about sorting on a variable not in the select statement. What I wish PROC SQL did is not have the order by statement affect the distinct values returned. For example: select distinct monthStart format=year. into : periods separated by " " from dates order by monthStart; I wish SQL would use the order by statement to order the results by their physical value, then use the formatted value to group into "buckets", then apply the distinct option to dedup the formatted values. If there is an system option or SQL option to do this I don't know what it is. Finally, I go back to the TS Usage Note (the only one I could find on the subject). This was identified as a high priority issue in 1993. From the UNOTE, it appears that it was fixed for VM/ESA, OS/390, and Open VMS VAX. Questions: 1) If I try the above on those operating systems, do I get the same SQL warning? 2) If not, then why hasn't the same fix been ported to other operating systems? 3) Should it be a warning at all? Why should it be a warning if I want to sort the results of a select (distict) by a variable not in the select clause? If I add the order by variable to the select and store the results in a dummy macro variable, I get the same results as before (for &period), but the warning goes away. Seems like a red herring error message to me. 4) Why would SAS create a NOWARN option but then not document it? Either there's situations where that option is useful (and perhaps indicative of poor initial design), or not. If so, then document and support it! 5) If SAS did agree with my wish above, does existing code prevent them from fixing this issue in a future release? Regards, Scott
... View more