Hi all,
I am reading from a MS SQL Server database using a SAS pass-through query:
PROC SQL;
   CONNECT TO ODBC(DSN=SQLDSN USER='' PASS='' DBMAX_TEXT=1024);
   CREATE TABLE SASOUT AS
   SELECT *
   FROM CONNECTION TO ODBC
      (SELECT COLUMN1, COLUMN2 FROM DB.DBO.TBL);
   DISCONNECT FROM ODBC;
QUIT;
COLUMN1 and COLUMN2 in SQL Server are of type NVARCHAR,
but in the SAS dataset the length of these columns shows as 32767.
I was hoping that the DBMAX_TEXT= option would restrict the length of the columns, but that doesn't seem to work here.
If anyone has any suggestions on how I can restrict the length, please let me know.
SAS Version - SAS 9.4 M7 (UTF Encoding)
SAS/ACCESS Interface to ODBC
ODBC - Microsoft ODBC Driver 17 for SQL Server
Accepted Solutions
I think that option only applies to CLOBs or other objects converted to text as part of the transfer.
Either cast the value before transferring or change the length after transfer. If you use the latter option, watch out: SAS has a nasty habit of attaching $nnn. formats to character variables pulled from remote databases. So you might end up with a character variable that is only 20 bytes long but still has the $32767. format attached to it. Plus there is no syntax in PROC SQL for removing unwanted formats from variables.
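A minimal sketch of both options, assuming COLUMN1 really fits in 100 bytes (the column names and lengths here are placeholders):
PROC SQL;
   CONNECT TO ODBC(DSN=SQLDSN USER='' PASS='');
   /* Option 1: cast on the database side so SAS sees a bounded VARCHAR */
   CREATE TABLE SASOUT AS
   SELECT * FROM CONNECTION TO ODBC
      (SELECT CAST(COLUMN1 AS VARCHAR(100)) AS COLUMN1 FROM DB.DBO.TBL);
   DISCONNECT FROM ODBC;
QUIT;
/* Option 2 follow-up: if you shorten the lengths on the SAS side instead,
   strip the leftover $32767. format afterwards - a FORMAT statement with
   no format named removes the existing one */
PROC DATASETS LIB=WORK NOLIST;
   MODIFY SASOUT;
   FORMAT COLUMN1;
QUIT;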
This is one way:
PROC SQL;
CONNECT TO ODBC(DSN=SQLDSN USER='' PASS='' DBMAX_TEXT=1024);
CREATE TABLE SASOUT AS
SELECT COLUMN1 length = 1024
,COLUMN2 length = 1024
FROM CONNECTION TO ODBC
(SELECT COLUMN1, COLUMN2 FROM DB.DBO.TBL);
DISCONNECT FROM ODBC;
QUIT;
Also, since the major downside is the increased disk space, you could compress the output SAS dataset and not worry about column lengths:
CREATE TABLE SASOUT (compress = yes) .....
Casting or manually defining the length of each of the columns does work; it just needs extra effort at our end, as there are a lot of columns and multiple programs.
Also, we have OPTIONS COMPRESS=YES enabled for the entire session.
But these new datasets, even though compressed, take up a lot of time during sorts.
Until now we were using a DataDirect driver, which somehow supported the DBMAX_TEXT= option.
Anyway, I think casting seems like the best option I have for now.
To do the work only once, have the owner of the database create views for you to use that convert the types on the database side to something more reasonable. Since it is their data, they should know better what the actual maximum lengths for those fields are.
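If you are allowed to create objects yourself, the view can even be created through the same pass-through connection. A rough sketch, assuming you have CREATE VIEW permission (TBL_TRIMMED and the 100-byte lengths are placeholders):
PROC SQL;
   CONNECT TO ODBC(DSN=SQLDSN USER='' PASS='');
   /* hypothetical view that bounds the column widths on the server */
   EXECUTE (CREATE VIEW DBO.TBL_TRIMMED AS
            SELECT CAST(COLUMN1 AS VARCHAR(100)) AS COLUMN1,
                   CAST(COLUMN2 AS VARCHAR(100)) AS COLUMN2
            FROM DBO.TBL) BY ODBC;
   DISCONNECT FROM ODBC;
QUIT;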
I've struggled with this same (very) frustrating issue. There are a couple of ways of limiting the lengths of character fields using SQL. I'm using Hive tables, so the syntax below is Hive syntax, but presumably SQL Server has similar functions available.
1. Substring and cast as CHAR (or VARCHAR) with a specified length. SAS handles CHAR/VARCHAR with a specified length quite well. For example, if the maximum length of a type String column in Hive named Gmid is 40, then "SELECT Gmid" becomes "CAST(SUBSTR(Gmid, 1, 40) AS CHAR(40)) AS Gmid". If you don't know the maximum length, then a default, such as 255 or whatever might work with your data, can be applied. (See the SQL Server sketch after this list.)
2. As @SASKiwi suggests, use LENGTH statements on the SAS portion of the explicit pass-through query. Since @SASKiwi has already gone over this, I refer you to his response in this thread. I will also reply to @SASKiwi's response because I've experienced a bit of a complication there.
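For SQL Server, the same idea as #1 would look roughly like this (a sketch only: T-SQL uses SUBSTRING rather than SUBSTR, and the 40-byte length is a placeholder for your actual maximum):
PROC SQL;
   CONNECT TO ODBC(DSN=SQLDSN USER='' PASS='');
   CREATE TABLE SASOUT AS
   SELECT * FROM CONNECTION TO ODBC
      (SELECT CAST(SUBSTRING(COLUMN1, 1, 40) AS VARCHAR(40)) AS COLUMN1
       FROM DB.DBO.TBL);
   DISCONNECT FROM ODBC;
QUIT;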
If you're interested, I have macros that will take an Excel spreadsheet or a SAS data set and build either the Substring and Cast as described in #1, above, or code the LENGTH statements as described in #2, above. The macros save a lot of tedious coding.
Jim
@NN - I too have noticed the idiosyncrasies of different SQL Server database drivers. I guess it is something we just have to live with. BTW, we have COMPRESS=BINARY as our session default and that works pretty well. We don't bother trimming column lengths as this appears to be enough to give us good performance.
However, it is possible, with significant effort, to trim column lengths automatically. We have a data dictionary of all the tables and columns we wish to extract, and SAS macros generate the required SQL automatically, including setting column lengths on the SAS side if required. That avoids any CASTing on the database side.
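A minimal sketch of the idea, assuming a hypothetical dictionary dataset WORK.DATADICT with TABLE_NAME, COLUMN_NAME, and MAX_LENGTH columns:
PROC SQL NOPRINT;
   /* build "COLUMN1, COLUMN2" for the database-side select list */
   SELECT COLUMN_NAME
      INTO :COLLIST SEPARATED BY ', '
      FROM WORK.DATADICT
      WHERE UPCASE(TABLE_NAME) = 'TBL';
   /* build "COLUMN1 LENGTH=40, COLUMN2 LENGTH=25" for the SAS side */
   SELECT CATX(' ', COLUMN_NAME, CATS('LENGTH=', MAX_LENGTH))
      INTO :LENLIST SEPARATED BY ', '
      FROM WORK.DATADICT
      WHERE UPCASE(TABLE_NAME) = 'TBL';
QUIT;

PROC SQL;
   CONNECT TO ODBC(DSN=SQLDSN USER='' PASS='');
   CREATE TABLE SASOUT AS
   SELECT &LENLIST.
   FROM CONNECTION TO ODBC
      (SELECT &COLLIST. FROM DB.DBO.TBL);
   DISCONNECT FROM ODBC;
QUIT;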
@SASKiwi wrote:
...since the major downside is the increased disk space, you could compress the output SAS dataset and not worry about column lengths:
CREATE TABLE SASOUT (compress = yes) ...
@SASKiwi, maybe I'm doing something wrong, but I haven't found using COMPRESS=CHAR (or YES) to be as efficient as I would like. Take a look at the chart below. When I've used a SUBSTR and CAST to optimize the column lengths, I save about 15 GB per run. My files are fairly large, typically about 500 million records per run, sometimes more. I'm assuming that the encoding of the compression takes some room inside the file and that physically optimizing the columns is actually more efficient.
Jim
@jimbarbour - Try COMPRESS=BINARY. With our SAS tables that improved on COMPRESS=YES by around 10 percent, although results vary depending on the mix of character and numeric variables. Fortunately our data volumes are much smaller than yours, so trimming or casting everything didn't seem worth the effort; we were happy just to wear the overhead.
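For anyone trying this, the setting can go session-wide or on an individual table; a quick sketch (WORK.SASOUT is just the example dataset from earlier in the thread):
/* session-wide default */
OPTIONS COMPRESS=BINARY;

/* or per dataset */
DATA WORK.SASOUT_B (COMPRESS=BINARY);
   SET WORK.SASOUT;
RUN;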
@SASKiwi wrote:
@jimbarbour - Try COMPRESS = BINARY.
I'll give that a try. I've read that if the majority of the columns are character, then COMPRESS=CHAR should be used, and if the majority of the columns are numeric, then COMPRESS=BINARY should be used. However, in actual practice, I haven't found that to be such a hard and fast rule.
This is easy enough to test -- if I can find the disk space for two whole sets of this data!
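One quick way to compare, assuming the source data is in WORK.SASOUT (the dataset names are placeholders):
/* write the same data once with each setting; the log reports the
   compression ratio for each */
DATA WORK.TEST_CHAR (COMPRESS=CHAR);
   SET WORK.SASOUT;
RUN;

DATA WORK.TEST_BINARY (COMPRESS=BINARY);
   SET WORK.SASOUT;
RUN;

/* PROC CONTENTS shows the page size and number of pages for each */
PROC CONTENTS DATA=WORK.TEST_CHAR;
RUN;
PROC CONTENTS DATA=WORK.TEST_BINARY;
RUN;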
Jim