NN (Quartz | Level 8)

Hi all,
I am reading from MS SQL Server using a SAS pass-through query:

PROC SQL;
CONNECT TO ODBC(DSN=SQLDSN USER='' PASS='' DBMAX_TEXT=1024);
CREATE TABLE SASOUT AS SELECT * FROM CONNECTION TO ODBC
(SELECT COLUMN1, COLUMN2 FROM DB.DBO.TBL);
DISCONNECT FROM ODBC;
QUIT;

COLUMN1 and COLUMN2 are of type NVARCHAR in SQL Server, but in the SAS dataset the length of these columns shows as 32767. I was hoping the DBMAX_TEXT option would restrict the length of the columns, but that doesn't seem to work here.
If anyone has any suggestions on how I can restrict the lengths, please do let me know.

SAS version: SAS 9.4 M7 (UTF encoding)
SAS/ACCESS Interface to ODBC
ODBC: Microsoft ODBC Driver 17 for SQL Server

ACCEPTED SOLUTION
Tom (Super User)

I think that option only applies to CLOBs or other large objects converted to text as part of the transfer.

Either cast the value before transferring or change the length after transfer. If you use the latter option, watch out: SAS has a nasty habit of attaching $nnn. formats to character variables pulled from remote databases. So you might end up with a character variable that is only 20 bytes long but still has the $32767. format attached to it. Plus there is no syntax in PROC SQL for removing unwanted formats from variables.
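As an illustration of both routes, here is a sketch only: the DSN, table, and column names come from the original question, and the 100-byte lengths are assumptions, not known values. The second route uses PROC DATASETS to strip the leftover format, since PROC SQL has no syntax for that:

```sas
/* Route 1: cast on the SQL Server side so SAS sees a bounded VARCHAR */
proc sql;
  connect to odbc(dsn=SQLDSN user='' pass='');
  create table sasout as
  select * from connection to odbc
  ( select cast(COLUMN1 as varchar(100)) as COLUMN1
         , cast(COLUMN2 as varchar(100)) as COLUMN2
    from DB.DBO.TBL );
  disconnect from odbc;
quit;

/* Route 2: shorten after transfer, then remove the attached $32767. format.
   A FORMAT statement with no format name in PROC DATASETS removes formats. */
data sasout2;
  length COLUMN1 COLUMN2 $100;
  set sasout;
run;

proc datasets lib=work nolist;
  modify sasout2;
  format COLUMN1 COLUMN2;  /* strips the leftover formats */
quit;
```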


9 REPLIES

SASKiwi (PROC Star)

This is one way:

PROC SQL;
CONNECT TO ODBC(DSN=SQLDSN USER='' PASS='' DBMAX_TEXT=1024);
CREATE TABLE SASOUT AS 
SELECT COLUMN1 length = 1024
      ,COLUMN2 length = 1024  
FROM CONNECTION TO ODBC
(SELECT COLUMN1, COLUMN2 FROM DB.DBO.TBL);
DISCONNECT FROM ODBC;
QUIT;

Also, since the major downside is the increased disk space, you could compress the output SAS dataset and not worry about column lengths:

CREATE TABLE SASOUT (compress = yes) .....

NN (Quartz | Level 8)

Thanks @Tom and @SASKiwi.

Casting or manually defining the length of each column does work; it just needs extra effort on our end, as there are a lot of columns and multiple programs.

Also, we have OPTIONS COMPRESS=YES enabled for the entire session itself, but these new datasets, even though compressed, take up a lot of time during sorts.

Until now we were using a DataDirect driver, which somehow supported the DBMAX_TEXT option.

Anyway, I think casting seems like the best option I have for now.

Tom (Super User)

To do the work only once, have the owner of the database create views for you that convert the types on the database side to something more reasonable. Since it is their data, they should know better what the actual maximum lengths of those fields are.
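If the DBA agrees, such a view could even be created from SAS via a pass-through EXECUTE block. A sketch, assuming you have CREATE VIEW rights; the view name, column names, and lengths here are purely illustrative:

```sas
proc sql;
  connect to odbc(dsn=SQLDSN user='' pass='');
  /* Create a SAS-friendly view with bounded column lengths */
  execute (
    create view dbo.vw_TBL_forSAS as
    select cast(COLUMN1 as nvarchar(100)) as COLUMN1,
           cast(COLUMN2 as nvarchar(100)) as COLUMN2
    from dbo.TBL
  ) by odbc;
  disconnect from odbc;
quit;
```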

jimbarbour (Meteorite | Level 14)

I've struggled with this same (very) frustrating issue. There are a couple of ways of limiting the lengths of character fields using SQL.  I'm using Hive tables, so the syntax below is Hive syntax, but presumably SQL Server would have similar functions available.

  1. Substring and cast as Char (or VarChar) with a specified length.  SAS handles Char/VarChar with a specified length quite well.  For example, if the maximum length of a type String column in Hive named Gmid is 40, then "SELECT Gmid" becomes "CAST(SUBSTR(Gmid, 1, 40) AS CHAR(40)) AS Gmid".  If you don't know the maximum length, then a default, such as 255 or whatever might work with your data, can be applied.
  2. As @SASKiwi suggests, use length specifications on the SAS portion of the explicit pass-through query. Since @SASKiwi has already gone over this, I refer you to his response in this thread. I will also reply to @SASKiwi's response because I've experienced a bit of a complication there.

If you're interested, I have macros that will take an Excel spreadsheet or a SAS data set and build either the Substring and Cast as described in #1, above, or code the LENGTH statements as described in #2, above.  The macros save a lot of tedious coding.
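For SQL Server, the substring-and-cast from #1 might look like this inside an explicit pass-through query. This is a sketch: the DSN, table, column name GMID, and the length 40 are illustrative, not taken from real metadata:

```sas
proc sql;
  connect to odbc(dsn=SQLDSN user='' pass='');
  create table sasout as
  select * from connection to odbc
  /* T-SQL: truncate to 40 chars and give the column a bounded type */
  ( select cast(substring(GMID, 1, 40) as varchar(40)) as GMID
    from DB.DBO.TBL );
  disconnect from odbc;
quit;
```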

 

Jim

SASKiwi (PROC Star)

@NN - I too have noticed the idiosyncrasies of different SQL Server database drivers. I guess it is something we just have to live with. BTW, we have COMPRESS=BINARY as our session default and that works pretty well. We don't bother trimming column lengths, as this appears to be enough to give us good performance.

 

However, it is possible, with significant effort, to trim column lengths automatically. We have a data dictionary of all the tables and columns we wish to extract, and SAS macros then generate the required SQL automatically, including setting column lengths on the SAS side if required. That avoids any CASTing on the database side.
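One way to sketch that generation step, assuming a hypothetical dictionary dataset MYDICT with columns NAME and MAXLEN (all names here are illustrative):

```sas
/* Build "COLUMN1 length = 50, COLUMN2 length = 80, ..." into a macro variable */
proc sql noprint;
  select catx(' ', name, 'length =', put(maxlen, best.))
    into :lenlist separated by ', '
    from mydict;
quit;

/* Use the generated list on the SAS side of the pass-through query */
proc sql;
  connect to odbc(dsn=SQLDSN user='' pass='');
  create table sasout as
  select &lenlist
  from connection to odbc
  ( select * from DB.DBO.TBL );
  disconnect from odbc;
quit;
```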

jimbarbour (Meteorite | Level 14)

@SASKiwi wrote:

...since the major downside is the increased disk space, you could compress the output SAS dataset and not worry about column lengths:

CREATE TABLE SASOUT (compress = yes) ...


@SASKiwi, maybe I'm doing something wrong, but I haven't found using COMPRESS=CHAR (or YES) to be as efficient as I would like. Take a look at the chart below. When I've used a SUBSTR and CAST to optimize the column lengths, I save about 15 GB per run. My file sizes are fairly large, typically about 500 million records per run, sometimes more. I'm assuming that the encoding of the compression takes some room inside the file and that physically optimizing the columns is actually more efficient.

 

Jim

 

[Chart: jimbarbour_0-1623268898214.png, comparing file sizes with and without column-length optimization]

 

SASKiwi (PROC Star)

@jimbarbour - Try COMPRESS=BINARY. With our SAS tables that improved on COMPRESS=YES by around 10 percent, although results vary depending on the mix of character and numeric variables. Fortunately our data volumes are much smaller than yours, so trimming or casting everything didn't seem worth the effort; we were happy just to wear the overhead.

jimbarbour (Meteorite | Level 14)

@SASKiwi wrote:

@jimbarbour  - Try COMPRESS = BINARY. 


I'll give that a try. I've read that if the majority of the columns are character, COMPRESS=CHAR should be used, and if the majority are numeric, COMPRESS=BINARY should be used. However, in actual practice I didn't find that such a hard and fast rule existed.

 

This is easy enough to test -- if I can find the disk space for two whole sets of this data!
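A quick way to run that comparison, sketched with hypothetical WORK dataset names (the FILESIZE column of DICTIONARY.TABLES assumes SAS 9.4, which this thread is using):

```sas
/* Write the same extract twice with different compression settings */
data work.t_char(compress=char) work.t_binary(compress=binary);
  set sasout;   /* sasout = the previously extracted table */
run;

/* Compare the on-disk sizes of the two versions */
proc sql;
  select memname, filesize format=comma20.
  from dictionary.tables
  where libname = 'WORK' and memname in ('T_CHAR', 'T_BINARY');
quit;
```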

 

Jim


Discussion stats
  • 9 replies
  • 2650 views
  • 8 likes
  • 4 in conversation