BookmarkSubscribeRSS Feed
RossW
Calcite | Level 5

I have the following code:

PROC SQL;

CONNECT TO ODBC(dsn='SomeDataSource');

CREATE TABLE Data_1 AS

SELECT * FROM CONNECTION TO ODBC(select col1, col2, col3 from SomeTable);

QUIT;

In the source DB col1, col2, col3 are all varchar(255)  It is MS SQL Server if that matters.

My problem is that the size of the created table in SAS is very large relative to the data.  More specifically, it looks like the table is creating all columns as char instead of varchar.  We do not have any records where the length of the contents of col1, col2, or col3 are the full 255 characters.  If I change the SQL portion to convert(varchar(10), col1) the resulting table is much smaller, and no data is lost as these columns are short.  I don't have the ability to modify the source DB, and am forced to convert many of the columns to reduce the size of the table.  I am able to get it down to under 20 MB instead of over 100 MB.

I am sure I am doing something wrong and that there must be a better way for me to do it but I can't for the life of me figure it out.

Any help would be appreciated.

8 REPLIES 8
PGStats
Opal | Level 21

How about telling SAS what you want the string lengths to be :

PROC SQL;

CONNECT TO ODBC(dsn='SomeDataSource');

CREATE TABLE Data_1 AS

SELECT col1 length=10, col2 length=10, col3 length=10

FROM CONNECTION TO ODBC(select col1, col2, col3 from SomeTable);

QUIT;

PG

PG
RossW
Calcite | Level 5

Disclaimer, I am totally new to this and I may be totally out of touch with reality.

I am hoping there is an easier and more maintainable way.  Since I don't have control of source DBs, I don't know when they change and when my defined data types are no longer correct.

In my perfect world, SAS would inspect the data types of the data set and create variable length columns based on that.

PGStats
Opal | Level 21

Internally, SAS supports only two data types: numeric (float, 1 to 8 bytes) and character (fixed length).Upon import, everything is converted to these datatypes.  Boolean, date and integer datatypes are numerics with different formats. Other datatypes can be created when exporting, depending on the format associated with columns.

PG

PG
RossW
Calcite | Level 5

Thanks for the info.  I see that SAS is doing the best it can with it... sadly it is very inefficient.

PGStats
Opal | Level 21

I also thought that at first but it turns out to be efficient enough for most purposes. Look at the COMPRESS= dataset option if you need to save space.

PG

PG
PGStats
Opal | Level 21

In some circumstances it may be more efficient to define your dataset as a view (CREATE VIEW Data_1 AS...) that will take no space at all but will access your database every time you use it.

PG

PG
RossW
Calcite | Level 5

Awesome feedback and suggestions everyone!!!

I really appreciate the advice.  I am going to try both views and compression and see which works best for us.

Thank you again!

SASKiwi
PROC Star

In this situation, as PGStats has already mentioned, suggest you set OPTIONS COMPRESS = BINARY; prior to importing all of your tables via ODBC. Using this we compact most of the database tables we import by 80% or more.

In fact we maintain OPTIONS COMPRESS = BINARY; as our session default so that all downstream datasets are similary compressed and this enhances runtime performance as well as saving space. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 905 views
  • 6 likes
  • 3 in conversation