mitchell_keener
Obsidian | Level 7

Hi all, 

I am using SAS9.4 TS1M5.

 

This is a general question, not necessarily a coding question, that I am struggling to get an answer to. I have a 3 GB dataset with roughly 100 variables and about 1 million rows. This is not a terribly large dataset, I would think, but simply reading it into SAS takes about 5 minutes of real time and only 3.5 seconds of CPU time.

I have been researching ways to improve efficiency and came across discussion of the MEMSIZE allotted to the user. I checked, and my admin has allotted me only 2 GB of MEMSIZE. Most of the SAS discussion of MEMSIZE seems to come from error messages that arise when memory is maxed out. I am not getting those errors, but I was wondering if increasing my MEMSIZE could be helpful here even so. I do not want to bother the administrators if this would not be the problem.

I also tried compressing the dataset, which led to a large reduction in size but still did not seem to affect how long it takes to load. If anyone has any suggestions, I would really appreciate the advice. Thanks.

6 REPLIES
ballardw
Super User

By "reading" do you mean reading an external file to create a SAS data set or using an existing SAS data set?

In a data step or proc code (and if procs, which ones)?

 

Some details may be needed.

 

In some cases you may find that network or disk I/O performance is more of an issue than memory. Also, if you are connecting to a non-SAS data source such as an Oracle or DB2 database, there may be options to improve throughput.

mitchell_keener
Obsidian | Level 7

Thanks for the reply. In this case, I was reading an existing permanent SAS dataset. This was as basic a data step as they come, just reading the permanent dataset into a work dataset. Part of this is that I do not know what would be about par in real time for a dataset this size. Is 5 minutes so excessive that it indicates a problem, or is that just the time it takes? Working through these kinds of issues is a bit new to me, and MEMSIZE seemed to be a popular problem; many complained that 2 GB was too little for them. If there are any other details I could provide that would help get to the bottom of this, please let me know.

 

 

Kurt_Bremser
Super User

A data step needs next to no memory at all, only space for the compiled code and the PDV, which is usually about equal to the observation size of your dataset.
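Before involving the administrators, you can also check your session's current MEMSIZE yourself; a minimal sketch using standard SAS:

proc options option=memsize value;  /* writes the current MEMSIZE setting to the log */
run;

%put Current MEMSIZE: %sysfunc(getoption(memsize));  /* same value via a macro function */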

The difference between your CPU time and real time points to a problem with I/O; if your source dataset resides on a network share or in a remote DBMS, the network will be your bottleneck.

For a more detailed analysis, please provide your data step code and the log from it when you run it with

options fullstimer;
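For example, a minimal test run could look like this (PERMLIB.BIG is a placeholder for your own libref and dataset):

options fullstimer;   /* adds memory, user CPU, and system CPU figures to each step's log notes */

data work.test;
  set permlib.big;    /* placeholder names - substitute your own library and dataset */
run;

If real time far exceeds the combined user and system CPU time, the step is I/O-bound rather than memory-bound.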
andreas_lds
Jade | Level 19
Sounds like a serious problem with the connection between your SAS session and the place where the data is stored. I have seen similar problems when a dataset is opened via Windows Explorer.
Astounding
PROC Star

The size numbers that you posted suggest that the data was read in poorly when the SAS data set was constructed.  You can verify this by running PROC CONTENTS on the data set and inspecting the sizes of the variables.  I would expect that many variables use much more space than they require.  That would be the starting point; once you confirm it's a problem, we can examine ways to reduce variable sizes.  When you examine the output of PROC CONTENTS, I expect you will find (for example) character variables taking up 200 bytes when they only require 1 byte.
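As a sketch (PERMLIB.BIG and the variable name are placeholders, not names from your data):

/* List the variables with their types and lengths, in dataset order */
proc contents data=permlib.big varnum;
run;

/* Hypothetical fix: trim an oversized character variable.
   The LENGTH statement must come before SET to take effect;
   be sure no value is longer than the new length, or it will be truncated. */
data permlib.big_trimmed;
  length status $ 1;      /* e.g. was $200 but only ever holds Y or N */
  set permlib.big;
run;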

ChrisNZ
Tourmaline | Level 20

I agree with the comments so far. I doubt memory is the issue, please check as advised by @Kurt_Bremser .

Furthermore:

1. How long does it take to copy the file from its present location to a new location using a Windows/Unix command?

2. You can see the compression ratio of the existing table by running:

 

proc sql;
  select PCOMPRESS
  from DICTIONARY.TABLES 
  where LIBNAME="YOURLIB" and MEMNAME="YOURTAB";
quit;

3. You can test the read speed by reading without any processing or output.

 

 

data _null_;
  set YOURLIB.YOURTAB;
run;

4. You might be able to increase the I/O speed by increasing the number of read buffers and/or disabling caching (Windows assumed here).

 

data _null_;
  set YOURLIB.YOURTAB(bufno=128 sgio=yes);
run;

 

 
