abasbabatunde
Calcite | Level 5

Hi,

 

I am new to SAS but I have been picking it up really fast. I am working with a dataset that I consider large for SAS because it's taking 6 hours to SET it into my work library. Can you help with what I can do to load it faster? It's more than 20 million observations and just 93 variables, but it's taking forever to load. Help please. This is my code:

 

data work.c2019;
  set tmp1.ccaeo19;
run;

SASKiwi
PROC Star

Try dataset compression to improve performance. If that helps, compress the TMP1 version too. You should also drop variables and/or rows you don't need (a sketch of that follows the code below).

data work.c2019 (compress = yes);
  set tmp1.ccaeo19;
run;
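
To build on the advice about dropping variables and rows, here is a minimal sketch using the KEEP= and WHERE= dataset options so SAS only reads the columns and rows you actually need. The names enrolid, svcdate and pay are placeholders assumed for illustration, not columns known to be in ccaeo19:

data work.c2019 (compress = yes);
  /* read only the columns and rows needed for the analysis */
  set tmp1.ccaeo19 (keep  = enrolid svcdate pay
                    where = (year(svcdate) = 2019));
run;

Reading fewer variables means less data pulled from the source library and a smaller WORK table.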

I suspect you are running this on a PC, given the slow performance. At the end of the day you should process large datasets on a server, not a PC, to get the best performance.

abasbabatunde
Calcite | Level 5
Thank you so much. I will try the compress. Can you give me a clue about processing data on a server? How, please?
SASKiwi
PROC Star

Not sure what you mean exactly. Processing data of this size on a remote SAS server should take a few minutes at most - I would guess maybe 2 or 3 minutes.

 

What SAS user interface are you using? SAS Studio, SAS Enterprise Guide? If you don't know, post a screenshot.

ballardw
Super User

Where is this "loading" from?

abasbabatunde
Calcite | Level 5
From a shared drive on my computer. That's where the data is stored, if that answers your question.
Patrick
Opal | Level 21

Instead of....

data work.c2019;
  set tmp1.ccaeo19;
run;

...you could code

proc datasets lib=work nolist nowarn;
  delete c2019;
  run;
quit;
proc append base=work.c2019(compress=yes) data=tmp1.ccaeo19;
run;
quit;

This will read/write the data more efficiently. 

But... I assume the bottleneck is network throughput, in which case copying the SAS file with an OS copy command (not using SAS) would take about as long as it takes now using SAS.

 

I've added the PROC DATASETS/DELETE in the above code so that, if you re-run your code, it first removes the WORK table potentially created in an earlier run.

Patrick
Opal | Level 21

Following my assumption that network throughput is the bottleneck, you could also use Robocopy (latest version) or something similar (if available) that allows for compression of data in transit.

Reeza
Super User

Your code is a data step, which is designed to process data. What you're actually trying to do is copy the data set from an external drive to your working library in SAS (C drive). This is fine, but not strictly necessary: you could either work with it where it is, or copy it using PROC COPY/DATASETS. That copies the data set over in chunks, whereas a data step literally does it line by line.

 

 

proc datasets lib=tmp1 nolist;
copy out=work;
select ccaeo19;
run; quit;

 

 

You could also add the compression option via PROC DATASETS.
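
As a hedged sketch of that, assuming the NOCLONE option behaves as documented: set the COMPRESS= system option and add NOCLONE to the COPY statement so the copied dataset uses the current compression setting instead of inheriting the source's attributes:

options compress = yes;       /* new datasets created from here on are compressed */

proc datasets lib = tmp1 nolist;
  copy out = work noclone;    /* noclone: don't carry over the source's storage attributes */
  select ccaeo19;
run;
quit;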

 

The other thing you may want to do, if you have enough memory (which you may not), is to load the dataset into memory and work with it there as much as possible. Otherwise you may need to split a data set that big to work with it efficiently on a desktop. A server is basically a bigger computer with extra power; mine has 512 GB of RAM and 3 TB of storage, for example, compared to my laptop with 16 GB of RAM and 1 TB of space.

 

The SASFILE statement lets you load the data into memory.
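
A small sketch of how that could look for this table, assuming it actually fits in available RAM (93 variables over 20+ million rows may well exceed a desktop's memory):

sasfile work.c2019 load;         /* hold the whole table in memory */

proc means data = work.c2019;    /* later steps read from memory rather than disk */
run;

sasfile work.c2019 close;        /* release the memory when finished */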

 

I'd also spend a bit of time optimizing your data set, ensuring the variables have the smallest lengths they need and any unneeded variables are dropped.
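
For example, one possible way to slim the table while copying it - flag1 and unused_var below are placeholder names, not variables from the original post:

proc contents data = tmp1.ccaeo19 varnum;    /* review variable types and lengths */
run;

data work.c2019 (compress = yes);
  length flag1 3;                            /* 3 bytes is plenty for a 0/1 indicator */
  set tmp1.ccaeo19 (drop = unused_var);      /* drop columns you never use */
run;

SAS will note a length conflict when you shorten an existing variable; the copy still works, but only shorten numeric variables that hold small integers so you don't lose precision.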

 


@abasbabatunde wrote:

Hi,

 

I am new to SAS but I have been picking it up really fast. I am working with a dataset that I consider large for SAS because it's taking 6 hours to SET it into my work library. Can you help with what I can do to load it faster? It's more than 20 million observations and just 93 variables, but it's taking forever to load. Help please. This is my code:

 

data work.c2019;
  set tmp1.ccaeo19;
run;


 

 

abasbabatunde
Calcite | Level 5
Thanks so much. I tried it as is but it won't let me; it requires some authorization to work, as it is stored on the drive, which I don't have. I will try the PROC DATASETS.
Also, can you explain or direct me to where I can learn how to reduce variable length and size even before loading into the work area?
Kurt_Bremser
Super User

Your problem is the network connection, and there's nothing you can do about it. Changing the dataset on the remote share will take even longer, as your SAS process will have to read and write over the network.

 

How do you start SAS Studio? This will tell us a lot about your SAS setup.

I suspect that you run Studio from a server that has the network share mounted, but you can't use it from your local PC.

Reeza
Super User

I don't know what that means.

What did you try and what didn't work?

 

You're using SAS Studio - do you know if it's installed locally on your computer or running off a server?

This should tell you some of that information:

 

proc product_status;
run;

 


@abasbabatunde wrote:
Thanks so much. I tried it as is but it won't let me; it requires some authorization to work, as it is stored on the drive, which I don't have. I will try the PROC DATASETS.
Also, can you explain or direct me to where I can learn how to reduce variable length and size even before loading into the work area?

 

