anshulgoel
Calcite | Level 5

Here is the code:

data temp;

set temp1(keep = a b c d);

run;

Can we reduce the run time by any means?

The dataset (temp1) has 7 million rows and this step takes a long time. Is there anything we can do about it? Please help.

17 REPLIES
DBailey
Lapis Lazuli | Level 10

The only thing you're doing is making a copy of temp1 with a subset of the columns?

anshulgoel
Calcite | Level 5

No, I am copying the dataset from the server. It's a huge dataset, so it takes a lot of time. I am creating the new dataset on my local machine.

LinusH
Tourmaline | Level 20

If this is a common task, you want to do an infrastructure review so you can avoid any bottlenecks. This would include the server (CPU, I/O), the network, and the speed of your PC (again, CPU and disk I/O).

Do you copy the table because the server connection is slow, or do you have a laptop and want to bring your data from the office?

Data never sleeps
Astounding
PROC Star

A couple of things to consider ...

Do you really need to create this data set at all, or could you use TEMP1 for analysis?

The setting for the COMPRESS option can affect many of the resources required.  You could run a PROC OPTIONS to see your default setting for COMPRESS.  While this may reduce the time, it may increase the use of other resources:

data temp (compress=NO);

set temp1 (keep=a b c d);

run;

The results will depend on characteristics of your data, so it is difficult to predict ahead of time.
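
As a minimal sketch of the PROC OPTIONS check mentioned above (the value written to the log depends on how your site is configured):

```sas
/* Write the current value of the COMPRESS system option to the log */
proc options option=compress;
run;
```

If the log shows compression is on by default, the COMPRESS=NO data set option in the step above overrides it for that one output table only.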

Good luck.

LinusH
Tourmaline | Level 20

If you need to do this copy at all (what is the requirement for this task?), why are both the source and target tables in SAS Work? What is the application?

If you have to read from a permanent location, you could consider moving your source table to SPDE, which allows you to do multi-threaded table scans.
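
A rough sketch of what moving the table to SPDE could look like. The libref spdelib and the path are hypothetical, and whether reads actually get faster depends on your hardware and on how the table is queried:

```sas
/* Hypothetical libref and path: point it at a directory the SAS session can write to */
libname spdelib spde '/data/spde';

/* One-off load of the source table into the SPDE library */
data spdelib.temp1;
  set temp1;
run;

/* Later reads of the four columns go through the SPDE engine */
data temp;
  set spdelib.temp1(keep=a b c d);
run;
```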

OS2Rules
Obsidian | Level 7

Hi:

It also depends on where your temp1 dataset is located.  If it is on a SAN or in Oracle for example, the I/O for the file could be what is taking all the time.

I found that a SQL sometimes works faster than a data step in these cases....

PROC SQL;

CREATE TABLE TEMP as

SELECT A, B, C, D

FROM TEMP1;

QUIT;

anshulgoel
Calcite | Level 5

Hi guys, actually I am taking the data from a server, so the temp1 dataset is at the server location and I am creating the new dataset on my local desktop. I just wanted to check whether there is any technique that can reduce the time needed to get those columns. It's a huge dataset, so it takes 2 hours for just this set of statements.

Astounding
PROC Star

Unless you have a very old PC, 2 hours is longer than it would take to run the program if both data sets were located on the PC.

One possibility would be to use PROC COPY to copy the data set.  You would have to copy the entire data set, and then run a second program later on the PC to subset the variables.  That combination could easily be faster, but it would depend on how many variables are in the original data set.

If you have the storage space on your PC, it's worth testing how long PROC COPY would take.
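
A sketch of that two-step approach, assuming a hypothetical libref srvlib that points at the server location (the path shown is a placeholder, not your real one):

```sas
/* Hypothetical libref and path for the server location */
libname srvlib '\\server\share\data';

/* Step 1: copy the whole data set to the local WORK library */
proc copy in=srvlib out=work;
  select temp1;
run;

/* Step 2: subset the variables locally */
data temp;
  set work.temp1(keep=a b c d);
run;
```

Comparing the real time in the log for step 1 against your current 2 hours would show whether this is worth it.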

anshulgoel
Calcite | Level 5

Hi,

Actually the dataset is too big to copy to the local drive. It would take a very long time, around 2 days,

and the connection to the server breaks after 2-3 hours.

I am getting the data from the server database, so I need to find an approach with a shorter execution time.

I copy the data because I use a laptop and all the data is on the server.

LinusH
Tourmaline | Level 20

Is there a SAS server involved in the setup, or do you just have a bunch of PC licenses and a shared disk on the network?

It sounds like you need a more centralized data store with a compute server. This kind of setup is less dependent on network bandwidth. It can be implemented either with SAS/CONNECT or by having Enterprise Guide clients talk to a SAS Workspace Server (part of the Intelligence Platform). The actual SAS module for this is Integration Technologies, but it is often part of SAS-offered bundles (BI Server, DI Server etc.).

Patrick
Opal | Level 21

The code sample you've given us in your original post...

data temp;

set temp1(keep = a b c d);

run;

...creates SAS table "temp" in SAS Work using table "temp1", also in SAS Work. So this would run on the same server and on the same file system. Also, you're only copying 4 columns in your sample code. From what you describe, this sample code does not represent your reality, and you will need to tell us in more detail where your data is coming from and what your code actually looks like.

7M rows is not that much if it's only about 4 variables, so I must assume that the way you've written your real code actually copies the full data to your laptop before it subsets it.

If you tell us exactly what your environment looks like (e.g. the source data resides in a DBMS (which one?), there is a remote server to which you can connect via SAS/CONNECT, etc.) and if you post your real code, then I'm sure someone here can come up with some ideas for improving performance (e.g. READBUFF=, multithreading, the SPDE engine, ...).

By the way: Which SAS version are you on? Since SAS 9.4 PROC DS2 is production and allows for multithreading with SAS datasets.

ChrisNZ
Tourmaline | Level 20

To download a data set, proc download is the fastest method.
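
A minimal sketch of that, assuming SAS/CONNECT is licensed on both sides. The session name myserver, the libref srvlib and the path are hypothetical, and the sign-on details are site-specific:

```sas
/* Hypothetical SAS/CONNECT session name; connection options depend on your site */
signon myserver;

rsubmit myserver;
  /* Runs on the server; hypothetical libref and path to the source data */
  libname srvlib '/data/source';

  /* KEEP= is applied on the server, so only columns a-d cross the network */
  proc download data=srvlib.temp1(keep=a b c d) out=temp;
  run;
endrsubmit;

signoff myserver;
```

OUT=temp lands in the WORK library of the local (client) session.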

Patrick
Opal | Level 21

Hi Chris

I believe with Proc DS2 now being production in SAS9.4 and allowing for multi-threading also for SAS data sets, using such an approach could be even faster than Proc Download.

ChrisNZ
Tourmaline | Level 20

I seriously doubt it Patrick.

1- We are transferring data between hosts. Proc DS2 was never optimised for this at all; proc download was.

2- Multithreading only makes sense if the CPU is the bottleneck, which I doubt is the case here (what are the log figures?).

   Usually I/O is the bottleneck, and for a download, the network is even slower and becomes the bottleneck.

   If you run 10 processes instead of one when the network or the disk is saturated, there is no point. It actually makes things worse.

3- Even for data steps that run on one host, using proc DS2 with several threads is no guarantee things will speed up. Quite the opposite in many cases.

    Proc DS2 is a solution to some problems, but certainly not a cure-all.
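
For the log figures asked about in point 2, one way to get them is to turn on the FULLSTIMER system option before running the original step. This is only a sketch using the table names from the first post:

```sas
/* Ask SAS to write detailed resource usage for each step to the log */
options fullstimer;

data temp;
  set temp1(keep=a b c d);
run;
```

If the real time in the log is far larger than the CPU time, the step is waiting on I/O or the network rather than on the processor.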
