10-10-2013 12:16 PM
There is this code:
data temp;
  set temp1(keep = a b c d);
run;
Can we reduce the time by any means??
Actually the dataset (temp1) has 7 million rows, so this step takes a long time. Is there anything we can do about it? Please help.
10-10-2013 04:22 PM
If this is a common task, you want to do an infrastructure review so you can find and avoid any bottlenecks. This would include the server (CPU, I/O), the network, and the speed of your PC (again, CPU and disk I/O).
Do you copy the table because the server connection is slow, or do you have a laptop and want to bring your data home from the office?
10-10-2013 01:24 PM
A couple of things to consider ...
Do you really need to create this data set at all, or could you use TEMP1 for analysis?
The setting for the COMPRESS option can affect many of the resources required. You could run a PROC OPTIONS to see your default setting for COMPRESS. While this may reduce the time, it may increase the use of other resources:
data temp (compress=NO);
  set temp1 (keep=a b c d);
run;
The results will depend on characteristics of your data, so it is difficult to predict ahead of time.
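The PROC OPTIONS check mentioned above is simply:

```sas
/* Display the current default setting of the COMPRESS system option */
proc options option=compress;
run;
```

The current value is written to the SAS log.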
10-10-2013 03:08 PM
If you need to do this copy at all (what is the requirement for this task?), why are both the source and target tables in SAS Work? What is the application?
If you read from a permanent location, you could consider moving your source table to SPDE, which allows you to do multi-threaded table scans.
10-10-2013 03:10 PM
It also depends on where your temp1 dataset is located. If it is on a SAN or in Oracle for example, the I/O for the file could be what is taking all the time.
I have found that SQL sometimes works faster than a data step in these cases:
proc sql;
create table temp as
select a, b, c, d
from temp1;
quit;
10-10-2013 03:18 PM
Hi guys, actually I am taking the data from a server, so the temp1 dataset is at the server location, and I am creating the new dataset on my local desktop. I just wanted to check if there is any technique that can be applied to reduce the time it takes to get those columns. It's huge data, so it takes 2 hours for only this set of statements.
10-10-2013 04:00 PM
Unless you have a very old PC, 2 hours is longer than it would take to run the program if both data sets were located on the PC.
One possibility would be to use PROC COPY to copy the data set. You would have to copy the entire data set, and then run a second program later on the PC to subset the variables. That combination could easily be faster, but it would depend on how many variables are in the original data set.
If you have the storage space on your PC, it's worth testing how long PROC COPY would take.
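A rough sketch of that two-step approach (the library names servlib and locallib and their paths are assumptions):

```sas
/* Step 1: copy the entire data set from the server library to the PC */
libname servlib '//server/share/sasdata';   /* assumption */
libname locallib 'C:\sasdata';              /* assumption */

proc copy in=servlib out=locallib;
  select temp1;
run;

/* Step 2: in a later program, subset the variables locally */
data locallib.temp;
  set locallib.temp1(keep=a b c d);
run;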
10-11-2013 05:08 AM
Actually the dataset is too big to copy to the local drive; it would take a very long time, around 2 days, and the connection to the server breaks after 2-3 hours.
I am getting the data from the server database, so I need to find a way to reduce the execution time.
10-11-2013 07:12 AM
Is there a SAS server involved in the setup, or do you just have a bunch of PC licenses and a shared disk on the network?
Sounds like you are in need of a more centralized data store with a compute server. That kind of setup is less dependent on network bandwidth. It can be achieved either with SAS/CONNECT or by having Enterprise Guide clients talk to a SAS Workspace Server (part of the Intelligence Platform). The actual SAS module for this is Integration Technologies, but it is often part of SAS offered bundles (BI Server, DI Server etc.).
10-11-2013 07:21 AM
The code sample you've given us in your original post...
data temp;
  set temp1(keep = a b c d);
run;
...creates SAS table "temp" in SAS Work using table "temp1", also in SAS Work. So this would run on the same server and on the same file system, and you're only copying 4 columns. From what you describe, this sample code does not represent your reality, so you will need to tell us in more detail where your data is coming from and what your code actually looks like.
7M rows is not that much if it's only about 4 variables, so I must assume that the way you've written your real code actually copies the full data to your laptop before it subsets it.
If you tell us exactly what your environment looks like - eg: source data resides in a DBMS (which one?), there is a remote server to which you can connect via SAS/CONNECT, etc. - and if you post your real code, then I'm sure someone here can come up with some ideas for how to improve performance (eg: READBUFF, multi-threading, the SPDE engine, ...).
By the way: which SAS version are you on? Since SAS 9.4, PROC DS2 is production and allows for multi-threading with SAS data sets.
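If a SAS/CONNECT session is indeed available, the key idea is to subset on the server so that only the 4 columns travel over the network. A hedged sketch, assuming a remote session named myserver and the source table in the remote Work library:

```sas
/* Hypothetical sketch: subset remotely, download only the result */
signon myserver;

rsubmit;
  /* Runs on the server; only columns a, b, c, d cross the network */
  proc download data=work.temp1(keep=a b c d) out=work.temp;
  run;
endrsubmit;

signoff;
```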
10-23-2013 07:44 PM
I believe that with PROC DS2 now being production in SAS 9.4 and allowing multi-threading also for SAS data sets, such an approach could be even faster than PROC DOWNLOAD.
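For reference, a minimal sketch of the DS2 threaded-read pattern (the table names follow the original post; the thread count is an assumption, and the column subset is omitted for brevity):

```sas
proc ds2;
  /* Thread program: each instance reads a slice of the input table */
  thread read_t / overwrite=yes;
    method run();
      set work.temp1;
    end;
  endthread;

  data work.temp / overwrite=yes;
    dcl thread read_t t;
    method run();
      /* Read the input with 4 parallel thread instances */
      set from t threads=4;
    end;
  enddata;
  run;
quit;
```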
10-24-2013 01:42 AM
I seriously doubt it Patrick.
1- We are transferring data between hosts; proc DS2 was never optimised for this at all, proc download was.
2- Multithreading only makes sense if the CPU is the bottleneck, which I doubt is the case here (what are the log figures?)
Usually I/O is the bottleneck, and for a download, the network is even slower and becomes the bottleneck.
If you run 10 processes instead of one when the network or the disk is saturated, there is no point. It actually makes things worse.
3- Even for data steps that run on one host, using proc DS2 with several threads is no guarantee that things will speed up. Quite the opposite in many cases.
Proc DS2 is a solution to some problems, but certainly not a cure-all.