DATA Step, Macro, Functions and more

More efficient way to copy with a where statement

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 128
Accepted Solution

More efficient way to copy with a where statement

Hi,

 

I'm taking a dataset with 54 million records and want to subset it to another location with a where statement.  By default, i'm doing this:

    data out.PS&year._2015 (where=(hcfasaf="P"));
        set clm2015.ps&year.;
    run;

 

But it's taking about 45 minutes and i have to do 11 years worth.  I looked into proc copy, but there's no where= statement that works with that.

 

Any ideas?

Thanks,

Megan


Accepted Solutions
Solution
‎06-13-2016 09:10 AM
Super User
Posts: 5,497

Re: More efficient way to copy with a where statement

Your use of the WHERE statement could be improved.  By putting it on the DATA statement, the program reads in all observations and only subsets the output.  Move it here:

 

   data out.PS&year._2015;
        set clm2015.ps&year. (where=(hcfasaf="P"));
    run;

 

 

That way, the program will subset the observations on the way in instead of on the way out.

View solution in original post


All Replies
Super Contributor
Posts: 436

Re: More efficient way to copy with a where statement

Have you considered creating an index on column hcfasaf? Once you want to subset your data most approaches take the same path behind the scenes. If you say "another location" have a look at other aspecs as well, like network bandwidth and I/O contention. If your source and target reside on the same on the Sam disk(s) then that may actually be your problem and not SAS.

Regards Jan.
Frequent Contributor
Posts: 128

Re: More efficient way to copy with a where statement

Posted in reply to jklaverstijn

There already is an index on the dataset, but on different variables. If were to create an index, i'd have to create a new copy anyway, b/c i can't modify what's there or it'll remove what currently exists and what's there is standard.

 

And i'm copying to and from the same server.  I don't know what an aspec is.

Super User
Posts: 7,758

Re: More efficient way to copy with a where statement

If the data has to travel over the network (ie SAS and the dataset do not reside in the same host), then 45 minutes is actually quite fast.

If everything is on the same server, you may have the problem of competing disk accesses.

For operations like this, I try to make source and target storage physically separate (source on SAN, target on internal disks; or source on one physical volume, target on another).

Have you tried the (compress=yes) dataset option on the target dataset? With (large) string variables present this can significantly reduce storage use and therefore I/O.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 436

Re: More efficient way to copy with a where statement

With "aspects" I mean all those factors that determine the performance of the task at hand. There are many ways to do things in SAS and often performance relies heaviliy on the choices you make. But your SAS code it is not the only thing to look at. Your system components (CPU, memory, I/O) are important players here. If your source and target are on the same server there is a chance they are also on the same disks. For best performance it is important that this situation is avoided.

 

On a side note it is surprising to hear you need to recreate all indexes when adding one. Could you explain a bit more why this would be the case?

 

And please keep in mind that at the end of the day you may just have to wait, as the task you try to perform takes a while because of the large volumes you are processing.

 

- Jan.

Frequent Contributor
Posts: 128

Re: More efficient way to copy with a where statement

Posted in reply to jklaverstijn

I just meant that i can't modify existing indexes.  This isn't "my" dataset, it belongs to the department.  I can't go changing things like that.  Other people use these datasets.

PROC Star
Posts: 1,759

Re: More efficient way to copy with a where statement

[ Edited ]

1- You can create a new index without removing anything.

    For example using  proc sql; create index ...

 

2- Please post the log times so we know whether the issue is I/O (likely) or CPU.
    If you are copying to the same disk sub-system, speed will be low. To such en extend that it can be faster to
    make two copy operations. For example to the WORK library and then from WORK to the final destination.


3- As outlined by @Astounding, the where clause should probably be moved.

 

4- See https://communities.sas.com/t5/Base-SAS-Programming/How-to-speed-up-performance-of-Sas-9-4-TS1M3-thr... where speeding up data access is also discussed. Compression and memory usage in particular could make a difference in your case.

Solution
‎06-13-2016 09:10 AM
Super User
Posts: 5,497

Re: More efficient way to copy with a where statement

Your use of the WHERE statement could be improved.  By putting it on the DATA statement, the program reads in all observations and only subsets the output.  Move it here:

 

   data out.PS&year._2015;
        set clm2015.ps&year. (where=(hcfasaf="P"));
    run;

 

 

That way, the program will subset the observations on the way in instead of on the way out.

Frequent Contributor
Posts: 128

Re: More efficient way to copy with a where statement

Posted in reply to Astounding

Thanks!  I ran this on a test case and my original way was 2 minutes, this one was 14 seconds.  Smiley Happy

Trusted Advisor
Posts: 1,117

Re: More efficient way to copy with a where statement

I would follow @Astounding's suggestion if the subset is comparably small (i.e., say, only <50% of the observations meet the WHERE condition). If, on the contrary, the subset is large, a subsetting IF (if hcfasaf='P';) might be faster than the approach using WHERE.

Super User
Posts: 10,018

Re: More efficient way to copy with a where statement

[ Edited ]

This is best situation for PROC DS2.

 

 


libname y v9 '/folders/myfolders/';
data y.class;
 set sashelp.class;
run;

proc ds2;
data Female(overwrite=yes);
method run();
 set y.class;
 if sex='F';
end;
enddata;
run;
quit;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 366 views
  • 1 like
  • 7 in conversation