BookmarkSubscribeRSS Feed
juanvg1972
Pyrite | Level 9

Hi,

 

I am using views in some data steps, in order to reduce I/O operations and improve performance when using large datasets.

 

I have tested this in some programs, in my program I have a result datasets and other than are intermediate (stored in work library).

I am changing the intermediate dataset for a view and I have found that if I change this intermediate dataset for a view in one or two steps then the performance is better, but if I made more than two steps with views then the performance is equal than using datasets or even worse. In both situations I create a results dataset (nor view) in the final step. 

 

I would like to know best practices of using views, if there are good noly for one or tow steps or for some type of operations. I am doing data /set with new fileds, joins, agregations.

 

Thanks in advance

1 REPLY 1
mkeintz
PROC Star

If you re-use a data set view, then it is reconstructed each time from the original source, which is probably why you don't see expected benefits sometimes when using the view multiple times.  This would be especially true when the view is a merge of multiple datasets, or the view is a small subset of the original.

 

One way to mitigate this problem is to simultaneously define a matching data set view (VNEED) and a data set file (NEED), as below.  The first use (proc freq) call VNEED, which in the background would read BIGDATA, stream in VNEED, and write NEED.  Note the proc freq doesn't have to wait for dataset NEED to be completely generated.  The subsequent univariate proc would read dataset file NEED.

 

  data need  vneed / view=vneed;

    set bigdata;
    where ....;

     ....

   run;

 

  proc freq data=vneed;

     tables .... ;

  run;

 

  proc univariate data=need;

     ....

  run;

 

  1. The input/output cost of the above is that 
    1. BIGDATA is read once
    2. NEED is written to disk once, and
    3. NEED is read from disk once
  2. But if you used view both times, then
    1. BIGDATA is read twice
  3. Or if you used the data set file NEED twice, then
    1. BIGDATA is read once
    2. NEED is written to disk once
    3. NEED is read from disk twice

So depending on the relative size of bigdata vs need, you might save considerable time by using the data file for 2nd and further access, and use the view only for the first access.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 1135 views
  • 1 like
  • 2 in conversation