Re: SAS Views - Define Once?

shl007 · Posted 07-21-2023 02:45 PM

I'm researching on how often this kind of definition needs to be done ... (below)

Hi,

I'm researching views and trying to use them for the first time.

I am wondering if, in each program where I am using the view, I will need a definition like below - or is just a one-time initialization below enough, and then the view name can just be referenced across multiple programs (but not redefining it below each time)?

Because from what I've read, defining it once below means I can then just reference the view name, which will refresh the view automatically in any other program?

data lib.vwrefund_detail/view= lib.vwrefund_detail;
set lib.dataset1
lib.dataset2;
run;

ballardw · Posted 07-21-2023 02:58 PM

If your view is placed in a permanent library then you do not need to execute the code unless you need to change the definition.

Caveat: any data sources must also be available when used. So if your lib.dataset1 or lib.dataset2 are removed you have problems.

You can look in your SASHELP library for examples of SAS supplied views that persist:

Voption, Vslib, Vstable for example.

You view can also reference an external file, such as using a data step to read a text file that is periodically replaced. As long as that external file is where the data step expects to find it referencing the view will execute the read (note: large files take time).

@shl007 wrote:

I'm researching on how often this kind of definition needs to be done ... (below)

Hi,

I'm researching views and trying to use them for the first time.

I am wondering if, in each program where I am using the view, I will need a definition like below - or is just a one-time initialization below enough, and then the view name can just be referenced across multiple programs (but not redefining it below each time)?

Because from what I've read, defining it once below means I can then just reference the view name, which will refresh the view automatically in any other program?

data lib.vwrefund_detail/view= lib.vwrefund_detail;
set lib.dataset1
lib.dataset2;
run;

SASKiwi · Posted 07-22-2023 06:25 PM

I would be very careful about DATA step views. They are only useful if they are being used multiple times in your applications, with changing data selections. They are most definitely not worth it if you are only using them once as they are slower to create and run, rather than just running the original code.

mkeintz · Posted 07-22-2023 08:49 PM

@SASKiwi wrote:

I would be very careful about DATA step views. They are only useful if they are being used multiple times in your applications, with changing data selections. They are most definitely not worth it if you are only using them once as they are slower to create and run, rather than just running the original code.

"most definitely not worth it if you are only using them once"?

That's not been my experience. Often, I find using a dataset view for a sequence of multiple analytical procedures as far less efficient than creating and using the equivalent dataset file. Say dataset WIDE has one obs per BY variable(s), while NARROW has many obs per BY. Then, assuming disk storage is not an issue, I would want to merge them only once, especially if drawing a subset:

data vtemp/view=vtemp;
  merge narrow wide;
  by byvar1 byvar2;
  if wide_var_x>10;
run;

proc summary data=vtemp;
  class ...;
  var ... ;
  output out=summary_table ..... ;
run;

proc univariate data=vtemp;
  .... ;
run;

proc reg data=vtemp;
  model ..... ;
quit;

This requires the merge and subsetting to be performed three times - a lot more work, although it would save some disk space. To get the advantage of both dataset view and dataset file, in this case I would create both - use the view for the first PROC and the file on the subsequent ones:

data need vtemp/view=vtemp  ;
  merge narrow wide;
  by byvar1 byvar2;
  if wide_var_x>10;
run;

proc summary data=vtemp;
  class ...;
  var ... ;
  output out=summary_table ..... ;
run;

proc univariate data=need;
  .... ;
run;

proc reg data=need;
  model ..... ;
quit;

The advantage of this is that the proc summary reads VTEMP, which is executed DURING the creation of the merged data NEED. Or more properly stated, NEED is generated when VTEMP is called. Tehn the following proc's use dataset need, and don't re-invoke the merge.

And I find it counterintuitive to think that dataset views are slower to create than the analogous files - after all there is almost 100% less writing to disk. The data is simply "piped" in memory to the calling PROC. BTW, by "create", I don't mean "define", which takes no time at all. I mean generating the data in the view.

But I also would agree that it is likely much faster to read a dataset file, once created, than the analogous view, which would have to process the parent datasets.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SASKiwi · Posted 07-22-2023 09:39 PM

@mkeintz - I think you may be misinterpreting what I said (your posted example is using views more than once with different PROCS and that's a great justification). Of course the create view step is a lot faster as there is no IO. However if you compare the run time of BOTH the create view and read steps with just reading without a view, then it will always be slower because of the extra time to create the view. The read steps, with or without views, will be similar in process time.

Your use case looks like a great justification and that is exactly my point.

mkeintz · Posted 07-22-2023 10:16 PM

@SASKiwi

Glad to know we are in agreement. My point is to clarify that using a view only once is more efficient than creating the analogous dataset file just to read it once. Steps 1A and 1B below (dataset view) took 19.2 seconds. Steps 2A and 2B (dataset file) took 46.6. Of course, a second use of the file (step 2B alone) took only 8.5.

/*** Step 0 ****/
%let n=200000;
data b g;
  set sashelp.class;
  length widevar $4000; 
  widevar=repeat(put(_n_,z2.),1999);
  if sex='M' then do i=1 to &n;   output b; end;
  else do i=1 to &n;              output g; end;
run;

/* Step 1A */
data vtest / view=vtest;
  set b g;
  by name;
  if age>13;
run;

/* Step 1B */
data _null_; set vtest; run;


/* Step 2A */
data test;
  set b g;
  by name;
  if age>13;
run;

/* Step 2B */
data _null_; set test; run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SASKiwi · Posted 07-23-2023 07:19 PM

@mkeintz - Nice comparison. I ran some tests with your code. It appears there is a significant caching benefit to creating and using a view immediately afterwards. However that is no different to reading datasets multiple times in succession - you get the same caching benefit. When I separate view creation and use, then compare it with a "normal" DATA step read, then performance is much the same, which is what I would expect. Running your example code this is a typical run:

So I far as I can see, there is a major coding benefit to using DATA step views but any performance benefits relate to memory caching.

Patrick · Posted 07-23-2023 12:06 AM

@shl007 wrote:

I'm researching on how often this kind of definition needs to be done ... (below)

Hi,

I'm researching views and trying to use them for the first time.

I am wondering if, in each program where I am using the view, I will need a definition like below - or is just a one-time initialization below enough, and then the view name can just be referenced across multiple programs (but not redefining it below each time)?

Because from what I've read, defining it once below means I can then just reference the view name, which will refresh the view automatically in any other program?

data lib.vwrefund_detail/view= lib.vwrefund_detail;
set lib.dataset1
lib.dataset2;
run;

I look at a views - whether SQL or SAS datastep - as encapsulated code that gets executed when you use it.

Your code above is the view definition (=storing this encapsulated code under libref lib). That's very fast because it only defines the view but doesn't process the data. You then can use this view in subsequent steps or programs. When you use it the view executes the code and though processes data.

If creating and using a view is worth it depends on your usage. Let's say your view need to process a lot of data but the result is a much smaller data volume (like some inner join) then it's often more efficient to create a table once and use it several times than to execute the view seral times where each time the same expensive process needs to run.

shl007 · Posted 08-02-2023 11:05 AM

Thanks for all the replies. Three quick follow-up questions:

1) When I use proc sql "create view" - is there any way to avoid the "loading" wait time as the view gets created? That is hanging up my SAS session. This is a pretty large dataset. My use case is, I have two separate datasets that I need to stack together for various downstream reports. Using a view to stack.

2) Defining an index in a view (or regular dataset) - only has to happen once and will stick even as rows get appended, correct?

3) if I use a union within a proc sql "create view" with columns of varying length, I'm finding the length used in the resulting view respects the widest column.

E.g., I have a character column that is 20 chars in the first union select but 250 chars in the second union select. The view sets that column to 250 chars.

I'm glad it is happening this way, just wanted to confirm that is expected and behavior I can rely upon.

Thanks everyone!

Patrick · Posted 08-02-2023 06:31 PM

@shl007

For 1: The view creation is very quick. If you have any "loading wait time" then the view gets executed. This could for example happen if your SAS EG/SAS Studio preferences are automatically adding/opening tables. It will of course also happen if you use the view as input anywhere in your code.

For 2: Correct. Look at the view as encapsulated SQL code that gets executed every single time you access the view.

For 3: Yes, that's the expected behavior for a SAS SQL UNION and a very useful one.

SAS Innovate 2025: Call for Content

Classroom Training Available!