Definition of a DICTIONARY Table

Quentin · Posted 05-12-2020 12:29 PM

Is there documentation or a paper or similar that describes when / how dictionary tables are updated?

For example, I think I've always assumed that dictionary.columns would be updated when a LIBNAME statement executes, or when datasets are added to a library. But in playing around, it seems like it actually might be updated in real time when it is used.

For example, below code creates an empty library FOO, and it writes 1,000 datasets to it. It uses the windows COPY command to write the datasets to it, so I don't think there is anything that would trigger SAS to update dictionary.columns (to add the variables in datasets in the FOO library).

libname foo "%sysfunc(pathname(work))/foo" ;
libname foo2 "%sysfunc(pathname(work))/foo2" ;

*outside of sas, copy some data to foo ;

%macro mkdata(n=) ;
%local i ;

%do i=1 %to &n ;
  data foo2.x&i ;
    array X{10000} ;
  run ;
%end;

%mend mkdata ;
%mkdata(n=1000)

x "copy ""%sysfunc(pathname(work))\foo2\*"" ""%sysfunc(pathname(work))\foo"" " ;
libname foo2 clear ;

If I then query dictionary.tables, I think the PROC SQL step causes dictionary.tables to be updated when the PROC SQL step runs:

proc sql ;
  create table want as
  select * from dictionary.columns
  where name IN ('X1' 'X10' 'X100') ;
  ;
quit ;

Similarly if I assign a library to a remote SQL server database:

libname mydb ODBC DSN="remotedb" ... ;

And then query dictionary.columns, the query against dictionary.columns take much longer. And yes, of course there are more columns now, but from the increase in the processing time, it looks to me like the PROC SQL query against dictionary.columns might have to hit the remote database when it runs so that it can update dictionary.columns.

I used dictionary.columns as an example, just because it's big and slow, but really my questions is a general one about dictionary tables.

When I submit an options statement, is dictionary.options being updated in the background? Is there even a real "table" dictionary.options, or when I refer to dictionary.options, does SAS just look up the options at that point in time and return them?

It seems there are plenty of papers on how to use dictionary tables, but I can't find much about what they actually are, or how SAS maintains them. I've seen general efficiency tips like "use dictionary tables rather than views", and "don't have too many libraries defined cuz dictionary tables get big and slow", but not much beyond that that hint at the inner workings of dictionary tables.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

ChrisNZ · Posted 05-12-2020 05:36 PM

@Quentin These tables are actually views. Each time you query them, the paths and files are scanned to retrieve what's there at that exact time.

@Tom Like all VS views, SASHELP.VSLIB points to the MEMBERS view afaik. Just to retrieve library paths.

I have no idea why someone thought it would be a good idea to retrieve libname data from a member-level repository when creating this view.

For this purpose (retrieve the path), it's much faster to point to SASHELP.VLIBNAME.

I must admit of not using the dictionary tables. I use SASHELP.VLIBNAME for library information, and proc contents for anything lower. I don't trust that it won't waste time looping through large or remote libraries.

Large SPDE libraries with large medatada files are also an issue: SAS scans these metadata files even if there is no need. I had to scale down how much SPDE is used just because of this mess, very sadly.

High-Performance SAS Coding - Third Edition

View solution in original post

Tom · Posted 05-12-2020 01:06 PM

I don't think it really matters how they work internally, but you should treat them as giving you access to the current state of whatever metadata they provide.

In terms on efficiency I find it is critical for those that are gathering information from librefs that you give proper values to key variables, especially LIBNAME, so it only needs to query the things you want.

And avoid using the data step views in SASHELP. SAS cannot pass any WHERE conditions you use in your code through to the DICTIONARY "tables" when you use the SASHELP views. I recently switched a program from using SASHELP.VSLIB to DICTIONARY.MEMBERS because it was taking forever if I had any Oracle connections open.

ChrisNZ · Posted 05-12-2020 05:36 PM

@Quentin These tables are actually views. Each time you query them, the paths and files are scanned to retrieve what's there at that exact time.

@Tom Like all VS views, SASHELP.VSLIB points to the MEMBERS view afaik. Just to retrieve library paths.

I have no idea why someone thought it would be a good idea to retrieve libname data from a member-level repository when creating this view.

For this purpose (retrieve the path), it's much faster to point to SASHELP.VLIBNAME.

I must admit of not using the dictionary tables. I use SASHELP.VLIBNAME for library information, and proc contents for anything lower. I don't trust that it won't waste time looping through large or remote libraries.

Large SPDE libraries with large medatada files are also an issue: SAS scans these metadata files even if there is no need. I had to scale down how much SPDE is used just because of this mess, very sadly.

High-Performance SAS Coding - Third Edition

FreelanceReinh · Posted 05-12-2020 04:34 PM

Hi @Quentin,

@Quentin wrote:

... when I refer to dictionary.options, does SAS just look up the options at that point in time and return them?

I think you're right. I'm not aware of a paper about that either, but here's another striking example, which I came across last year, supporting the "just-in-time-retrieval" conjecture (and demonstrating that even the extent of a query can have an impact on the contents of a DICTIONARY table):

Background: DICTIONARY.MACROS stores the values of macro variables in a character variable VALUE of (good old) length 200, so that multiple records are necessary if the value is longer, which in turn requires an auxiliary numeric variable OFFSET to store the order of the 200-character blocks: Its values are 0, 200, 400, ... for the first, second, third, ... block.

The code in the log below creates a macro variable LONGMV of length 216, so that a physical DICTIONARY.MACROS table would contain a record with OFFSET=200 (and the last 16 characters of LONGMV in variable VALUE). Contrary to my naïve expectations at that time, this record is not found in the first query (selecting NAME, but not VALUE). Only if VALUE is selected, that additional record is put into reality -- but only temporarily, as the third query demonstrates.

1    %let longmv=%sysfunc(repeat(ABCDEF,35));
2    %put Length of LONGMV: %length(&longmv);
Length of LONGMV: 216
3
4    proc sql;
5    select name        from dictionary.macros where offset>0;
NOTE: No rows were selected.
6    select name, value from dictionary.macros where offset>0;
7    select name        from dictionary.macros where offset>0;
NOTE: No rows were selected.
8    quit;

Output from the second query (as expected):

Macro Variable Name               Macro Variable Value
--------------------------------------------------------------------------------
LONGMV                            CDEFABCDEFABCDEF

However, a DATA step using SASHELP.VMACRO always finds the additional record:

10   data sel;
11   set sashelp.vmacro(keep=name offset);
12   where offset>0;
13   run;

NOTE: There were 1 observations read from the data set SASHELP.VMACRO.
      WHERE offset>0;

Quentin · Posted 05-12-2020 05:16 PM

Thanks @FreelanceReinh , mind blown! I never would have predicted that result, but agree it supports the just-in-time theory. Thanks!

That's similar to some stuff I noticed when playing with the JSON engine.

If you query from dictionary.tables, it works if you select libname, memname, and/or memtype. But if you try to select other columns, you get 0 records returned. It's as if the JSON engine doesn't know how to populate those columns, so it just doesn't return anything to the dictionary "table".

filename myjson temp;
data _null_;
  file myjson ;
  put '[' ;
  put  '{"firstName": "John", "lastName": "Smith", "age": 25}';
  put  ',' ;
  put  '{"firstName": "Joho", "lastName": "Smiti", "age": 26}';
  put ']';
run;
libname myj JSON fileref=myjson;


*returns 0 ;
proc sql ;
  select * from dictionary.tables
  where libname='MYJ' ;
quit ;

*returns 2 ;
proc sql ;
  select memname,memtype 
  from dictionary.tables
  where libname='MYJ' ;
quit ;

*returns 0 if you add the crdate column (or many others);
proc sql ;
  select memname,memtype,crdate
  from dictionary.tables 
  where libname='MYJ' ;
quit ;

I guess the conclusion here is that, like so many other SAS 'objects', the dictionary tables are really logical constructs. It might be helpful to think of them as a table (since you can query them). But really they're just being surfaced as a table. But I wish we could know more about how they work, as it would help people predict some of these gotchas. That's wild about dictionary.macros. Wow.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

Ksharp · Posted 05-13-2020 08:07 AM

It is nothing about 'updated'.
dictionary tables is just a SQL VIEW , when you execute it ,it will return the info dynamicly .

Quentin · Posted 05-13-2020 10:22 AM

Thanks @Ksharp . If a dictionary table is a SQL view, what is it a view of?

That is, I thought SQL views were views of a SQL query that read tables (or other views).

Are you saying a dictionary "table" is a view of a query, but the underlying query is not a SQL query against a data table, it's some sort of logical query against metadata, or system options, or defined titles/formats, or whatever?

Or is it a more generic 'view' in the sense of it's an interface which allows some defined information/values to be queried as if it was a table?

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

Ksharp · Posted 05-14-2020 06:48 AM

"Are you saying a dictionary "table" is a view of a query, but the underlying query is not a SQL query"
Yes. I think so .Just as you mentioned " it is a more generic 'view' ".
Maybe it is SQL query ,maybe NOT .

Quentin · Posted 05-14-2020 07:23 AM

Fair enough, thanks. I guess I should have read the docs, which agree:

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p11h9yhja6t25an1mkx8xgy3wcwm.htm&docsetVe...

Definition of a DICTIONARY Table

A DICTIONARY table is a read-only SAS view that contains information about SAS libraries, SAS data sets, SAS macros, and external files that are in use or available in the current SAS session. A DICTIONARY table also contains the settings for SAS system options that are currently in effect.

When you access a DICTIONARY table, SAS determines the current state of the SAS session and returns the desired information accordingly. This process is performed each time a DICTIONARY table is accessed, so that you always have current information.

and:

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p00cato8pe46ein1bjcimkkx6hzd.htm&docsetVe...

DICTIONARY Tables and Performance

When you query a DICTIONARY table, SAS gathers information that is pertinent to that table. Depending on the DICTIONARY table that is being queried, this process can include searching libraries, opening tables, and executing SAS views...

Note: SAS does not maintain DICTIONARY table information between queries. Each query of a DICTIONARY table launches a new discovery process.

I think maybe the fact that there are sashelp views of dictionary tables convinced me that the dictionary tables were not themselves views.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

When are dictionary tables updated?

Re: When are dictionary tables updated?

Re: When are dictionary tables updated?

Re: When are dictionary tables updated?

Re: When are dictionary tables updated?

Re: When are dictionary tables updated?

Re: When are dictionary tables updated?

Re: When are dictionary tables updated?

Re: When are dictionary tables updated?

Re: When are dictionary tables updated?

Definition of a DICTIONARY Table

DICTIONARY Tables and Performance

Definition of a DICTIONARY Table

DICTIONARY Tables and Performance

Classroom Training Available!