BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Is there documentation or a paper or similar that describes when / how dictionary tables are updated?  

 

For example, I think I've always assumed that dictionary.columns would be updated when a LIBNAME statement executes, or when datasets are added to a library.  But in playing around, it seems like it actually might be updated in real time when it is used.

 

For example, below code creates an empty library FOO, and it writes 1,000 datasets to it.  It uses the windows COPY command to write the datasets to it, so I don't think there is anything that would trigger SAS to update dictionary.columns (to add the variables in datasets in the FOO library).

 

libname foo "%sysfunc(pathname(work))/foo" ;
libname foo2 "%sysfunc(pathname(work))/foo2" ;

*outside of sas, copy some data to foo ;

%macro mkdata(n=) ;
%local i ;

%do i=1 %to &n ;
  data foo2.x&i ;
    array X{10000} ;
  run ;
%end;

%mend mkdata ;
%mkdata(n=1000)

x "copy ""%sysfunc(pathname(work))\foo2\*"" ""%sysfunc(pathname(work))\foo"" " ;
libname foo2 clear ;

If I then query dictionary.tables, I think the PROC SQL step causes dictionary.tables to be updated when the PROC SQL step runs:

proc sql ;
  create table want as
  select * from dictionary.columns
  where name IN ('X1' 'X10' 'X100') ;
  ;
quit ;

 

Similarly if I assign a library to a remote SQL server database:

libname mydb ODBC DSN="remotedb" ... ;

And then query dictionary.columns, the query against dictionary.columns take much longer.  And yes, of course there are more columns now, but from the increase in the processing time, it looks to me like the PROC SQL query against dictionary.columns might have to hit the remote database when it runs so that it can update dictionary.columns.

 

I used dictionary.columns as an example, just because it's big and slow, but really my questions is a general one about dictionary tables.

 

When I submit an options statement, is dictionary.options being updated in the background?  Is there even a real "table" dictionary.options, or when I refer to dictionary.options, does SAS just look up the options at that point in time and return them?

 

It seems there are plenty of papers on how to use dictionary tables, but I can't find much about what they actually are, or how SAS maintains them.  I've seen general efficiency tips like "use dictionary tables rather than views", and "don't have too many libraries defined cuz dictionary tables get big and slow", but not much beyond that that hint at the inner workings of dictionary tables.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

@Quentin These tables are actually views. Each time you query them, the paths and files are scanned to retrieve what's there at that exact time.

 

@Tom  Like all VS views, SASHELP.VSLIB points to the MEMBERS view afaik. Just to retrieve library paths. 

I have no idea why someone thought it would be a good idea to retrieve libname data from a member-level repository when creating this view.

For this purpose (retrieve the path), it's much faster to point to SASHELP.VLIBNAME.

 

I must admit of not using the dictionary tables. I use SASHELP.VLIBNAME for library information, and proc contents for anything lower. I don't trust that it won't waste time looping through large or remote libraries.

 

Large SPDE libraries with large medatada files are also an issue: SAS scans these metadata files even if there is no need. I had to scale down how much SPDE is used just because of this mess, very sadly.

View solution in original post

8 REPLIES 8
Tom
Super User Tom
Super User

I don't think it really matters how they work internally, but you should treat them as giving you access to the current state of whatever metadata they provide.

In terms on efficiency I find it is critical for those that are gathering information from librefs that you give proper values to key variables, especially LIBNAME, so it only needs to query the things you want.

And avoid using the data step views in SASHELP.  SAS cannot pass any WHERE conditions you use in your code through to the DICTIONARY "tables" when you use the SASHELP views.  I recently switched a program from using SASHELP.VSLIB to DICTIONARY.MEMBERS because it was taking forever if I had any Oracle connections open.

ChrisNZ
Tourmaline | Level 20

@Quentin These tables are actually views. Each time you query them, the paths and files are scanned to retrieve what's there at that exact time.

 

@Tom  Like all VS views, SASHELP.VSLIB points to the MEMBERS view afaik. Just to retrieve library paths. 

I have no idea why someone thought it would be a good idea to retrieve libname data from a member-level repository when creating this view.

For this purpose (retrieve the path), it's much faster to point to SASHELP.VLIBNAME.

 

I must admit of not using the dictionary tables. I use SASHELP.VLIBNAME for library information, and proc contents for anything lower. I don't trust that it won't waste time looping through large or remote libraries.

 

Large SPDE libraries with large medatada files are also an issue: SAS scans these metadata files even if there is no need. I had to scale down how much SPDE is used just because of this mess, very sadly.

FreelanceReinh
Jade | Level 19

Hi @Quentin,

 


@Quentin wrote:

... when I refer to dictionary.options, does SAS just look up the options at that point in time and return them?


I think you're right. I'm not aware of a paper about that either, but here's another striking example, which I came across last year, supporting the "just-in-time-retrieval" conjecture (and demonstrating that even the extent of a query can have an impact on the contents of a DICTIONARY table):

 

Background: DICTIONARY.MACROS stores the values of macro variables in a character variable VALUE of (good old) length 200, so that multiple records are necessary if the value is longer, which in turn requires an auxiliary numeric variable OFFSET to store the order of the 200-character blocks: Its values are 0, 200, 400, ... for the first, second, third, ... block.

 

The code in the log below creates a macro variable LONGMV of length 216, so that a physical DICTIONARY.MACROS table would contain a record with OFFSET=200 (and the last 16 characters of LONGMV in variable VALUE). Contrary to my naïve expectations at that time, this record is not found in the first query (selecting NAME, but not VALUE). Only if VALUE is selected, that additional record is put into reality -- but only temporarily, as the third query demonstrates.

1    %let longmv=%sysfunc(repeat(ABCDEF,35));
2    %put Length of LONGMV: %length(&longmv);
Length of LONGMV: 216
3
4    proc sql;
5    select name        from dictionary.macros where offset>0;
NOTE: No rows were selected.
6    select name, value from dictionary.macros where offset>0;
7    select name        from dictionary.macros where offset>0;
NOTE: No rows were selected.
8    quit;

Output from the second query (as expected):

Macro Variable Name               Macro Variable Value
--------------------------------------------------------------------------------
LONGMV                            CDEFABCDEFABCDEF

However, a DATA step using SASHELP.VMACRO always finds the additional record:

10   data sel;
11   set sashelp.vmacro(keep=name offset);
12   where offset>0;
13   run;

NOTE: There were 1 observations read from the data set SASHELP.VMACRO.
      WHERE offset>0;
Quentin
Super User

Thanks @FreelanceReinh , mind blown!  I never would have predicted that result, but agree it supports the just-in-time theory.  Thanks!

 

That's similar to some stuff I noticed when playing with the JSON engine.

 

If you query from dictionary.tables, it works if you select libname, memname, and/or memtype.  But if you try to select other columns, you get 0 records returned.  It's as if the JSON engine doesn't know how to populate those columns, so it just doesn't return anything to the dictionary "table".  

 

filename myjson temp;
data _null_;
  file myjson ;
  put '[' ;
  put  '{"firstName": "John", "lastName": "Smith", "age": 25}';
  put  ',' ;
  put  '{"firstName": "Joho", "lastName": "Smiti", "age": 26}';
  put ']';
run;
libname myj JSON fileref=myjson;


*returns 0 ;
proc sql ;
  select * from dictionary.tables
  where libname='MYJ' ;
quit ;

*returns 2 ;
proc sql ;
  select memname,memtype 
  from dictionary.tables
  where libname='MYJ' ;
quit ;

*returns 0 if you add the crdate column (or many others);
proc sql ;
  select memname,memtype,crdate
  from dictionary.tables 
  where libname='MYJ' ;
quit ;

I guess the conclusion here is that, like so many other SAS 'objects', the dictionary tables are really logical constructs.  It might be helpful to think of them as a table (since you can query them).  But really they're just being surfaced as a table.  But I wish we could know more about how they work, as it would help people predict some of these gotchas.  That's wild about dictionary.macros.  Wow.

 

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Ksharp
Super User
It is nothing about 'updated'.
dictionary tables is just a SQL VIEW , when you execute it ,it will return the info dynamicly .
Quentin
Super User

Thanks @Ksharp .  If a dictionary table is a SQL view, what is it a view of?  


That is, I thought SQL views were views of a SQL query that read tables (or other views).

 

Are you saying a dictionary "table" is a view of a query, but the underlying query is not a SQL query against a data table, it's some sort of logical query against metadata, or system options, or defined titles/formats, or whatever?

 

Or is it a more generic 'view' in the sense of it's an interface which allows some defined information/values to be queried as if it was a table?

 

 

 

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Ksharp
Super User
"Are you saying a dictionary "table" is a view of a query, but the underlying query is not a SQL query"
Yes. I think so .Just as you mentioned " it is a more generic 'view' ".
Maybe it is SQL query ,maybe NOT .
Quentin
Super User

Fair enough, thanks.  I guess I should have read the docs, which agree:

 

 

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p11h9yhja6t25an1mkx8xgy3wcwm.htm&docsetVe...

Definition of a DICTIONARY Table

A DICTIONARY table is a read-only SAS view that contains information about SAS libraries, SAS data sets, SAS macros, and external files that are in use or available in the current SAS session. A DICTIONARY table also contains the settings for SAS system options that are currently in effect.

 

When you access a DICTIONARY table, SAS determines the current state of the SAS session and returns the desired information accordingly. This process is performed each time a DICTIONARY table is accessed, so that you always have current information.

 

and:

 

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p00cato8pe46ein1bjcimkkx6hzd.htm&docsetVe...

DICTIONARY Tables and Performance

When you query a DICTIONARY table, SAS gathers information that is pertinent to that table. Depending on the DICTIONARY table that is being queried, this process can include searching libraries, opening tables, and executing SAS views...

 

Note: SAS does not maintain DICTIONARY table information between queries. Each query of a DICTIONARY table launches a new discovery process.

 

 

I think maybe the fact that there are sashelp views of dictionary tables convinced me that the dictionary tables were not themselves views.

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2128 views
  • 10 likes
  • 5 in conversation