Solved: Look for the latest table available

t30 · Posted 08-15-2018 05:49 AM

Hi

It's common to have tables with different refreshed dates.

For example, I have a group of tables which captures the active accounts on a weekly basis.

The tables are stored in a specific folder. Let's say the folder is ACTIVE.

Within this ACTIVE folder, I would have all the weekly refresh of the tables. Table naming convention (accounts_YYMMDD)

ACTIVE.accounts_180720

ACTIVE.accounts_180727

ACTIVE.accounts_180803

ACTIVE.accounts_180810

Some others, would have the refresh frequency on a monthly basis. Table naming convention (AUM_YYMMDD)

ACTIVE.AUM_1801

ACTIVE.AUM_1802

ACTIVE.AUM_1803

ACTIVE.AUM_1804

The idea is, to automatically select the latest table from each group.

From the two examples above, the latest tables should be ACTIVE.accounts_180810 and ACTIVE.AUM_1804.

Is there a more intelligent way to find out which one is the latest available table?

For example, in an actual script, there's a proc sql

proc sql;
select count(distinct fieldA) from ACTIVE.AUM_1801
;quit;

As of now, I need to manually changed it to whatever latest (ACTIVE.AUM_1804).

Thanks

s_lassen · Posted 08-15-2018 08:06 AM

PROC SQL is the way to go, but I would suggest putting the table names into macro variables. Fortunately, you just have to look for the maximum of the MEMNAME variable to get the latest table:

proc sql noprint;
  select max(memname) into :accounts from dictionary.tables
  where libname='ACTIVE' and memname like 'ACCOUNTS%';
  select max(memname) into :aum from dictionary.tables
  where libname='ACTIVE' and memname like 'AUM%';
quit;

You can then refer to the macro variables with &ACCOUNTS and &AUM:

proc sql;
  select Count(distinct FieldA) from Active.&aum;
quit;

proc summary data=active.&accounts nway missing;
  /* or whatever you want to do with your data */

View solution in original post

RW9 · Posted 08-15-2018 05:55 AM

"Is there a more intelligent way" - yes there is, its not storing data like that. Having one large master dataset with a column for date means simpler faster access to the data. As it is now, your going to have to write code to identify the file you want - which means more coding resource, cpu time running etc. purely through bad data modelling.

proc sort data=sashelp.vtable (obs=1);
  by descending memname;
  where libname="ACTIVE" and scan(memname,1,"_")="accounts";
run;

This takes all the memnames from the given libname, sorts them by name where prefixed with accounts, then the obs tells the output to only have one (which will be the lowest or highest depending on the sort - in this case biggest first). Same thing for your other request, just change libname and scan.

t30 · Posted 08-15-2018 05:57 AM

Unfortunately, the data is stored in that way, a legacy thing and I myself a mere user of the tables.

As of now, the tables are in that way to represent the snapshots at each date.

For example a table capturing the customers' account balances of a bank. The bank would want to know what's the snapshot of the account balance for all customers at month end, every month.

RW9 · Posted 08-15-2018 06:37 AM

Well, its your time. Anyways did the sort work?

t30 · Posted 08-15-2018 09:48 PM

Pretty much my employer's time and ressources ^^. The sort doesn't seem to produce any result though.

s_lassen · Posted 08-15-2018 08:06 AM

PROC SQL is the way to go, but I would suggest putting the table names into macro variables. Fortunately, you just have to look for the maximum of the MEMNAME variable to get the latest table:

proc sql noprint;
  select max(memname) into :accounts from dictionary.tables
  where libname='ACTIVE' and memname like 'ACCOUNTS%';
  select max(memname) into :aum from dictionary.tables
  where libname='ACTIVE' and memname like 'AUM%';
quit;

You can then refer to the macro variables with &ACCOUNTS and &AUM:

proc sql;
  select Count(distinct FieldA) from Active.&aum;
quit;

proc summary data=active.&accounts nway missing;
  /* or whatever you want to do with your data */

t30 · Posted 08-15-2018 11:52 PM

This would do, the solution uses dictionary.tables, which I'm not aware of.

Though I'm still working on it because it seems the code for the actual situation faces some other issues like

 Data file XXXXX is in a format native to another host or the
      file encoding does not match the session encoding. Cross Environment Data Access will be used,
      which may require additional CPU resources and reduce performance.

Thanks!

t30 · Posted 08-16-2018 01:18 AM

@s_lassen

the max function, when applied to memname (which is a string), what's the behavior?

ballardw · Posted 08-15-2018 10:59 AM

The idea is, to automatically select the latest table from each group.

From the two examples above, the latest tables should be ACTIVE.accounts_180810 and ACTIVE.AUM_1804.

Is there a more intelligent way to find out which one is the latest available table?

Define "latest". Do you mean that NAME of the file defines latest or the actual date that the dataset was updated or corrected?

A likely source in either case is to look at dictionary tables that SAS maintains for you. This will show what you can request from dictionary.tables.

proc sql;
   describe table dictionary.tables;
quit;

Notice that you can get MEMNAME (the table name) for a given LIBNAME (where clause), the Date Created, Date Modified if needed.

So you select the MEMNAME and LIBNAME similar to the ones you want, parse the name for the date portion and select the latest date if using the table name OR the date created or modified if that is what you mean. In a perfect world they would be the same but we all know how that goes. Note that you can even check if the created and modified dates are the same if that is an issue.

ChrisNZ · Posted 08-15-2018 10:27 PM

1. SAS will sort the data for you when you read all tables.

This works:

data ACCOUNTS_180720      
     ACCOUNTS_180727
     ACCOUNTS_180815
     ACCOUNTS_170811  ;
run;
data _null_;
  set ACCOUNTS_1: (obs=1) indsname=INDSNAME end=LASTOBS;
  if LASTOBS then call symput('lastest_period',scan(INDSNAME,2,'_'));
run;
%put &=lastest_period;

LASTEST_PERIOD=180815

2. I disagree with @RW9's sweeping statement.

> Having one large master dataset with a column for date means simpler faster access to the data.

Large tables can be extremely slow to access, and the most recent data -which only represent a fraction of all the data- is the most often used. In this (very common) case, it makes a lot of sense to use the structure detailed here. When needed, accessing all the data sets in one go is a matter a creating a view.

3. In any case, these tables should indeed contain a PERIOD_ID field of some sort. It is easier to use that than to depend on such metadata as the table name.

High-Performance SAS Coding - Third Edition

t30 · Posted 08-15-2018 11:24 PM

@ChrisNZ

What do you intend to do with this data step?

data ACCOUNTS_180720      
     ACCOUNTS_180727
     ACCOUNTS_180815
     ACCOUNTS_170811  ;
run;

I'm not familiar with a data step without the set statement. Thanks

ChrisNZ · Posted 08-16-2018 12:08 AM

This is just to create the sample data.

High-Performance SAS Coding - Third Edition

t30 · Posted 08-15-2018 11:26 PM

As for the master dataset table, I'm sure there are pros and cons to each design.

But I leave it to the data engineer equivalents to sort that out. ^^

Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Re: Look for the latest table available

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away