BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
t30
Fluorite | Level 6 t30
Fluorite | Level 6

Hi

 

It's common to have tables with different refreshed dates.

 

For example, I have a group of tables which captures the active accounts on a weekly basis.

 

The tables are stored in a specific folder. Let's say the folder is ACTIVE.

 

Within this ACTIVE folder, I would have all the weekly refresh of the tables. Table naming convention (accounts_YYMMDD)

 

ACTIVE.accounts_180720

ACTIVE.accounts_180727

ACTIVE.accounts_180803

ACTIVE.accounts_180810

 

Some others, would have the refresh frequency on a monthly basis. Table naming convention (AUM_YYMMDD)

 

ACTIVE.AUM_1801

ACTIVE.AUM_1802

ACTIVE.AUM_1803

ACTIVE.AUM_1804

 

The idea is, to automatically select the latest table from each group.

From the two examples above, the latest tables should be ACTIVE.accounts_180810 and ACTIVE.AUM_1804.

 

Is there a more intelligent way to find out which one is the latest available table?

 

For example, in an actual script, there's a proc sql

proc sql;
select count(distinct fieldA) from ACTIVE.AUM_1801
;quit;

As of now, I need to manually changed it to whatever latest (ACTIVE.AUM_1804). 

 

Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

PROC SQL is the way to go, but I would suggest putting the table names into macro variables. Fortunately, you just have to look for the maximum of the MEMNAME variable to get the latest table:

proc sql noprint;
  select max(memname) into :accounts from dictionary.tables
  where libname='ACTIVE' and memname like 'ACCOUNTS%';
  select max(memname) into :aum from dictionary.tables
  where libname='ACTIVE' and memname like 'AUM%';
quit;

You can then refer to the macro variables with &ACCOUNTS and &AUM:

proc sql;
  select Count(distinct FieldA) from Active.&aum;
quit;

proc summary data=active.&accounts nway missing;
  /* or whatever you want to do with your data */

View solution in original post

12 REPLIES 12
RW9
Diamond | Level 26 RW9
Diamond | Level 26

"Is there a more intelligent way" - yes there is, its not storing data like that.  Having one large master dataset with a column for date means simpler faster access to the data.  As it is now, your going to have to write code to identify the file you want - which means more coding resource, cpu time running etc. purely through bad data modelling.

proc sort data=sashelp.vtable (obs=1);
  by descending memname;
  where libname="ACTIVE" and scan(memname,1,"_")="accounts";
run;

This takes all the memnames from the given libname, sorts them by name where prefixed with accounts, then the obs tells the output to only have one (which will be the lowest or highest depending on the sort - in this case biggest first).  Same thing for your other request, just change libname and scan.

t30
Fluorite | Level 6 t30
Fluorite | Level 6

Unfortunately, the data is stored in that way, a legacy thing and I myself a mere user of the tables.

 

As of now, the tables are in that way to represent the snapshots at each date.

 

 

For example a table capturing the customers' account balances of a bank. The bank would want to know what's the snapshot of the account balance for all customers at month end, every month.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, its your time.  Anyways did the sort work?

t30
Fluorite | Level 6 t30
Fluorite | Level 6
Pretty much my employer's time and ressources ^^. The sort doesn't seem to produce any result though.
s_lassen
Meteorite | Level 14

PROC SQL is the way to go, but I would suggest putting the table names into macro variables. Fortunately, you just have to look for the maximum of the MEMNAME variable to get the latest table:

proc sql noprint;
  select max(memname) into :accounts from dictionary.tables
  where libname='ACTIVE' and memname like 'ACCOUNTS%';
  select max(memname) into :aum from dictionary.tables
  where libname='ACTIVE' and memname like 'AUM%';
quit;

You can then refer to the macro variables with &ACCOUNTS and &AUM:

proc sql;
  select Count(distinct FieldA) from Active.&aum;
quit;

proc summary data=active.&accounts nway missing;
  /* or whatever you want to do with your data */
t30
Fluorite | Level 6 t30
Fluorite | Level 6

This would do, the solution uses dictionary.tables, which I'm not aware of.

 

Though I'm still working on it because it seems the code for the actual situation faces some other issues like

 Data file XXXXX is in a format native to another host or the
      file encoding does not match the session encoding. Cross Environment Data Access will be used,
      which may require additional CPU resources and reduce performance.

Thanks!

t30
Fluorite | Level 6 t30
Fluorite | Level 6

@s_lassen

the max function, when applied to memname (which is a string), what's the behavior? 

ballardw
Super User

 

The idea is, to automatically select the latest table from each group.

From the two examples above, the latest tables should be ACTIVE.accounts_180810 and ACTIVE.AUM_1804.

 

Is there a more intelligent way to find out which one is the latest available table?

 

 


Define "latest". Do you mean that NAME of the file defines latest or the actual date that the dataset was updated or corrected?

 

A likely source in either case is to look at dictionary tables that SAS maintains for you. This will show what you can request from dictionary.tables.

proc sql;
   describe table dictionary.tables;
quit;

Notice that you can get MEMNAME (the table name) for a given LIBNAME (where clause), the Date Created, Date Modified if needed.

 

So you select the MEMNAME and LIBNAME similar to the ones you want, parse the name for the date portion and select the latest date if using the table name OR the date created or modified if that is what you mean. In a perfect world they would be the same but we all know how that goes. Note that you can even check if the created and modified dates are the same if that is an issue.

ChrisNZ
Tourmaline | Level 20

1. SAS will sort the data for you when you read all tables.

This works:

data ACCOUNTS_180720      
     ACCOUNTS_180727
     ACCOUNTS_180815
     ACCOUNTS_170811  ;
run;
data _null_;
  set ACCOUNTS_1: (obs=1) indsname=INDSNAME end=LASTOBS;
  if LASTOBS then call symput('lastest_period',scan(INDSNAME,2,'_'));
run;
%put &=lastest_period;

LASTEST_PERIOD=180815

2. I disagree with @RW9's sweeping statement.

> Having one large master dataset with a column for date means simpler faster access to the data. 

Large tables can be extremely slow to access, and the most recent data -which only represent a fraction of all the data- is the most often used. In this (very common) case, it makes a lot of sense to use the structure detailed here. When needed, accessing all the data sets in one go is a matter a creating a view. 

 

3. In any case, these tables should indeed contain a PERIOD_ID field of some sort. It is easier to use that than to depend on such metadata as the table name.

 

 

 

 

t30
Fluorite | Level 6 t30
Fluorite | Level 6

@ChrisNZ

What do you intend to do with this data step?

data ACCOUNTS_180720      
     ACCOUNTS_180727
     ACCOUNTS_180815
     ACCOUNTS_170811  ;
run;

I'm not familiar with a data step without the set statement. Thanks

ChrisNZ
Tourmaline | Level 20

This is just to create the sample data.

t30
Fluorite | Level 6 t30
Fluorite | Level 6
As for the master dataset table, I'm sure there are pros and cons to each design.

But I leave it to the data engineer equivalents to sort that out. ^^

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1311 views
  • 2 likes
  • 5 in conversation