Hi
It's common to have tables with different refreshed dates.
For example, I have a group of tables which captures the active accounts on a weekly basis.
The tables are stored in a specific folder. Let's say the folder is ACTIVE.
Within this ACTIVE folder, I would have all the weekly refresh of the tables. Table naming convention (accounts_YYMMDD)
ACTIVE.accounts_180720
ACTIVE.accounts_180727
ACTIVE.accounts_180803
ACTIVE.accounts_180810
Some others, would have the refresh frequency on a monthly basis. Table naming convention (AUM_YYMMDD)
ACTIVE.AUM_1801
ACTIVE.AUM_1802
ACTIVE.AUM_1803
ACTIVE.AUM_1804
The idea is, to automatically select the latest table from each group.
From the two examples above, the latest tables should be ACTIVE.accounts_180810 and ACTIVE.AUM_1804.
Is there a more intelligent way to find out which one is the latest available table?
For example, in an actual script, there's a proc sql
proc sql;
select count(distinct fieldA) from ACTIVE.AUM_1801
;quit;
As of now, I need to manually changed it to whatever latest (ACTIVE.AUM_1804).
Thanks
PROC SQL is the way to go, but I would suggest putting the table names into macro variables. Fortunately, you just have to look for the maximum of the MEMNAME variable to get the latest table:
proc sql noprint;
select max(memname) into :accounts from dictionary.tables
where libname='ACTIVE' and memname like 'ACCOUNTS%';
select max(memname) into :aum from dictionary.tables
where libname='ACTIVE' and memname like 'AUM%';
quit;
You can then refer to the macro variables with &ACCOUNTS and &AUM:
proc sql;
select Count(distinct FieldA) from Active.&aum;
quit;
proc summary data=active.&accounts nway missing;
/* or whatever you want to do with your data */
"Is there a more intelligent way" - yes there is, its not storing data like that. Having one large master dataset with a column for date means simpler faster access to the data. As it is now, your going to have to write code to identify the file you want - which means more coding resource, cpu time running etc. purely through bad data modelling.
proc sort data=sashelp.vtable (obs=1); by descending memname; where libname="ACTIVE" and scan(memname,1,"_")="accounts"; run;
This takes all the memnames from the given libname, sorts them by name where prefixed with accounts, then the obs tells the output to only have one (which will be the lowest or highest depending on the sort - in this case biggest first). Same thing for your other request, just change libname and scan.
Unfortunately, the data is stored in that way, a legacy thing and I myself a mere user of the tables.
As of now, the tables are in that way to represent the snapshots at each date.
For example a table capturing the customers' account balances of a bank. The bank would want to know what's the snapshot of the account balance for all customers at month end, every month.
Well, its your time. Anyways did the sort work?
PROC SQL is the way to go, but I would suggest putting the table names into macro variables. Fortunately, you just have to look for the maximum of the MEMNAME variable to get the latest table:
proc sql noprint;
select max(memname) into :accounts from dictionary.tables
where libname='ACTIVE' and memname like 'ACCOUNTS%';
select max(memname) into :aum from dictionary.tables
where libname='ACTIVE' and memname like 'AUM%';
quit;
You can then refer to the macro variables with &ACCOUNTS and &AUM:
proc sql;
select Count(distinct FieldA) from Active.&aum;
quit;
proc summary data=active.&accounts nway missing;
/* or whatever you want to do with your data */
This would do, the solution uses dictionary.tables, which I'm not aware of.
Though I'm still working on it because it seems the code for the actual situation faces some other issues like
Data file XXXXX is in a format native to another host or the
file encoding does not match the session encoding. Cross Environment Data Access will be used,
which may require additional CPU resources and reduce performance.
Thanks!
the max function, when applied to memname (which is a string), what's the behavior?
The idea is, to automatically select the latest table from each group.
From the two examples above, the latest tables should be ACTIVE.accounts_180810 and ACTIVE.AUM_1804.
Is there a more intelligent way to find out which one is the latest available table?
Define "latest". Do you mean that NAME of the file defines latest or the actual date that the dataset was updated or corrected?
A likely source in either case is to look at dictionary tables that SAS maintains for you. This will show what you can request from dictionary.tables.
proc sql; describe table dictionary.tables; quit;
Notice that you can get MEMNAME (the table name) for a given LIBNAME (where clause), the Date Created, Date Modified if needed.
So you select the MEMNAME and LIBNAME similar to the ones you want, parse the name for the date portion and select the latest date if using the table name OR the date created or modified if that is what you mean. In a perfect world they would be the same but we all know how that goes. Note that you can even check if the created and modified dates are the same if that is an issue.
1. SAS will sort the data for you when you read all tables.
This works:
data ACCOUNTS_180720
ACCOUNTS_180727
ACCOUNTS_180815
ACCOUNTS_170811 ;
run;
data _null_;
set ACCOUNTS_1: (obs=1) indsname=INDSNAME end=LASTOBS;
if LASTOBS then call symput('lastest_period',scan(INDSNAME,2,'_'));
run;
%put &=lastest_period;
LASTEST_PERIOD=180815
2. I disagree with @RW9's sweeping statement.
> Having one large master dataset with a column for date means simpler faster access to the data.
Large tables can be extremely slow to access, and the most recent data -which only represent a fraction of all the data- is the most often used. In this (very common) case, it makes a lot of sense to use the structure detailed here. When needed, accessing all the data sets in one go is a matter a creating a view.
3. In any case, these tables should indeed contain a PERIOD_ID field of some sort. It is easier to use that than to depend on such metadata as the table name.
What do you intend to do with this data step?
data ACCOUNTS_180720
ACCOUNTS_180727
ACCOUNTS_180815
ACCOUNTS_170811 ;
run;
I'm not familiar with a data step without the set statement. Thanks
This is just to create the sample data.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.