Hi Experts,
I have MULTIPLE ACTIVE libraries under Base SAS 9.4 (suppose A to Z).
For instance:
LibraryNames TableNames ColumnNames
Covid_19 patient_visit_info visit_date
Obesity patient_info patient_visit_date
Diabetes visitor_info visit_date_info
Assume there are hundreds of tables which contain columns containing DATE in them. I have limited knowledge of pulling relevant columns from multiple TABLES (via PROC SQL via LIKE operator in SQL- '...where ColumnNames LIKE % DATE% ...' ).
But I don't know which LIBRARY or TABLES they are in.
Could you please suggest methods/ways to query MULTIPLE LIBRARIES at one go to find TABLES and pull COLUMNS of interest ?
I would appreciate any codes, relevant links and tips on the same.
SASsy DATA STEP is preferred over resource consuming PROC SQL as far as possible 😀.
Thanks in advance!
Actually you should start with PROC SQL because that is where the DICTIONARY metadata library lives.
In particular DICTIONARY.COLUMNS will let you find the NAME of all of the variables in all of the MEMBERs in all of the LIBREFs.
For example you might look for variable that have the string DATE in their name.
proc sql;
create table datevars as
select *
from dictionary.columns
where upcase(name) like '%DATE%'
;
quit;
Or perhaps look for variables that have DATE type format attached to them.
proc sql;
create table datevars as
select *
from dictionary.columns
where 'date' = fmtinfo(prxchange('s/\d*\.*\d*$//',-1,trim(format)),'cat')
;
quit;
Once you have the list of variables you can use it to do whatever you want, but it is not at all clear to me what you want.
Note if you use PROC CONTENTS to get the list of variable names then finding the format category is a lot easier because the format name is in a field of its own without the digits and period appended.
But then you would have to query each libref separately.
proc contents data=covid19._all_ noprint out=content1;
run;
data datevar1;
set content1;
where 'date'=fmtinfo(format,'cat');
run;
Which means what exactly.
Show the code you would want to run for ONE of the variables.
Sounds like you want to scan the three libraries mentioned for any variable that contains the word date and then run a proc freq on that variable?
Do you care about the variable type?
Are there any variables that have DT instead of Date, ie Patient_Admit_Dt?
And what about variables that have a date format but not date in the name?
It would help if you clarified exactly what you want as output as well.
1) Do you care about the variable type?
Yes. I am only interested in the variable with date type/format.
1) Are there any variables that have DT instead of Date, i.e Patient_Admit_Dt?
Yes. The libraries have some (these are date type/format) ending in ..._DT as well.
3. And what about variables that have a date format but not date in the name?
I am only focused on the variables that apparently have date values only. I am not worried about the (rest of the) variables that may have date format/type data but no date in name.
4. It would help if you clarified exactly what you want as output as well.
Suppose, I have three libraries and there are 300 tables that have multiple date type/format variables pulled from various sources maintained (entered)by various end users increasing the propensity of data entry errors and typos. For instances, a user enters 05/02/3022 (intent was to enter 05-02-2022). Another user enters 05/02/4000. The database does not prevent the user from entering 05/02/3022 or 05/02/4000 as they are in correct date type/format.
While limiting the dates from ...2001... to ...2022... (current year, month, day etc.) for analysis, the records with 05/02/3022 or 05/02/4000 will be apparently excluded resulting in the incorrect analysis.
Hence, I need to catch the unusual (at least for my programming purpose) dates and correct them before analysis (and alert the leaders of the pertinent teams on this and have them correct these sorts of errors in the source systems as appropriate ).
In short, I need to catch/identify the odd (out of scope from my program/analysis) dates, if any. Simply we can say it's an anomaly detection phase.
Thanks.
proc sql;
create table date_variables as
select *
from sashelp.vcolumn
where name like ('%DATE%') or name like ('%DT%');
quit;
That will give you the list of tables.
Now, once you've figured out what analysis you want to do on each table/column you can do the following.
I'm assuming now you just want to run a proc freq on the data to see the year(2022) and yearmonth (202201) of the data.
Customize this to what you need - this part is up to you.
%macro analyze_date_vars(libname=, memname=, name=);
Title "Year analysis of Library: &libname. , Table: &memname., Variable: &name.";
proc freq data=&libname..&memname;
table &name;
format &name year.;
run;
Title "YearMonth analysis of Library: &libname. , Table: &memname., Variable: &name.";
proc freq data=&libname..&memname;
table &name;
format &name yymmn6.;
run;
%mend;
Then you can run the summary for each variable using CALL EXECUTE.
data run_reports;
set date_variables;
str = catt('%analyze_date_vars(libname='
libname,
', memname=',
memname,
', name=',
name,
');'
);
call execute(str);
run;
Untested, as no sample data.
Good Luck.
@Reeza ,
I very much appreciate your kind support.
You are an inspirational mentor who can read between the words and provide very useful tips, hints & codes without being too demanding (to show the actual logs or programs, which often are company's proprietary data, and hence not to be shared publicly.) once the scenario is well explained.
Thanks,
Babu
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.