BookmarkSubscribeRSS Feed
inquistive
Quartz | Level 8

Hi Experts,

I have MULTIPLE ACTIVE libraries under Base SAS 9.4 (suppose  A to Z).

For instance:

LibraryNames         TableNames                               ColumnNames

Covid_19                 patient_visit_info                           visit_date 

Obesity                    patient_info                                   patient_visit_date

Diabetes                  visitor_info                                     visit_date_info

Assume there are hundreds of tables which contain columns containing DATE in them. I have limited knowledge of  pulling relevant columns from multiple TABLES (via  PROC SQL via  LIKE operator in SQL-  '...where ColumnNames LIKE % DATE% ...'  ).

 

But I don't know which LIBRARY or  TABLES they are in.  

Could you please  suggest methods/ways to query MULTIPLE  LIBRARIES at one go to  find TABLES and pull  COLUMNS  of interest ?    

I would appreciate any codes, relevant links and  tips on the same.

SASsy DATA STEP is preferred over resource consuming PROC SQL as far as possible 😀

 

Thanks in advance!

 

8 REPLIES 8
Tom
Super User Tom
Super User

Actually you should start with PROC SQL because that is where the DICTIONARY metadata library lives. 

 

In particular DICTIONARY.COLUMNS will let you find the NAME of all of the variables in all of the MEMBERs in all of the LIBREFs.

For example you might look for variable that have the string DATE in their name.

proc sql;
create table datevars as
  select *
  from dictionary.columns
  where upcase(name) like '%DATE%'
;
quit;

Or perhaps look for variables that have DATE type format attached to them.

proc sql;
create table datevars as
  select *
  from dictionary.columns
  where  'date' = fmtinfo(prxchange('s/\d*\.*\d*$//',-1,trim(format)),'cat')
;
quit;

Once you have the list of variables you can use it to do whatever you want, but it is not at all clear to me what you want.

Tom
Super User Tom
Super User

Note if you use PROC CONTENTS to get the list of variable names then finding the format category is a lot easier because the format name is in a field of its own without the digits and period appended.

But then you would have to query each libref separately.

proc contents data=covid19._all_ noprint out=content1;
run;

data datevar1;
  set content1;
  where 'date'=fmtinfo(format,'cat');
run;
inquistive
Quartz | Level 8
@Tom,
Thank you for the quick reply and code. Sorry for being unclear. These codes seem to pull the metadata of columns containing ...date...My objective is to pull the values from the ...date...columns for further analysis.
Tom
Super User Tom
Super User

Which means what exactly.

Show the code you would want to run for ONE of the variables.

Reeza
Super User

Sounds like you want to scan the three libraries mentioned for any variable that contains the word date and then run a proc freq on that variable?

Do you care about the variable type? 

Are there any variables that have DT instead of Date, ie Patient_Admit_Dt?

And what about variables that have a date format but not date in the name?

 

It would help if you clarified exactly what you want as output as well. 

 

 

inquistive
Quartz | Level 8

@Reeza,

 

1) Do you care about the variable type? 

Yes. I am only interested in the variable with date type/format.

 

1) Are there any variables that have DT instead of Date, i.e Patient_Admit_Dt?

  Yes. The libraries have some (these are date type/format) ending in ..._DT  as well.

 

3. And what about variables that have a date format but not date in the name?

 I am only focused on the variables that apparently have date values only. I am not worried about the (rest of the) variables that may have date format/type data but no date in name.

4. It would help if you clarified exactly what you want as output as well. 

Suppose, I have three libraries and there are 300 tables that have multiple date type/format variables  pulled from various sources maintained (entered)by various end users increasing the propensity of data entry errors and typos. For instances, a user enters 05/02/3022 (intent was to enter 05-02-2022). Another user enters 05/02/4000. The database does not prevent the user from entering 05/02/3022 or  05/02/4000 as  they are in correct date type/format.

 

While  limiting the dates from ...2001... to ...2022... (current year, month, day etc.) for analysis, the records with 05/02/3022 or 05/02/4000 will be apparently excluded resulting in the incorrect analysis.

 

Hence, I need to catch the unusual (at least for my programming purpose) dates and correct them before analysis (and alert the leaders of the pertinent teams on this and have them correct these sorts of errors  in the source systems  as  appropriate ).

In short, I need to catch/identify the odd (out of scope from my program/analysis) dates, if any. Simply we can  say it's an anomaly detection phase. 

 

Thanks.

 

Reeza
Super User
  • Find all column/table names with DATE or DT
proc sql;
create table date_variables as
select *
from sashelp.vcolumn
where name like ('%DATE%') or name like ('%DT%');
quit;

That will give you the list of tables. 

Now, once you've figured out what analysis you want to do on each table/column you can do the following. 

I'm assuming now you just want to run a proc freq on the data to see the year(2022) and yearmonth (202201) of the data. 

Customize this to what you need - this part is up to you.

 

%macro analyze_date_vars(libname=, memname=, name=);

Title "Year analysis of Library: &libname. , Table: &memname., Variable: &name.";
proc freq data=&libname..&memname;
table &name;
format &name year.;
run;

Title "YearMonth analysis of Library: &libname. , Table: &memname., Variable: &name.";
proc freq data=&libname..&memname;
table &name;
format &name yymmn6.;
run;

%mend;



Then you can run the summary for each variable using CALL EXECUTE.

 

data run_reports;
set date_variables;

str = catt('%analyze_date_vars(libname='
             libname, 
            ', memname=',
             memname,
            ', name=',
             name,
             ');'
             );

call execute(str);

run;

Untested, as no sample data.

Good Luck. 

inquistive
Quartz | Level 8

@Reeza ,

I very much  appreciate your kind support. 

You are an inspirational mentor  who can read between the words and provide very useful tips, hints & codes without being too demanding (to show the actual logs or programs, which often are company's proprietary data, and hence not to be shared publicly.) once the scenario is well explained. 

Thanks,

Babu 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 981 views
  • 5 likes
  • 3 in conversation