BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mikepeters
Calcite | Level 5

I was wondering if anyone has come across the need to scan multiple directories/folders/libraries on their server in order to find and access the SAS datasets that live in those directories/folders. The tricky part is that these locations I am scanning all have different naming schema and as you move further into the directories/folders/libraries I am searching, the naming schema has no consistency at all. 

 

Ultimately I want to search every SAS data set in those directories/folders/libraries and see if they contain (or do not contain) specific variables. I need this to be an automated process that will run every day. My hunch is that it will be best to not hardcode anything if possible and use macro variables anywhere I can.

 

Any help on how to solve this quandary would be most welcome. Code examples would be even better.

 

I am a novice SAS user so I apologize if I did not use the correct terminology above. I am currently using SAS Enterprise Guide 7.11.

 

Thanks in advance!

- Mike

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@mikepeters 

I've missed that you wrote "I am a novice SAS user" so the sample code I've pointed you to is likely still a bit too advanced for you to use as a starting point. 

 

What you basically need to do: Starting from a top folder create a list of all the directories (recursive). That's actually quite simple to achieve with a Unix/Linux LS command or a Windows DIR command. Challenge is: A lot of sites don't allow SAS EG users to issue such OS commands out of SAS (Option NOXCMD set. You can run Proc OPTIONS; Run: to check if the option is XCMD or NOXCMD).

The SAS sample code I've pointed you to provides a SAS way for a directory listing which also works when NOXCMD is set.

 

Once you've got a directory listing you then issue as SAS libname statement per directory and then query the SAS dictionary tables for the just defined libref to the all the entries of column names you're searching for.

 

Writing such SAS code is in my opinion not beginner level. Below working code based on the sample code I've referenced earlier.

Just copy paste the code into your environment. The only thing you need to change is the following line of code:

%memlist(<path to top folder>, outtbl=memlist, varlist=<one or more variable names to search for>)     

First parameter needs to be the fully qualified path to top folder you want to search from.

 

I won't explain the details how below code works also because a beginner should first upskill in Base SAS language before diving into SAS Macro language.

%macro memlist(dir, first=Y, outtbl=_memlist, varlist=); 
  %local filrf rc did memcnt name i;     
  %let varlist=%unquote(&varlist);
 
  %if &first=Y %then
    %do;
      data &outtbl;
        stop;
        length path $500;
        set sashelp.vcolumn(keep=memname name rename=(memname=table name=variable_name));
      run;
      data _null_;
        length varlist $32767;
        varlist=compbl("&varlist");
        varlist=cats("'",tranwrd(strip(varlist),' ',"','"),"'");
        call symputx('varlist',upcase(strip(varlist)));
        stop;
      run;
    %end; 

  /* Assigns a fileref to the directory and opens the directory */                                                           
  %let rc=%sysfunc(filename(filrf,&dir));                                                                                               
  %let did=%sysfunc(dopen(&filrf));                                                                                                     
                                                                                                                                        
  /* Make sure directory can be opened */                                                                                                 
  %if &did eq 0 %then %do;                                                                                                              
   %put Directory &dir cannot be open or does not exist;                                                                                
   %return;                                                                                                                             
  %end;      

    libname temp1234 "&dir" access=readonly;
    proc sql;
      create table _v_&outtbl as
        select 
          "&dir" as Path length=500,
          memname as table,
          name as variable_name
        from dictionary.columns
        where 
          libname='TEMP1234' 
          and upcase(name) in (&varlist)
        ;
    quit;
    libname temp1234 clear;

    proc append base=&outtbl data=_v_&outtbl;
    run;quit;
    proc datasets lib=work nolist nowarn;
      delete _v_&outtbl;
    run;quit;
                                                                                                                                        
   /* Loops through entire directory */                                                                                                 
   %do i = 1 %to %sysfunc(dnum(&did));                                                                                                  
                                                                                                                                        
     /* Retrieve name of each file */                                                                                                   
     %let name=%qsysfunc(dread(&did,&i));                                                                                               
     /* If directory name call macro again */                                                                                           
      %if %qscan(&name,2,.) = %then %do; 
        data _null_;
          if findc("&dir",'/')>0 then call symputx('slash','/','l');
          else call symputx('slash','\','l');
          stop;
        run;
        %memlist(&dir.&slash.%unquote(&name), first=N, outtbl=&outtbl, varlist=%nrbquote(&varlist))                                                                                               
      %end;                                                                                                                             
                                                                                                                                        
   %end;                                                                                                                                
                                                                                                                                        
  /* Closes the directory and clear the fileref */                                                                                      
  %let rc=%sysfunc(dclose(&did));                                                                                                       
  %let rc=%sysfunc(filename(filrf));                                                                                                    
                                                                                                                                        
%mend memlist;                                                                                                                            
                                                                                                                                        
/* First parameter is the directory of where your table are stored. */                                                                  
/* Parameter "varlist" allows to pass in a blank separated list of variable names to search for */                                                                  
/* Parameter "outtbl" allows to define the name of the output table with the search result  */                                                                  
%memlist(<path to top folder>, outtbl=memlist, varlist=<one or more variable names to search for>)     

proc print data=memlist;
run;

 

View solution in original post

4 REPLIES 4
ballardw
Super User

If the data sets you refer to are actual SAS data sets and the libraries are currently defined in a SAS session then the SASHELP.Vcolumn data view or when using Proc SQL Dictionary.columns has all of the variables and characteristics with the library name and data set name.

Example: Find all the information about variable named AGE in all datasets:

proc sql;
   create table work.agevar as
   select * 
   from dictionary.columns
   where upcase(name)='AGE';
quit;

Upcase on the name, name of the variable, to search as it may be mixed case in the source table.

LIBNAME and MEMNAME for library and dataset name are stored as upper case though.

 

If you don't want to hardcode variable names then create a data set that has the names wanted and JOIN the dictionary.columns on the value of the variable holding the names of interest. Then changing the data set contents would be the only change needed to extract the information.

 

 

Reeza
Super User

1. Create a list of all files that have the specified extension and path, store filename and path

2. Loop through and create libnames for each path using the LIBNAME function (within a data step)

3. Use SASHELP.VCOLUMN to check for any variables you needed to.

 

Step 2 and 3 are illustrated below. To get the list of names, I highly recommend using an OS function and it depends on your OS. You can search for how to do that on here or the SAS Macro Appendix has an example that should work on any OS.

 

This would search for any data set variables that ended in _DATE. 

 

 

/*step 2*/
data _null_;
set have;

rc=libname('lib'||put(_n_, z5.), path_to_files);

if rc ne 0 then put 'Libname ' path_to_files 'was not assigned';

run;

/*step 3*/
proc sql;
create table want as
select *
from sashelp.vcolumn
where upcase(name) like '%_DATE';
quit;

@mikepeters wrote:

I was wondering if anyone has come across the need to scan multiple directories/folders/libraries on their server in order to find and access the SAS datasets that live in those directories/folders. The tricky part is that these locations I am scanning all have different naming schema and as you move further into the directories/folders/libraries I am searching, the naming schema has no consistency at all. 

 

Ultimately I want to search every SAS data set in those directories/folders/libraries and see if they contain (or do not contain) specific variables. I need this to be an automated process that will run every day. My hunch is that it will be best to not hardcode anything if possible and use macro variables anywhere I can.

 

Any help on how to solve this quandary would be most welcome. Code examples would be even better.

 

I am a novice SAS user so I apologize if I did not use the correct terminology above. I am currently using SAS Enterprise Guide 7.11.

 

Thanks in advance!

- Mike


 

Patrick
Opal | Level 21

@mikepeters 

You could use the following sample code as a starting point.

"Sample 45805: List all files within a directory including sub-directories", http://support.sas.com/kb/45/805.html 

 

Instead of listing the content of the folder you could issue a libname statement, then query the dictionary tables using the libref as selection criteria (either sashelp.vcolumn or dictionary.columns) and then append the result to some target dataset where you collect the result from all the queries. 

Patrick
Opal | Level 21

@mikepeters 

I've missed that you wrote "I am a novice SAS user" so the sample code I've pointed you to is likely still a bit too advanced for you to use as a starting point. 

 

What you basically need to do: Starting from a top folder create a list of all the directories (recursive). That's actually quite simple to achieve with a Unix/Linux LS command or a Windows DIR command. Challenge is: A lot of sites don't allow SAS EG users to issue such OS commands out of SAS (Option NOXCMD set. You can run Proc OPTIONS; Run: to check if the option is XCMD or NOXCMD).

The SAS sample code I've pointed you to provides a SAS way for a directory listing which also works when NOXCMD is set.

 

Once you've got a directory listing you then issue as SAS libname statement per directory and then query the SAS dictionary tables for the just defined libref to the all the entries of column names you're searching for.

 

Writing such SAS code is in my opinion not beginner level. Below working code based on the sample code I've referenced earlier.

Just copy paste the code into your environment. The only thing you need to change is the following line of code:

%memlist(<path to top folder>, outtbl=memlist, varlist=<one or more variable names to search for>)     

First parameter needs to be the fully qualified path to top folder you want to search from.

 

I won't explain the details how below code works also because a beginner should first upskill in Base SAS language before diving into SAS Macro language.

%macro memlist(dir, first=Y, outtbl=_memlist, varlist=); 
  %local filrf rc did memcnt name i;     
  %let varlist=%unquote(&varlist);
 
  %if &first=Y %then
    %do;
      data &outtbl;
        stop;
        length path $500;
        set sashelp.vcolumn(keep=memname name rename=(memname=table name=variable_name));
      run;
      data _null_;
        length varlist $32767;
        varlist=compbl("&varlist");
        varlist=cats("'",tranwrd(strip(varlist),' ',"','"),"'");
        call symputx('varlist',upcase(strip(varlist)));
        stop;
      run;
    %end; 

  /* Assigns a fileref to the directory and opens the directory */                                                           
  %let rc=%sysfunc(filename(filrf,&dir));                                                                                               
  %let did=%sysfunc(dopen(&filrf));                                                                                                     
                                                                                                                                        
  /* Make sure directory can be opened */                                                                                                 
  %if &did eq 0 %then %do;                                                                                                              
   %put Directory &dir cannot be open or does not exist;                                                                                
   %return;                                                                                                                             
  %end;      

    libname temp1234 "&dir" access=readonly;
    proc sql;
      create table _v_&outtbl as
        select 
          "&dir" as Path length=500,
          memname as table,
          name as variable_name
        from dictionary.columns
        where 
          libname='TEMP1234' 
          and upcase(name) in (&varlist)
        ;
    quit;
    libname temp1234 clear;

    proc append base=&outtbl data=_v_&outtbl;
    run;quit;
    proc datasets lib=work nolist nowarn;
      delete _v_&outtbl;
    run;quit;
                                                                                                                                        
   /* Loops through entire directory */                                                                                                 
   %do i = 1 %to %sysfunc(dnum(&did));                                                                                                  
                                                                                                                                        
     /* Retrieve name of each file */                                                                                                   
     %let name=%qsysfunc(dread(&did,&i));                                                                                               
     /* If directory name call macro again */                                                                                           
      %if %qscan(&name,2,.) = %then %do; 
        data _null_;
          if findc("&dir",'/')>0 then call symputx('slash','/','l');
          else call symputx('slash','\','l');
          stop;
        run;
        %memlist(&dir.&slash.%unquote(&name), first=N, outtbl=&outtbl, varlist=%nrbquote(&varlist))                                                                                               
      %end;                                                                                                                             
                                                                                                                                        
   %end;                                                                                                                                
                                                                                                                                        
  /* Closes the directory and clear the fileref */                                                                                      
  %let rc=%sysfunc(dclose(&did));                                                                                                       
  %let rc=%sysfunc(filename(filrf));                                                                                                    
                                                                                                                                        
%mend memlist;                                                                                                                            
                                                                                                                                        
/* First parameter is the directory of where your table are stored. */                                                                  
/* Parameter "varlist" allows to pass in a blank separated list of variable names to search for */                                                                  
/* Parameter "outtbl" allows to define the name of the output table with the search result  */                                                                  
%memlist(<path to top folder>, outtbl=memlist, varlist=<one or more variable names to search for>)     

proc print data=memlist;
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1730 views
  • 1 like
  • 4 in conversation