macro or array

Occasional Contributor
Posts: 15

Hello, I'm wondering what the best logic is for this problem. I wrote a program to read all the .sh files within a subfolder of a directory. It works well for what I need, but now I would like to repeat it for other subfolders. What's the best way to do this? An array, or a macro array with a DO OVER? Nested macros *gasp!*? I've tried a few things, but I'm struggling and can't think through the whole logic.

 

The part I want to repeat is in the second line, where &myFolder will change to folder1, then folder2, then folder3, etc. within the path. I've also tried replacing /path/&myFolder/* with /path/* and changing the way the programs_summary table is created so it has ALL of the .sh files from just /path/, but then the macro variable runs out of space and I get a warning that the maximum length has been exceeded (a macro variable can hold at most 65,534 characters).

 

Here's the code:

%let myFolder = folder1;
filename pipedir pipe "ls -l /path/&myFolder/*";


data indata;
 infile pipedir pad missover;
 input line $char1000.;
 if scan(line,-1)='sh' then output;
run;


data programs_summary (keep=path program size date);
set indata;
length path $500;
format date mmddyy10.;
size    = scan(line,5,' ');
program = scan(line,-1,'/');
date    = input(substr(line,45,12), anydtdte12.);
path    = scan(line,-1,' ');
run;


proc sql ;*NOPRINT;
SELECT '%_pgms(infile='||trim(path)||');'
   INTO : readFile SEPARATED BY '  '
   FROM programs_summary;
quit;


/* define macro to use for everything */
%macro _pgms(infile=);

/* run for each shell script */
data out(keep=VAR1 VAR2 VAR3 PATH);
  length VAR1 VAR2 VAR3 $60
         txtline $100
         path $500;
  retain VAR1 VAR2 VAR3;
  infile "&infile" missover dsd lrecl=1000;
  input txtline $;
  path = "&infile";
  /* MORE CODE TO GRAB DATALINES WITHIN FILE */
  /* VAR1 = */
  /* VAR2 = */
  /* VAR3 = */
  if index(txtline,'exit')>0 then output;
run;

/* accumulate */
data all;
  set out all;
run;

%mend;


/* initialize accumulator data set */
data all;
delete;
run;


/* call all macros */
&readFile;

 


Super User
Posts: 19,167

Re: macro or array

1. Are you creating one SAS data set or multiple? From the code it looks like one, since a second macro call will overwrite the data set.
2. Have you looked into the ls options to recursively list all files/folders? The recursive option is -R (for example, ls -lR for a long listing).
http://stackoverflow.com/questions/105212/linux-recursively-list-all-files-in-a-directory-including-...

3. Look into the filevar/filename options in the infile statement to read multiple files at once and/or keep the filenames for QA.
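A minimal sketch of tip 3 (not code from this thread), assuming Unix-style paths and SAS 9.4. It writes two hypothetical scripts, a.sh and b.sh, into a WORK subfolder, lists them in a data set, and reads both in one DATA step: FILEVAR= opens the file named in the current observation, END= marks the end of that file, and FILENAME= records which file was read. The names demo_sh, file_list, want, and source are made up for illustration.

```sas
/* Setup (hypothetical data): two small .sh files in a subfolder of WORK */
%let work = %sysfunc(pathname(work));
%let rc   = %sysfunc(dcreate(demo_sh, &work)); /* blank if the folder already exists */
%let dir  = &work/demo_sh;

data _null_;
  file "&dir/a.sh"; put 'term1=alpha'; put 'exit 0';
  file "&dir/b.sh"; put 'term2=beta';  put 'exit 0';
run;

/* Hypothetical list of files, one full path per observation */
data file_list;
  length script $500;
  script = "&dir/a.sh"; output;
  script = "&dir/b.sh"; output;
run;

/* One DATA step reads every listed file */
data want(keep=source var1 var2);
  set file_list;
  length fname source $500 txtline $250 var1 var2 $60;
  /* dummy is only a placeholder fileref: FILEVAR= supplies the real path */
  infile dummy filevar=script filename=fname truncover end=eof;
  do while (not eof);            /* read all lines of this file in one iteration */
    input txtline $char250.;
    if strip(txtline) =: 'term1' then var1 = txtline;
    if strip(txtline) =: 'term2' then var2 = txtline;
  end;
  source = fname;                /* FILENAME= variables are not written out, so copy it */
run;
```

Because each iteration reads one whole file, VAR1 and VAR2 start out missing for every file, so a script without a term1 line does not inherit the value from the previous script.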
Trusted Advisor
Posts: 1,116

Re: macro or array

Last month, a somewhat similar topic was discussed in this thread:

https://communities.sas.com/t5/Base-SAS-Programming/txt-files-I-need-to-pattern-match-contents-on-wi...

 

The key message is what Reeza has already mentioned: read all your .sh files with a single DATA step (taking the paths either from suitable ls output via PIPE or from a text file created first). That way you need no macro, no "accumulation" and no macro variable containing macro calls.

 

As in the thread mentioned above, I suggest looking at Example 10 of the INFILE statement documentation.
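The single-step idea can be sketched with two INFILE statements in one DATA step: the first reads one path per line from a list file, the second uses FILEVAR= to read the script named on that line. This is a minimal sketch assuming Unix-style paths; the list file sh_list.txt and the scripts are hypothetical test data, and in practice the first INFILE could equally be the ls pipe.

```sas
/* Setup (hypothetical data): two scripts and a text file listing their paths */
%let work = %sysfunc(pathname(work));
%let rc   = %sysfunc(dcreate(demo_sh, &work)); /* blank if the folder already exists */
%let dir  = &work/demo_sh;

data _null_;
  file "&dir/a.sh";         put 'term1=alpha'; put 'exit 0';
  file "&dir/b.sh";         put 'term1=beta';  put 'exit 0';
  file "&work/sh_list.txt"; put "&dir/a.sh";   put "&dir/b.sh";
run;

data want(keep=script var1);
  length path script $500 txtline $250 var1 $60;
  infile "&work/sh_list.txt" truncover;          /* 1st INFILE: the list of paths */
  input path $char500.;                         /* step ends when the list runs out */
  script = path;                                /* FILEVAR= variables are not written out */
  infile dummy filevar=path truncover end=eof;  /* 2nd INFILE: the script itself */
  do while (not eof);
    input txtline $char250.;
    if strip(txtline) =: 'term1' then var1 = txtline;
  end;
run;
```

Each INPUT reads from whichever INFILE statement executed most recently, which is what lets one step alternate between the list and the scripts.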

Super User
Posts: 11,134

Re: macro or array

If all of the files within a directory have the same structure and are to be read into the same output data set, then a wildcard approach may be more efficient. Some details depend on whether your input files have header rows, but it is a valid option. There are even options to identify which specific file contributed each record.
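A minimal sketch of the wildcard approach, assuming Unix-style paths and hypothetical test files: one INFILE with *.sh reads every matching file in the folder as one concatenated stream, FILENAME= reports which file each record came from, and EOV= flags the first record of each subsequent file (useful, for example, to skip a header row). SAS sets the EOV= variable but never resets it, so the step sets it back to 0.

```sas
/* Setup (hypothetical data): two .sh files in a subfolder of WORK */
%let work = %sysfunc(pathname(work));
%let rc   = %sysfunc(dcreate(demo_sh, &work)); /* blank if the folder already exists */
%let dir  = &work/demo_sh;

data _null_;
  file "&dir/a.sh"; put 'term1=alpha'; put 'exit 0';
  file "&dir/b.sh"; put 'term2=beta';  put 'exit 0';
run;

/* One wildcard INFILE reads all matching files as a single concatenated input */
data lines;
  length fname source $500 txtline $250;
  infile "&dir/*.sh" filename=fname eov=newfile truncover;
  input txtline $char250.;
  if _n_ = 1 or newfile then do;
    file_no + 1;     /* counts file boundaries */
    newfile = 0;     /* SAS never resets the EOV= flag itself */
  end;
  source = fname;    /* FILENAME= variables are not written out */
run;
```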

Super User
Posts: 9,874

Re: macro or array

The ls command has an option to list all subdirectories recursively, like the Windows command:     dir c:\temp\*.sas /S /B

Type man ls and check its options. Once you have all the subdirectory paths, it is easy to feed them into SAS.
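A sketch of that idea on Unix, with find swapped in for ls: find recurses into every subfolder and prints one full path per line, which avoids parsing the "folder:" header lines that ls -lR emits. It assumes the session may run OS commands (XCMD) and uses hypothetical folders under WORK.

```sas
/* Setup (hypothetical data): folder1 and folder2 under WORK */
%let work = %sysfunc(pathname(work));
%let root = &work/demo_root;
%let rc   = %sysfunc(dcreate(demo_root, &work));
%let rc   = %sysfunc(dcreate(folder1, &root));
%let rc   = %sysfunc(dcreate(folder2, &root));

data _null_;
  file "&root/folder1/file11.sh";  put 'exit 0';
  file "&root/folder2/file21.sh";  put 'exit 0';
  file "&root/folder2/file21.sas"; put 'not a shell script';
run;

/* find lists every .sh file below root, one full path per line */
filename sh_list pipe "find &root -type f -name '*.sh'";

data sh_files;
  length script $500;
  infile sh_list truncover;
  input script $char500.;
run;
```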

Super User
Super User
Posts: 7,720

Re: macro or array

As an alternative, you could concatenate the files using the OS and then read in the one combined file. This has the benefit of creating an actual file of the combined results that you can check before reading in the data (hidden characters etc. might affect the import).

http://stackoverflow.com/questions/11711569/windows-batch-file-concatenate-all-files-in-subdirectori...

 

Note that it is for DOS, but I am sure there is a Unix equivalent.
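On Unix, cat does this. A minimal sketch, assuming XCMD is allowed and using hypothetical files under WORK: the shell expands *.sh and writes every script into one physical file, which you can inspect before importing. Note that the combined file no longer records which script each line came from.

```sas
/* Setup (hypothetical data): two .sh files in a subfolder of WORK */
%let work = %sysfunc(pathname(work));
%let rc   = %sysfunc(dcreate(demo_sh, &work)); /* blank if the folder already exists */
%let dir  = &work/demo_sh;

data _null_;
  file "&dir/a.sh"; put 'term1=alpha'; put 'exit 0';
  file "&dir/b.sh"; put 'term2=beta';  put 'exit 0';
run;

/* Let the shell concatenate every script into one file */
x "cat &dir/*.sh > &work/all_scripts.txt";

/* Read the combined file in a single step */
data combined;
  length txtline $250;
  infile "&work/all_scripts.txt" truncover;
  input txtline $char250.;
run;
```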

Occasional Contributor
Posts: 15

Re: macro or array

I am now able to get a list of the .sh files from all of my subfolders within the same directory. I think I understand the concept: output the list to a text file, then use INFILE with FILEVAR= to read the file whose path is on each line of the text file. If I am wrong, please correct me.

 

If that is the logic to go with, I can't figure out how to output to a text file now. When I use FILE 'external-file-name' I get an error along the lines of "ERROR: Insufficient authorization to access /sas/9.4/config/Lev1/......". 

 

I am really new to this and would appreciate a breakdown. Thanks in advance!
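A likely cause of that error is that FILE was given a relative name, which SAS resolves against the server process's current directory (apparently the configuration folder in the error above), where ordinary users cannot write. A minimal sketch, assuming a hypothetical data set of paths named indata_demo: give FILE a full path in a folder you can write to, such as the session's WORK folder returned by PATHNAME.

```sas
/* Hypothetical data set of script paths (in the thread: built from the ls pipe) */
data indata_demo;
  length script $500;
  script = '/main/prod/programs/folder1/file11.sh'; output;
  script = '/main/prod/programs/folder2/file21.sh'; output;
run;

/* Full path in WORK, which the session can always write to */
filename flist "%sysfunc(pathname(work))/sh_files.txt";

/* Write one path per line */
data _null_;
  set indata_demo;
  file flist;
  put script;
run;
```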

Occasional Contributor
Posts: 15

Re: macro or array

Also, this is how I'm getting my list of files. I can't immediately use another INFILE as in the example from Haikuo, because it ends up reading from the original piped list, which looks like this:

 

/main/prod/programs/folder1:
total 224
-rwxr-x--- type owner group size date file11.sas
-rwxr-x--- type owner group size date file11.sh
-rwxr-x--- type owner group size date file12.sas
-rwxr-x--- type owner group size date file12.sh
-rwxr-x--- type owner group size date file13.sql

 

/main/prod/programs/folder2:
total 224
-rwxr-x--- type owner group size date file21.sas
-rwxr-x--- type owner group size date file21.sh
-rwxr-x--- type owner group size date file22.sas
-rwxr-x--- type owner group size date file22.sh
-rwxr-x--- type owner group size date file23.sql

 

 

...etc...

filename pipedir pipe "ls -l /main/prod/programs/*";

data indata;
retain path;
infile pipedir pad missover;
input line $char1000.;

*clean up piped output;
if line =: '/main/prod/programs/' then path = scan(line,1,':');
script = strip(path)||'/'||scan(strip(line),-1,' ');
if scan(line,-1)='sh' then output;
run;

 

 

Super User
Posts: 19,167

Re: macro or array

Step 1: Get your list of files into a SAS data set. Have you accomplished this? As far as I know, you don't need a text file.
Step 2: Read all the files into one data set, adding a variable to identify the source file. If you need help with this, please post your code for reading one file and we can help modify the program.
Occasional Contributor
Posts: 15

Re: macro or array

I have the list of files/filepaths in a SAS dataset. Here's the code to read one file at a time. I will need to do this for about 1,200 other files. Some variables will be missing for some files.

data mydata (keep=var1 var2 var3 script);
length var1 var2 var3 $60 txtline $250 path $500;
retain var1 var2 var3;
infile '/main/prod/programs/folder1/myfile1.sh' filename=path truncover lrecl=1000;
/* $char250. reads the whole line; list input would stop at the first blank */
input txtline $char250.;
if strip(txtline) =: 'term1' then var1 = txtline;
if strip(txtline) =: 'term2' then var2 = txtline;
if strip(txtline) =: 'term3' then var3 = txtline;
script = path;
if index(txtline,'exit')>0 then output;
run;
Solution
‎01-06-2016 08:13 AM
Super User
Posts: 19,167

Re: macro or array

Try the following (untested):

 

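/* No FILENAME statement is needed: with FILEVAR=, the fileref temp is only a placeholder.   */
/* my_file_list = your data set of files; my_file_name = the variable holding each full path. */
data mydata (keep=var1 var2 var3 script);
  set my_file_list(obs=10);
  length var1 var2 var3 $60 txtline $250 path $500;
  infile temp filevar=my_file_name filename=path truncover lrecl=1000 end=eof;
  /* read every line of the current file before SET moves on to the next file */
  do while (not eof);
    input txtline $char250.;
    if strip(txtline) =: 'term1' then var1 = txtline;
    if strip(txtline) =: 'term2' then var2 = txtline;
    if strip(txtline) =: 'term3' then var3 = txtline;
    script = path;   /* the FILENAME= variable is not kept, so copy it */
    if index(txtline,'exit')>0 then output;
  end;
run;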
 

 

Remove the obs=10 to run for all files, it's only there for testing.

Occasional Contributor
Posts: 15

Re: macro or array


Thank you so much for the help! This is my final, tested code to get the list of .sh files in a directory with multiple subfolders and then read each file to extract the information I need. This is MUCH simpler than using a macro. Thanks!

/*define the directory to pull list of programs from*/
filename pipedir pipe "ls -l /main/prod/programs/*";

/*get list of .sh files*/
data indata;
retain path;
infile pipedir pad missover;
input line $char1000.;

if line =: '/main/prod/programs/' then path = scan(line,1,':');
script = strip(path)||'/'||scan(strip(line),-1,' ');
if scan(line,-1)='sh' then output;
run;

/*read from SAS dataset table of .sh list to get relevant information from each .sh file*/
data want (keep=var1 var2 var3 script);
set indata;*(obs=10);
length txtline $250;

*no FILENAME statement is needed: with FILEVAR=, temp is only a placeholder fileref;
infile temp filevar=script truncover end=eof;
do while (^eof);
  input txtline $char250.; *read the whole line, not just its first word;
  if strip(txtline) =: 'term1' then var1 = txtline;
  if strip(txtline) =: 'term2' then var2 = txtline;
  if strip(txtline) =: 'term3' then var3 = txtline;
  prgm_script = script; *FILEVAR= variables are not written to the output, so copy it;
end;
rename prgm_script=script;
run;