concatenating external files based on file name

Super Contributor
Posts: 413

Hi,

In a previous question I asked how to concatenate files, and from the answers I realized that it can be done directly from the command prompt with: type "file1" "file2" ... "filen" > "allfiles"

 

But suppose that I want to concatenate files based on their names. For example, if I have the files f1, f2, fh1 and fh2, is it possible to dynamically concatenate all the f's together, all the fh's together, and so on?

 

Thank you!

 


Accepted Solution
04-09-2016 08:58 PM
Super User
Posts: 6,500

Re: concatenating external files based on file name

Do you care about the order in which the files are concatenated?  Are you just interested in concatenating the raw source files, or are you actually interested in generating a concatenated DATASET?

 

If you just want to use SAS to copy the raw files together, and you can use a simple (single wildcard) pattern to match the file names, then a simple data step will do.

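data _null_;
   infile 'fh*.txt' ;     /* wildcard: read every matching file in turn */
   file 'all_fh.txt' ;    /* output name must not match the input pattern */
   input;                 /* load the next line into the _INFILE_ buffer */
   put _infile_;          /* write the line out unchanged */
run;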

If it is more complex then make a dataset with the list of filenames and then use that to drive the step that reads and copies the files.

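/* Build the list of files and assign each one an output file. */
/* Note: on a rerun, f_files.txt and fh_files.txt also match f*.txt. */
data filelist ;
   infile 'dir /b f*.txt' pipe truncover ;   /* read the Windows directory listing */
   input filename $256. ;
   if filename =: 'fh' then do;              /* =: compares only the first two characters here */
     target = 'fh_files.txt';
     number = input(substr(scan(filename,1,'.'),3),32.);   /* digits after 'fh' */
     output;
   end;
   else if filename =: 'f' then do;
     target = 'f_files.txt';
     number = input(substr(scan(filename,1,'.'),2),32.);   /* digits after 'f' */
     output;
   end;
run;
/* Order the files numerically within each output file */
proc sort data=filelist ; by target number ; run;
/* Copy each input file, line by line, to its output file */
data _null_;
   set filelist ;
   infile source filevar=filename end=eof ;   /* FILEVAR= switches the input file */
   file target filevar=target ;               /* FILEVAR= switches the output file */
   do while (not eof);
     input;
     put _infile_;
   end;
run;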

   

 



All Replies
Super Contributor
Posts: 408

Re: concatenating external files based on file name

How about "type fh*.* > allfhfiles". But don't you have a SAS question? Those are more fun.

 

- Jan.

Super Contributor
Posts: 413

Re: concatenating external files based on file name

Hi Jan,

 

Actually, it is a very SASy question, because I realize that I will need some sort of macro, but I can't figure out how to write it and would greatly appreciate a first push!

 

 

thanks!

Super Contributor
Posts: 408

Re: concatenating external files based on file name

Hi @ilikesas, no problem. I have done a lot of file and directory manipulation in my life. What proved the most efficient, reliable and auditable (a big thing in the sectors I work in) is the following:

 

  • Design a dataset that keeps track of files and their properties and their status in the process (new, done, error, ...), with corresponding timestamps etc.
  • Run a macro that generates a directory listing. Update the above dataset and determine which files need work (status new) based on whatever criteria you desire.
  • In a data step you can, for example, read files with names like fh* by using
    • a SET statement on the above dataset with WHERE status='new'
    • Read the selected files using the INPUT/PUT statements and the FILEVAR option. This way you can have a data-driven process of reading and writing files in a single data step. You can also use CALL EXECUTE on each individual file to run macros using the name as a parameter.
    • The datastep creates a table of files processed.
  • Use the generated dataset to update the dataset used to keep track of work done. Update status to whatever is next.

Many macros exist on the web that create directory listings. The gist of my suggestion is using the powerful FILEVAR option to cycle through a list of files. TS-DOC 581 gives a good idea of the possibilities.
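
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Jan's actual process: it assumes a Windows session where pipes are allowed, non-empty text files in the current folder, and hypothetical names (file_status, processed, fname, lines, done_at) that do not come from the thread.

```sas
/* Hypothetical tracking table: one row per file, all initially 'new' */
data file_status;
   length filename $256 status $8;
   infile 'dir /b fh*.txt' pipe truncover;   /* Windows directory listing */
   input filename $256.;
   status = 'new';
run;

/* Copy every file still marked 'new' and log what was processed */
data processed (keep=filename lines done_at);
   set file_status;
   where status = 'new';
   length fname $256;
   fname = filename;           /* copy: a FILEVAR= variable is not written to the output table */
   infile src filevar=fname end=eof;
   file 'all_fh.txt';          /* name does not match fh*.txt, so it is never re-read */
   lines = 0;
   do while (not eof);
      input;
      put _infile_;
      lines + 1;               /* count the lines copied from this file */
   end;
   done_at = datetime();
   format done_at datetime20.;
run;

/* Record the new status in the tracking table */
proc sql;
   update file_status
      set status = 'done'
      where filename in (select filename from processed);
quit;
```

Keeping the log in a separate table (processed) and only then updating file_status means a failed run leaves the affected files marked new, so they are picked up again next time.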

 

Hope this helps,

- Jan.

Super User
Posts: 17,823

Re: concatenating external files based on file name

@ilikesas How is it a SAS question? You can wildcard the command and execute it from SAS, I suppose? It seems very OS related, with the exception of calling the command from SAS.

 

If you're concatenating so that you can read the files via SAS, that's not required, since you can use the FILEVAR option in an INFILE statement.
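
A minimal sketch of that idea, assuming a Windows session where pipes are allowed and non-empty text files whose lines are at most 200 characters long. The names all_fh, fname and line are illustrative only.

```sas
/* List the files to read (same DIR /B pipe used elsewhere in this thread) */
data filelist;
   infile 'dir /b fh*.txt' pipe truncover;
   input filename $256.;
run;

/* Read every listed file into one dataset, with no concatenation on disk */
data all_fh;
   set filelist;
   length fname $256 line $200;
   fname = filename;            /* copy: a FILEVAR= variable is not written to the output dataset */
   infile src filevar=fname end=eof truncover;
   do while (not eof);
      input line $char200.;     /* keep leading blanks in each line */
      output;                   /* one observation per input line */
   end;
run;
```

Because filename stays in the output dataset, every observation records which file it came from.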

Super Contributor
Posts: 413

Re: concatenating external files based on file name

Hi Reeza,

 

It's true that just concatenating the files is purely an OS task (you actually showed me how to do it in my previous question!).

 

But here I need to concatenate dynamically based on file names, and I guess that SAS is the program that manages which files get concatenated together. Personally, it is the only way I can think of, but I am a relative beginner and my knowledge is somewhat limited...

 

Thanks! 

Super User
Posts: 17,823

Re: concatenating external files based on file name

data filelist ;
   infile 'dir /b *.txt' pipe truncover ;
   input filename $256. ;
run;

The snippet above (from @Tom) creates the file list dataset. His code is correct, but you may want to start from the above to understand what's going on.

Trusted Advisor
Posts: 1,115

Re: concatenating external files based on file name

Hi @ilikesas,

 

If this is a recurring task, you could write a SAS macro which takes the common prefix (e.g. fh), the input and output folder names, the name of the output file and possibly instructions regarding the numbering as parameters. Then you could call the macro once for each set of files to be concatenated and it would build and execute the appropriate X statement.
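
A rough sketch of such a macro, assuming Windows, that the XCMD system option is enabled, and that the output folder already exists. The macro name and its parameters (concat_files, prefix, indir, outfile) and the paths are hypothetical.

```sas
/* Hypothetical macro: builds and runs one Windows TYPE command per call */
%macro concat_files(prefix=, indir=, outfile=);
   options noxwait xsync;   /* close the command window and wait for it to finish */
   /* Doubled double quotes put literal quotes around the two paths */
   x "type ""&indir\&prefix.*.txt"" > ""&outfile""";
%mend concat_files;

/* Example call: write the result outside the input folder */
%concat_files(prefix=fh, indir=C:\files, outfile=C:\files\out\all_fh.txt)
```

Two limits of this approach are worth keeping in mind: a prefix of f also matches fh1.txt and fh2.txt, and the wildcard gives no control over the order in which the files are copied, which is what the numbering parameter mentioned above would have to address.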

Super Contributor
Posts: 413

Re: concatenating external files based on file name

Hi,

 

Do I first have to import the names of the files? I know how to import files into SAS, but here I don't need to import the actual files; I use SAS as an intermediary sorter. It's just that I have difficulty getting started, and I would greatly appreciate it if you could give me some simple code as a hint.

 

 

Thanks!

Super User
Posts: 17,823

Re: concatenating external files based on file name

Wildcards are valid both in SAS and in the OS command, as others indicated.

Use the SCAN function within Tom's code to extract just the filename.
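
One reading of this suggestion, as a small sketch: SCAN takes the part of the name before the first dot, and COMPRESS then separates the letters from the digits. The variable names (name, number, prefix) and the sample value are illustrative, and the approach assumes the file name contains a single dot.

```sas
data _null_;
   filename = 'fh12.txt';
   name   = scan(filename, 1, '.');                 /* part before the first dot: fh12 */
   number = input(compress(name, , 'kd'), 32.);     /* 'kd' keeps only the digits: 12 */
   prefix = compress(name, , 'd');                  /* 'd' removes the digits: fh */
   put name= number= prefix=;                       /* write the three values to the log */
run;
```
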
Super Contributor
Posts: 413

Re: concatenating external files based on file name

Hi Reeza,

 

here is the code that I did by modifying Tom's code and I ALMOST got what I wanted:

data filelist ;
   infile 'dir C:\files\ /b *.txt' pipe truncover ;
   input filename $256. ;
directory = 'C:\files\';
file_path=directory || filename; /*create the pathname */
name=substr(filename, 1, length(filename)-4); /*delete the .txt*/
file_number = compress(name,'','A'); /*extract the file number*/
unique_name = compress(name, file_number); /*extract the unique name*/
target_extension= '_all.txt';
target = directory || unique_name || target_extension;
run;


data _null_;
   set filelist ;
   infile source filevar=file_path end=eof ;
   file target filevar=target ;
   do while (not eof);
     input;
     put _infile_;
  end;
run;

The only minor problem is that when I execute the code I get my 2 files (one concatenating f1 and f2, and one concatenating fh1 and fh2), but they are not in txt format: to open them I actually need to choose the program to open them with.

 

It's probably related to:

target_extension= '_all.txt';
target = directory || unique_name || target_extension;

because when I looked at my SAS table "filelist", the target_extension was not joined directly to the other part. For the variable "target" I get something like:

C:\files\f             _all.txt

Thanks!

Super User
Posts: 17,823

Re: concatenating external files based on file name

Don't use the double pipe (||) for concatenation; use the CATT or CATS function instead. For one thing, they trim trailing blanks (CATS also strips leading ones); for another, they handle the conversion from numeric to character, so you can avoid converting types explicitly.
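
A small sketch of the difference, using the variable names from the code above. The lengths are assumptions chosen to mimic a 256-character unique_name, and padded and trimmed are illustrative names.

```sas
data _null_;
   length directory $9 unique_name $256 target_extension $8;
   directory        = 'C:\files\';
   unique_name      = 'f';              /* stored with 255 trailing blanks */
   target_extension = '_all.txt';

   length padded trimmed $300;
   /* || keeps the trailing blanks of unique_name inside the result */
   padded  = directory || unique_name || target_extension;
   /* CATS strips leading and trailing blanks from each argument first */
   trimmed = cats(directory, unique_name, target_extension);

   put padded= / trimmed=;              /* compare the two values in the log */
run;
```

In the posted code, the corresponding change would be target = cats(directory, unique_name, target_extension); which gives C:\files\f_all.txt for the f files.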
