DATA Step, Macro, Functions and more

Processing all the files from a folder

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

Processing all the files from a folder

I am using Enterprise Guide 7.4 running SAS 9.4 under Windows 10.

 

Problem: I have hundreds of files with the same format (I'll attach three examples). There is a line of text alternating with a line of numbers. I want to read all the files in a folder. Each file needs to be reorganized by making the text string a new variable paired with the number string. The name of each file is also added for each row of data in the data file. The results from all files are appended and saved.

 

This example comes from http://support.sas.com/kb/41/880.html, with minor modifications.

 

filename DIRLIST pipe 'dir "C:\Users\tebert\Desktop\Data_Folder\*.txt" /b';

data dirlist ;

infile dirlist lrecl=200 truncover;

input file_name $100.;

run;

data _null_;

set dirlist end=end;

count+1;

call symputx('read'||put(count,10.-l),cats('C:\Users\tebert\Desktop\Data_Folder\',file_name));

call symputx('dset'||put(count,10.-l),scan(file_name,1,'.'));

if end then call symputx('max',count);

run;

options mprint symbolgen;

%macro readin;

%do i=1 %to &max;

data &&dset&i;

infile "&&read&i" lrecl=1000 truncover dsd missover;

input var1 $ var2 $ var3 $ var4 $ var5 $ var6 $ var7 $ var8 $ var9 $ var10 $;

run;

%end;

%mend readin;

%readin;

run;

 

It runs, but I have three questions:

1) I would like to run the attached macro on each file.

 

2) The program reads each file saved in dirlist. I would like to have the file name added to each row of data in the file.

 

3) I would like to append all of the files to make one data file.

 

Here is the macro:

 

%Macro Manip;

Data one; set one;

retain insectno2;

if insectno2<1 then insectno2=0;

insectno2=insectno2+1;

Data two three; set one;

if mod(insectno2,2) eq 1 then output two;

if mod(insectno2,2) eq 0 then output three;

Data two; set two; drop Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 insectno2;

Data two; set two;

retain insectno2;

if insectno2<1 then insectno2=0;

insectno2=insectno2+1;

data three; set three;

var9=var8; var8=var7; var7=var6; var6=var5; var5=var4; var4=var3; var3=var2; var2=var1; var1=var1; /*Use "waveform;" in place of "var1;" to switch to TBF values.*/;

data three; set three; drop waveform insectno2;

data three; set three;

retain insectno2;

if insectno2<1 then insectno2=0;

insectno2=insectno2+1;

data two; set two three; merge two three; by insectno2;

data two; set two; drop Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9;

%mend;

 

Thank you for your help.

Regards,

Tim

 


Accepted Solutions
Solution
‎08-21-2017 10:19 PM
Super User
Posts: 13,023

Re: Processing all the files from a folder

I am not going to pretend to get what the "insect" coding is doing.

You might want to take a look at this example and see how close it gets to your 2 and 3 requirements.

Note that reading from the same folder with the same file format into a single data set is very simple when header rows are not involved.

 

data want;
   length file fn $ 100;
  infile "C:\Users\tebert\Desktop\Data_Folder\*.txt" lrecl=1000 dsd missover dlm=',' filename=fn;
  informat line $100. var1-var10 best16.;
  input line
        / var1-var10;
  file=fn;
run;

The filename option on the INFILE statement keeps track of the name of the current file read but is temporary (it will not be added to the output data set) so you have to create another variable to save the value in. Since the path + file is more than 8 characters you want to set a length for the filename and permanent variable BEFORE it is referenced the first time.

 

The / in the input is to read to two line structure in to a single record. The dlm=',' is because with the first line in quotes it is easy to read as comma delimited as the first line will terminate with the end of line character after the second quote.

I have no idea why you were reading the data into character variables. I you need that then set the informat accordingly.

View solution in original post


All Replies
Super User
Posts: 22,844

Re: Processing all the files from a folder

https://communities.sas.com/t5/SAS-Communities-Library/How-do-I-write-a-macro-to-import-multiple-tex...

 

See this post on how to modify your current code to create an automated process to read all in at once.

Solution
‎08-21-2017 10:19 PM
Super User
Posts: 13,023

Re: Processing all the files from a folder

I am not going to pretend to get what the "insect" coding is doing.

You might want to take a look at this example and see how close it gets to your 2 and 3 requirements.

Note that reading from the same folder with the same file format into a single data set is very simple when header rows are not involved.

 

data want;
   length file fn $ 100;
  infile "C:\Users\tebert\Desktop\Data_Folder\*.txt" lrecl=1000 dsd missover dlm=',' filename=fn;
  informat line $100. var1-var10 best16.;
  input line
        / var1-var10;
  file=fn;
run;

The filename option on the INFILE statement keeps track of the name of the current file read but is temporary (it will not be added to the output data set) so you have to create another variable to save the value in. Since the path + file is more than 8 characters you want to set a length for the filename and permanent variable BEFORE it is referenced the first time.

 

The / in the input is to read to two line structure in to a single record. The dlm=',' is because with the first line in quotes it is easy to read as comma delimited as the first line will terminate with the end of line character after the second quote.

I have no idea why you were reading the data into character variables. I you need that then set the informat accordingly.

Contributor
Posts: 22

Re: Processing all the files from a folder

Thank you. This collapsed a somewhat convoluted program down into a few lines. The old program read the first variable as character because that was the first value. The other variables are then missing. The next line is read using the same format. Alternate lines are moved to two different data sets. Set1 has the character variable, Set2 has the numeric values. The sets are then merged. Your solution is much better. Thank you for describing how the program works. Awesome.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 325 views
  • 6 likes
  • 3 in conversation