Solved: How to automate and import files that are located in date sequential f...

rjnc13 · Posted 03-05-2014 02:08 AM

I currently have 700 folders that are all sequentially named.

The naming convention of the folders are as follows:-

2011-08-15_2011-08-15

2011-08-16_2011-08-16

2011-08-17_2011-08-17

"

2013-09-20_2013-09-20

There are 10 txt files within each folder that have the same naming convention.

With the txt files all being the same, what i am trying to achieve is to automate the infile and then use the name of the folder eg 2011-08-15_2011-08-15 or part of eg. 2011-08-15 to then be the name of the created data set.

I can successfully import all the txt files so there is no issue there, the issue is i dont want to be changing the folder name each time in the infile step.

'C:\SAS data\Extract\2011-08-17_2011-08-17\abc.txt'

is there an easier way to read these files in? I can find code for sequential txt/csv files but can find nothing to reference a folder and then rename the data set.

Any assistance would be greatly appreciated!!

Patrick · Posted 03-06-2014 05:57 PM

All the info you need is contained in variable "this_file". You only need to extract it. See below code example.

data alldata;

set flist;

infile dummy lrecl=1000 truncover filevar=this_file

filename=myinfile end=done;

attrib

Folder_Name length=$40

Date_From Date_To format=yymmdd10.

Source length=$40

;

Folder_Name=scan(this_file,-2,'/\');

Source=scan(this_file,-1,'/\');

Date_From=input(scan(Folder_Name,1,'_'),yymmdd10.);

Date_To=input(scan(Folder_Name,2,'_'),yymmdd10.);

/* DONE set to 1 when last input record read */

do while(not done);

/* Read all input records from the currently */

/* opened input file, write to alldata */

input string $1000.;

output;

end;

run;

View solution in original post

UrvishShah · Posted 03-05-2014 05:37 AM

Can you please let us know the naming conventions of .TXT files within each of the folders...And if you are saying that you want to create the SAS Dataset with the name of folder name so my question is like:

If i have 10 .txt files in folder 2011-08-15_2011-08-15 and if i created one SAS Dataset with name of 2011-08-15 so it is like this dataset should include all 10 .txt files or you want separate dataset with each .txt files...

You can easily achive this task once you fix the naming conventions of .txt files and you want separate dataset for folder wise or file wise...

-Urvish

RW9 · Posted 03-05-2014 06:10 AM

Hi,

Something like:

filename in_dir pipe 'dir "S:\Temp\Rob\Test" /s /b';

data all;
attrib buff format=$200.;
infile in_dir;
input buff $;
if index(upcase(buff),".TXT")=0 then delete;
run;

proc sql;
create table bse (var1 char(20),var2 char(20));
quit;

data _null_;
set all;
call execute('proc import datafile="'||strip(buff)||'" out=tmp dbms=dlm;
                  getnames=no;
                  delimiter=",";
                run;
                proc append base=bse data=tmp;
                run;');
run;

rjnc13 · Posted 03-05-2014 06:46 AM

Hi Urvish,

The folder will be named

2011-08-15_2011-08-15

The txt file names are as follows:-

Order.txt

plugin.txt

product.txt

registration.txt

details.txt

cart.txt

technical.txt

page.txt

click.txt

So what im trying to achieve is 10 datasets per folder, they will be named with the folder heading in the name:-

2011-08-15_Order

2011-08-15_Plugin

2011-08-15_Product

etc.

So with 700 different folders im hoping there is a way to append it and automate the import process of each directory and the files within.

Hope this makes sense.

Kind Regards,

Ryan

Patrick · Posted 03-05-2014 03:28 PM

1. Do you now want 1 big data set or 10 or 700?

2. Is the structure of these 10 text files per folder the same or do we need a separate input statement per text file?

About 1) I would go for as few data sets as possible, eventually adding 2 new variables: one for the name of the data source, one with a date derived from the folder name.

This can all be done quite easily with a variation to the code I've already posted earlier.

rjnc13 · Posted 03-05-2014 09:17 PM

Hi Patrick,

Thanks for your speedy response.

To answer your latest question i was going to create the latter 700, as i wasnt sure how to create the loop to obtain the 10.

I have tried to implement your code and it came back with an error, i have attached it below.

NOTE: The infile FLIST is:

Unnamed Pipe Access Device,

PROCESS=dir /s/b/a-d C:\Desktop\SAS data\Extract\*.txt,

RECFM=V,LRECL=500

Is there something that i am doing wrong, it is SAS for the PC. So im not sure if that is an issue or not.

I appreciate your help.

Patrick · Posted 03-05-2014 11:16 PM

I believe the issue here is that you have a blank in the pathname "\SAS data\" which needs extra quoting. Below should do:

filename flist pipe 'dir /s/b/a-d ''C:\Desktop\SAS data\Extract\*.txt''';

rjnc13 · Posted 03-06-2014 02:39 AM

Thanks Patrick that did the trick. I have used the code and it works! It has made just one data set which is great.

The only issue i have now, which is from my original request, is i need to somehow use the date in the folder name to create a variable in the file. Each folder has the date range in the naming convention, and without a date within the txt file it is the only way to identify to timestamp the data.

Is there a way that i can append the date from the folder to the file so i can identify data by date.

eg. folder name

2011-08-15_2011-08-15

file

order.txt

datalines

order revenue date

123456 75.00 2011-08-15

Thanks again

Ryan

RW9 · Posted 03-06-2014 04:10 AM

Hi,

Well, not sure wildcards can do that. This is why I tend to go for the approach of create the list of items to process then generate the necessary code as below (updated to data step read as proc import seems to be out of favour). Suppose it depends on how complicated/much needed these things are. For instance if you need to create reports on the files, i.e. how many obs, titles, getting SVN trails etc. Just seems simpler to create the list first.

--

filename in_dir pipe 'dir "S:\Temp\Rob\Test" /s /b';

data all;
attrib buff format=$200.;
infile in_dir;
input buff $;
if index(upcase(buff),".TXT")=0 then delete;
date_var=scan(buff,5,"\");
run;

proc sql;
create table bse (var1 char(1000),var2 char(1000),var3 char(10));
quit;

data _null_;
set all;
call execute('data tmp;
                  attrib var1 var2 format=$1000. var3 format=$10.;
                  infile "'||strip(buff)||'" lrecl=1000 truncover dlm=",";
                  input var1 $ var2 $;
                  var3="'||strip(date_var)||'";
                run;
                proc append base=bse data=tmp;
                run;');
run;

Patrick · Posted 03-06-2014 05:57 PM

All the info you need is contained in variable "this_file". You only need to extract it. See below code example.

data alldata;

set flist;

infile dummy lrecl=1000 truncover filevar=this_file

filename=myinfile end=done;

attrib

Folder_Name length=$40

Date_From Date_To format=yymmdd10.

Source length=$40

;

Folder_Name=scan(this_file,-2,'/\');

Source=scan(this_file,-1,'/\');

Date_From=input(scan(Folder_Name,1,'_'),yymmdd10.);

Date_To=input(scan(Folder_Name,2,'_'),yymmdd10.);

/* DONE set to 1 when last input record read */

do while(not done);

/* Read all input records from the currently */

/* opened input file, write to alldata */

input string $1000.;

output;

end;

run;

rjnc13 · Posted 03-06-2014 10:53 PM

Thanks Patrick, really appreciate all your help.

It worked a treat!

Cheers,

Ryan

Patrick · Posted 03-07-2014 04:31 AM

As I am someone who likes to add a ReadMe.txt to top folders or eventually creates a backup copy of a folder during development, below a bit a more robust code variant (adding some checks):

/* create a data set with the name and path of all files in all folders */
%let date_RegEx=(19|20)\d{2}-(0[1-9]|1[012])-([012]\d|3[01]);

filename flist pipe 'dir /s/b/a-d "c:\temp\*.txt"';

data flist;
infile flist lrecl=500 truncover;
input this_file $500.;
attrib
    Folder_Name length=$40
    Date_From Date_To format=yymmdd10.
    Source length=$40
;

/* remove all rows with folder names not complying with naming convention */
Folder_Name=scan(this_file,-2,'/\');
if prxmatch("/^&date_RegEx._&date_RegEx$/oi",strip(Folder_Name))<1 then delete;

Source=scan(this_file,-1,'/\');
Date_From=input(scan(Folder_Name,1,'_'),yymmdd10.);
Date_To=input(scan(Folder_Name,2,'_ '),yymmdd10.);
run;

filename flist clear;

data alldata;
set flist;
infile dummy lrecl=1000 truncover filevar=this_file end=done;
/* DONE set to 1 when last input record read */
do while(not done);
    /* Read all input records from the currently */
    /* opened input file, write to alldata       */
    input string $1000.;
    output;
end;
run;

UrvishShah · Posted 03-06-2014 05:48 AM

Hi,

Here is my try based on previous comments...

I tried to fully automate the process but unfortunately, SAS is not supporting Macro Variable creation in DIR Command of FILENAME Statement...otherwise it will be very easy to automate...

So as you said you are having 700 Folders, what you can do, you can simply copy and paste the path in following code where i mentioned the comments...It's not tedious task...

After that %FILES_IMPORT Macro will automatically append all the files in a folder and create the SAS dataset based on the folder name...

filename in_dir pipe

'dir "C:\Documents and Settings\2011-08-15_2011-08-15"/b'; /*Change the folder here*/

data all;

attrib buff format=$200.;

infile in_dir;

input buff $;

run;

proc sql noprint;

select buff into :all_files separated by " "

from all;

quit;

%put &all_files.;

%let path = C:\Documents and Settings\2011-08-15_2011-08-15; /*Change the folder here*/

%macro files_import;

%global dsn;

%let dsn = %qscan(&path.,-1,"_");

%let dsn = %sysfunc(compress(&dsn.,"-"));

%put &dsn.;

%let tot_files = %eval(%qsysfunc(countc(&all_files.," "))+1);

%put &tot_files.;

filename myfiles (%do i = 1 %to %eval(&tot_files.-1);

"&path.\%qscan(%str(&all_files.),&i.," ")",

%end;

"&path.\%qscan(%str(&all_files.),&tot_files.," ")");

data ALL_&dsn.;

infile myfiles;

input name $ age;

run;

%mend;

%files_import;

Hope it meets your requirement and will reduce the time to do the repititive task...

-Urvish

Patrick · Posted 03-05-2014 06:00 AM

First create a data set with all required source files (full path and filename), then read one file after the other. In the example below variable "myinfile" contains the name of the file currently read. You can use this information to populate a permanent "output" variable containing the data source name.

If you want to create multiple SAS data sets (eg. 1 per folder), then you would have to pre-sort data set "flist" and in the second data step read the source and write the output to a hash table. You then use hash method "output". This allows you to dynamically create output data sets during data step execution time (using hash method "output" is the only way allowing this I'm aware of).

/* create a data set with the name and path of all files in all folders */
filename flist pipe 'dir /s/b/a-d c:\temp\*.txt';
data flist;
infile flist lrecl=500 truncover;
input this_file $500.;
/* if not possible to filter sufficiently with the DIR command then add some additional */
/* logic here to only select the desired files */
run;
filename flist clear;

/* now read the data from all files, one at a time                                                       */
/* this code based on: "Reading from Multiple Input Files"                           */
/* http://support.sas.com/documentation/cdl/en/lestmtsref/67175/HTML/default/viewer.htm#n1rill4udj0tfun... */
data alldata;
set flist;
infile dummy lrecl=1000 truncover filevar=this_file
    filename=myinfile end=done;

/* DONE set to 1 when last input record read */
do while(not done);
    /* Read all input records from the currently */
    /* opened input file, write to alldata       */
    input string $1000.;
    output;
end;

put 'Finished reading ' myinfile=;
run;

Ron_MacroMaven · Posted 03-06-2014 05:36 PM

This SAS community wiki page

http://www.sascommunity.org/wiki/Making_Lists

has programs for making lists of both folders and files.

Ron Fehd macro maven

How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Re: How to automate and import files that are located in date sequential folders into SAS?

Registration is open

SAS Training: Just a Click Away