I currently have 700 folders that are all sequentially named.
The naming convention of the folders are as follows:-
2011-08-15_2011-08-15
2011-08-16_2011-08-16
2011-08-17_2011-08-17
"
"
2013-09-20_2013-09-20
There are 10 txt files within each folder that have the same naming convention.
With the txt files all being the same, what i am trying to achieve is to automate the infile and then use the name of the folder eg 2011-08-15_2011-08-15 or part of eg. 2011-08-15 to then be the name of the created data set.
I can successfully import all the txt files so there is no issue there, the issue is i dont want to be changing the folder name each time in the infile step.
'C:\SAS data\Extract\2011-08-17_2011-08-17\abc.txt'
is there an easier way to read these files in? I can find code for sequential txt/csv files but can find nothing to reference a folder and then rename the data set.
Any assistance would be greatly appreciated!!
All the info you need is contained in variable "this_file". You only need to extract it. See below code example.
data alldata;
set flist;
infile dummy lrecl=1000 truncover filevar=this_file
filename=myinfile end=done;
attrib
Folder_Name length=$40
Date_From Date_To format=yymmdd10.
Source length=$40
;
Folder_Name=scan(this_file,-2,'/\');
Source=scan(this_file,-1,'/\');
Date_From=input(scan(Folder_Name,1,'_'),yymmdd10.);
Date_To=input(scan(Folder_Name,2,'_'),yymmdd10.);
/* DONE set to 1 when last input record read */
do while(not done);
/* Read all input records from the currently */
/* opened input file, write to alldata */
input string $1000.;
output;
end;
run;
Can you please let us know the naming conventions of .TXT files within each of the folders...And if you are saying that you want to create the SAS Dataset with the name of folder name so my question is like:
If i have 10 .txt files in folder 2011-08-15_2011-08-15 and if i created one SAS Dataset with name of 2011-08-15 so it is like this dataset should include all 10 .txt files or you want separate dataset with each .txt files...
You can easily achive this task once you fix the naming conventions of .txt files and you want separate dataset for folder wise or file wise...
-Urvish
Hi,
Something like:
filename in_dir pipe 'dir "S:\Temp\Rob\Test" /s /b';
data all;
attrib buff format=$200.;
infile in_dir;
input buff $;
if index(upcase(buff),".TXT")=0 then delete;
run;
proc sql;
create table bse (var1 char(20),var2 char(20));
quit;
data _null_;
set all;
call execute('proc import datafile="'||strip(buff)||'" out=tmp dbms=dlm;
getnames=no;
delimiter=",";
run;
proc append base=bse data=tmp;
run;');
run;
Hi Urvish,
The folder will be named
2011-08-15_2011-08-15
The txt file names are as follows:-
Order.txt
plugin.txt
product.txt
registration.txt
details.txt
cart.txt
technical.txt
page.txt
click.txt
So what im trying to achieve is 10 datasets per folder, they will be named with the folder heading in the name:-
2011-08-15_Order
2011-08-15_Plugin
2011-08-15_Product
etc.
So with 700 different folders im hoping there is a way to append it and automate the import process of each directory and the files within.
Hope this makes sense.
Kind Regards,
Ryan
1. Do you now want 1 big data set or 10 or 700?
2. Is the structure of these 10 text files per folder the same or do we need a separate input statement per text file?
About 1) I would go for as few data sets as possible, eventually adding 2 new variables: one for the name of the data source, one with a date derived from the folder name.
This can all be done quite easily with a variation to the code I've already posted earlier.
Hi Patrick,
Thanks for your speedy response.
To answer your latest question i was going to create the latter 700, as i wasnt sure how to create the loop to obtain the 10.
I have tried to implement your code and it came back with an error, i have attached it below.
NOTE: The infile FLIST is:
Unnamed Pipe Access Device,
PROCESS=dir /s/b/a-d C:\Desktop\SAS data\Extract\*.txt,
RECFM=V,LRECL=500
Is there something that i am doing wrong, it is SAS for the PC. So im not sure if that is an issue or not.
I appreciate your help.
I believe the issue here is that you have a blank in the pathname "\SAS data\" which needs extra quoting. Below should do:
filename flist pipe 'dir /s/b/a-d ''C:\Desktop\SAS data\Extract\*.txt''';
Thanks Patrick that did the trick. I have used the code and it works! It has made just one data set which is great.
The only issue i have now, which is from my original request, is i need to somehow use the date in the folder name to create a variable in the file. Each folder has the date range in the naming convention, and without a date within the txt file it is the only way to identify to timestamp the data.
Is there a way that i can append the date from the folder to the file so i can identify data by date.
eg. folder name
2011-08-15_2011-08-15
file
order.txt
datalines
order revenue date
123456 75.00 2011-08-15
Thanks again
Ryan
Hi,
Well, not sure wildcards can do that. This is why I tend to go for the approach of create the list of items to process then generate the necessary code as below (updated to data step read as proc import seems to be out of favour). Suppose it depends on how complicated/much needed these things are. For instance if you need to create reports on the files, i.e. how many obs, titles, getting SVN trails etc. Just seems simpler to create the list first.
--
filename in_dir pipe 'dir "S:\Temp\Rob\Test" /s /b';
data all;
attrib buff format=$200.;
infile in_dir;
input buff $;
if index(upcase(buff),".TXT")=0 then delete;
date_var=scan(buff,5,"\");
run;
proc sql;
create table bse (var1 char(1000),var2 char(1000),var3 char(10));
quit;
data _null_;
set all;
call execute('data tmp;
attrib var1 var2 format=$1000. var3 format=$10.;
infile "'||strip(buff)||'" lrecl=1000 truncover dlm=",";
input var1 $ var2 $;
var3="'||strip(date_var)||'";
run;
proc append base=bse data=tmp;
run;');
run;
All the info you need is contained in variable "this_file". You only need to extract it. See below code example.
data alldata;
set flist;
infile dummy lrecl=1000 truncover filevar=this_file
filename=myinfile end=done;
attrib
Folder_Name length=$40
Date_From Date_To format=yymmdd10.
Source length=$40
;
Folder_Name=scan(this_file,-2,'/\');
Source=scan(this_file,-1,'/\');
Date_From=input(scan(Folder_Name,1,'_'),yymmdd10.);
Date_To=input(scan(Folder_Name,2,'_'),yymmdd10.);
/* DONE set to 1 when last input record read */
do while(not done);
/* Read all input records from the currently */
/* opened input file, write to alldata */
input string $1000.;
output;
end;
run;
Thanks Patrick, really appreciate all your help.
It worked a treat!
Cheers,
Ryan
As I am someone who likes to add a ReadMe.txt to top folders or eventually creates a backup copy of a folder during development, below a bit a more robust code variant (adding some checks):
/* create a data set with the name and path of all files in all folders */
%let date_RegEx=(19|20)\d{2}-(0[1-9]|1[012])-([012]\d|3[01]);
filename flist pipe 'dir /s/b/a-d "c:\temp\*.txt"';
data flist;
infile flist lrecl=500 truncover;
input this_file $500.;
attrib
Folder_Name length=$40
Date_From Date_To format=yymmdd10.
Source length=$40
;
/* remove all rows with folder names not complying with naming convention */
Folder_Name=scan(this_file,-2,'/\');
if prxmatch("/^&date_RegEx._&date_RegEx$/oi",strip(Folder_Name))<1 then delete;
Source=scan(this_file,-1,'/\');
Date_From=input(scan(Folder_Name,1,'_'),yymmdd10.);
Date_To=input(scan(Folder_Name,2,'_ '),yymmdd10.);
run;
filename flist clear;
data alldata;
set flist;
infile dummy lrecl=1000 truncover filevar=this_file end=done;
/* DONE set to 1 when last input record read */
do while(not done);
/* Read all input records from the currently */
/* opened input file, write to alldata */
input string $1000.;
output;
end;
run;
Hi,
Here is my try based on previous comments...
I tried to fully automate the process but unfortunately, SAS is not supporting Macro Variable creation in DIR Command of FILENAME Statement...otherwise it will be very easy to automate...
So as you said you are having 700 Folders, what you can do, you can simply copy and paste the path in following code where i mentioned the comments...It's not tedious task...
After that %FILES_IMPORT Macro will automatically append all the files in a folder and create the SAS dataset based on the folder name...
filename in_dir pipe
'dir "C:\Documents and Settings\2011-08-15_2011-08-15"/b'; /*Change the folder here*/
data all;
attrib buff format=$200.;
infile in_dir;
input buff $;
run;
proc sql noprint;
select buff into :all_files separated by " "
from all;
quit;
%put &all_files.;
%let path = C:\Documents and Settings\2011-08-15_2011-08-15; /*Change the folder here*/
%macro files_import;
%global dsn;
%let dsn = %qscan(&path.,-1,"_");
%let dsn = %sysfunc(compress(&dsn.,"-"));
%put &dsn.;
%let tot_files = %eval(%qsysfunc(countc(&all_files.," "))+1);
%put &tot_files.;
filename myfiles (%do i = 1 %to %eval(&tot_files.-1);
"&path.\%qscan(%str(&all_files.),&i.," ")",
%end;
"&path.\%qscan(%str(&all_files.),&tot_files.," ")");
data ALL_&dsn.;
infile myfiles;
input name $ age;
run;
%mend;
%files_import;
Hope it meets your requirement and will reduce the time to do the repititive task...
-Urvish
First create a data set with all required source files (full path and filename), then read one file after the other. In the example below variable "myinfile" contains the name of the file currently read. You can use this information to populate a permanent "output" variable containing the data source name.
If you want to create multiple SAS data sets (eg. 1 per folder), then you would have to pre-sort data set "flist" and in the second data step read the source and write the output to a hash table. You then use hash method "output". This allows you to dynamically create output data sets during data step execution time (using hash method "output" is the only way allowing this I'm aware of).
/* create a data set with the name and path of all files in all folders */
filename flist pipe 'dir /s/b/a-d c:\temp\*.txt';
data flist;
infile flist lrecl=500 truncover;
input this_file $500.;
/* if not possible to filter sufficiently with the DIR command then add some additional */
/* logic here to only select the desired files */
run;
filename flist clear;
/* now read the data from all files, one at a time */
/* this code based on: "Reading from Multiple Input Files" */
/* http://support.sas.com/documentation/cdl/en/lestmtsref/67175/HTML/default/viewer.htm#n1rill4udj0tfun... */
data alldata;
set flist;
infile dummy lrecl=1000 truncover filevar=this_file
filename=myinfile end=done;
/* DONE set to 1 when last input record read */
do while(not done);
/* Read all input records from the currently */
/* opened input file, write to alldata */
input string $1000.;
output;
end;
put 'Finished reading ' myinfile=;
run;
This SAS community wiki page
http://www.sascommunity.org/wiki/Making_Lists
has programs for making lists of both folders and files.
Ron Fehd macro maven
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.