BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rjnc13
Calcite | Level 5

I currently have 700 folders that are all sequentially named.

The naming convention of the folders are as follows:-

2011-08-15_2011-08-15

2011-08-16_2011-08-16

2011-08-17_2011-08-17

"

"

2013-09-20_2013-09-20

There are 10 txt files within each folder that have the same naming convention.

With the txt files all being the same, what i am trying to achieve is to automate the infile and then use the name of the folder eg 2011-08-15_2011-08-15 or part of eg. 2011-08-15 to then be the name of the created data set.

I can successfully import all the txt files so there is no issue there, the issue is i dont want to be changing the folder name each time in the infile step.

'C:\SAS data\Extract\2011-08-17_2011-08-17\abc.txt'

is there an easier way to read these files in? I can find code for sequential txt/csv files but can find nothing to reference a folder and then rename the data set.

Any assistance would be greatly appreciated!!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

All the info you need is contained in variable "this_file". You only need to extract it. See below code example.

data alldata;

  set flist;

  infile dummy lrecl=1000 truncover filevar=this_file

    filename=myinfile end=done;

  attrib

    Folder_Name length=$40

    Date_From Date_To format=yymmdd10.

    Source length=$40

    ;

  Folder_Name=scan(this_file,-2,'/\');

  Source=scan(this_file,-1,'/\');

  Date_From=input(scan(Folder_Name,1,'_'),yymmdd10.);

  Date_To=input(scan(Folder_Name,2,'_'),yymmdd10.);

  /* DONE set to 1 when last input record read  */

  do while(not done);

    /* Read all input records from the currently  */

    /* opened input file, write to alldata       */

    input string $1000.;

    output;

  end;

run;

View solution in original post

15 REPLIES 15
UrvishShah
Fluorite | Level 6

Can you please let us know the naming conventions of .TXT files within each of the folders...And if you are saying that you want to create the SAS Dataset with the name of folder name so my question is like:

If i have 10 .txt files in folder 2011-08-15_2011-08-15 and if i created one SAS Dataset with name of 2011-08-15 so it is like this dataset should include all 10 .txt files or you want separate dataset with each .txt files...

You can easily achive this task once you fix the naming conventions of .txt files and you want separate dataset for folder wise or file wise...

-Urvish

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Something like:

filename in_dir pipe 'dir "S:\Temp\Rob\Test" /s /b';

data all;
  attrib buff format=$200.;
  infile in_dir;
  input buff $;
  if index(upcase(buff),".TXT")=0 then delete;
run;

proc sql;
  create table bse (var1 char(20),var2 char(20));
quit;

data _null_;
  set all;
  call execute('proc import datafile="'||strip(buff)||'" out=tmp dbms=dlm;
                  getnames=no;
                  delimiter=",";
                run;
                proc append base=bse data=tmp;
                run;');
run;

rjnc13
Calcite | Level 5

Hi Urvish,

The folder will be named

2011-08-15_2011-08-15

The txt file names are as follows:-

Order.txt

plugin.txt

product.txt

registration.txt

details.txt

cart.txt

technical.txt

page.txt

click.txt

So what im trying to achieve is 10 datasets per folder, they will be named with the folder heading in the name:-

2011-08-15_Order

2011-08-15_Plugin

2011-08-15_Product

etc.

So with 700 different folders im hoping there is a way to append it and automate the import process of each directory and the files within.


Hope this makes sense.


Kind Regards,

Ryan

Patrick
Opal | Level 21

1. Do you now want 1 big data set or 10 or 700?

2. Is the structure of these 10 text files per folder the same or do we need a separate input statement per text file?

About 1) I would go for as few data sets as possible, eventually adding 2 new variables: one for the name of the data source, one with a date derived from the folder name.

This can all be done quite easily with a variation to the code I've already posted earlier.

rjnc13
Calcite | Level 5

Hi Patrick,

Thanks for your speedy response.

To answer your latest question i was going to create the latter 700, as i wasnt sure how to create the loop to obtain the 10.

I have tried to implement your code and it came back with an error, i have attached it below.

NOTE: The infile FLIST is:

      Unnamed Pipe Access Device,

      PROCESS=dir /s/b/a-d C:\Desktop\SAS data\Extract\*.txt,

      RECFM=V,LRECL=500

Is there something that i am doing wrong, it is SAS for the PC. So im not sure if that is an issue or not.

I appreciate your help.

Patrick
Opal | Level 21

I believe the issue here is that you have a blank in the pathname "\SAS data\" which needs extra quoting. Below should do:

filename flist pipe 'dir /s/b/a-d ''C:\Desktop\SAS data\Extract\*.txt''';

rjnc13
Calcite | Level 5

Thanks Patrick that did the trick. I have used the code and it works! It has made just one data set which is great.

The only issue i have now, which is from my original request, is i need to somehow use the date in the folder name to create a variable in the file. Each folder has the date range in the naming convention, and without a date within the txt file it is the only way to identify to timestamp the data.

Is there a way that i can append the date from the folder to the file so i can identify data by date.

eg. folder name

2011-08-15_2011-08-15

file

order.txt

datalines

order           revenue              date

123456           75.00              2011-08-15


Thanks again


Ryan

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Well, not sure wildcards can do that.  This is why I tend to go for the approach of create the list of items to process then generate the necessary code as below (updated to data step read as proc import seems to be out of favour).  Suppose it depends on how complicated/much needed these things are.  For instance if you need to create reports on the files, i.e. how many obs, titles, getting SVN trails etc.  Just seems simpler to create the list first.

--

filename in_dir pipe 'dir "S:\Temp\Rob\Test" /s /b';

data all;
  attrib buff format=$200.;
  infile in_dir;
  input buff $;
  if index(upcase(buff),".TXT")=0 then delete;
  date_var=scan(buff,5,"\");
run;

proc sql;
  create table bse (var1 char(1000),var2 char(1000),var3 char(10));
quit;

data _null_;
  set all;
  call execute('data tmp;
                  attrib var1 var2 format=$1000. var3 format=$10.;
                  infile "'||strip(buff)||'" lrecl=1000 truncover dlm=",";
                  input var1 $ var2 $;
                  var3="'||strip(date_var)||'";
                run;
                proc append base=bse data=tmp;
                run;');
run;

Patrick
Opal | Level 21

All the info you need is contained in variable "this_file". You only need to extract it. See below code example.

data alldata;

  set flist;

  infile dummy lrecl=1000 truncover filevar=this_file

    filename=myinfile end=done;

  attrib

    Folder_Name length=$40

    Date_From Date_To format=yymmdd10.

    Source length=$40

    ;

  Folder_Name=scan(this_file,-2,'/\');

  Source=scan(this_file,-1,'/\');

  Date_From=input(scan(Folder_Name,1,'_'),yymmdd10.);

  Date_To=input(scan(Folder_Name,2,'_'),yymmdd10.);

  /* DONE set to 1 when last input record read  */

  do while(not done);

    /* Read all input records from the currently  */

    /* opened input file, write to alldata       */

    input string $1000.;

    output;

  end;

run;

rjnc13
Calcite | Level 5

Thanks Patrick, really appreciate all your help.

It worked a treat!

Cheers,

Ryan

Patrick
Opal | Level 21

As I am someone who likes to add a ReadMe.txt to top folders or eventually creates a backup copy of a folder during development, below a bit a more robust code variant (adding some checks):

/* create a data set with the name and path of all files in all folders */
%let date_RegEx=(19|20)\d{2}-(0[1-9]|1[012])-([012]\d|3[01]);

filename flist pipe 'dir /s/b/a-d "c:\temp\*.txt"';

data flist;
  infile flist lrecl=500 truncover;
  input this_file $500.;
  attrib
    Folder_Name length=$40
    Date_From Date_To format=yymmdd10.
    Source length=$40
  ;

  /* remove all rows with folder names not complying with naming convention */
  Folder_Name=scan(this_file,-2,'/\');
  if prxmatch("/^&date_RegEx._&date_RegEx$/oi",strip(Folder_Name))<1 then delete;

  Source=scan(this_file,-1,'/\');
  Date_From=input(scan(Folder_Name,1,'_'),yymmdd10.);
  Date_To=input(scan(Folder_Name,2,'_ '),yymmdd10.);
run;

filename flist clear;

data alldata;
  set flist;
  infile dummy lrecl=1000 truncover filevar=this_file end=done;
  /* DONE set to 1 when last input record read  */
  do while(not done);
    /* Read all input records from the currently  */
    /* opened input file, write to alldata       */
    input string $1000.;
    output;
  end;
run;

UrvishShah
Fluorite | Level 6

Hi,

Here is my try based on previous comments...

I tried to fully automate the process but unfortunately, SAS is not supporting Macro Variable creation in DIR Command of FILENAME Statement...otherwise it will be very easy to automate...

So as you said you are having 700 Folders, what you can do, you can simply copy and paste the path in following code where i mentioned the comments...It's not tedious task...

After that %FILES_IMPORT Macro will automatically append all the files in a folder and create the SAS dataset based on the folder name...

filename in_dir pipe

       'dir "C:\Documents and Settings\2011-08-15_2011-08-15"/b'; /*Change the folder here*/

data all;

  attrib buff format=$200.;

  infile in_dir;

  input buff $;

run;

proc sql noprint;

   select buff into :all_files separated by " "

   from all;

quit;

%put &all_files.;

%let path = C:\Documents and Settings\2011-08-15_2011-08-15; /*Change the folder here*/

%macro files_import;

   %global dsn;

   %let dsn = %qscan(&path.,-1,"_");

   %let dsn = %sysfunc(compress(&dsn.,"-"));

   %put &dsn.;

   %let tot_files = %eval(%qsysfunc(countc(&all_files.," "))+1);

   %put &tot_files.;

   filename myfiles (%do i = 1 %to %eval(&tot_files.-1);

                        "&path.\%qscan(%str(&all_files.),&i.," ")",

                    %end;

                        "&path.\%qscan(%str(&all_files.),&tot_files.," ")");

   data ALL_&dsn.;

      infile myfiles;

      input name $ age;

   run;

%mend;

%files_import;

Hope it meets your requirement and will reduce the time to do the repititive task...

-Urvish

Patrick
Opal | Level 21

First create a data set with all required source files (full path and filename), then read one file after the other. In the example below variable "myinfile" contains the name of the file currently read. You can use this information to populate a permanent "output" variable containing the data source name.

If you want to create multiple SAS data sets (eg. 1 per folder), then you would have to pre-sort data set "flist" and in the second data step read the source and write the output to a hash table. You then use hash method "output". This allows you to dynamically create output data sets during data step execution time (using hash method "output" is the only way allowing this I'm aware of).

/* create a data set with the name and path of all files in all folders */
filename flist pipe 'dir /s/b/a-d c:\temp\*.txt';
data flist;
  infile flist lrecl=500 truncover;
  input this_file $500.;
  /* if not possible to filter sufficiently with the DIR command then add some additional */
  /* logic here to only select the desired files                                        */
run;
filename flist clear;

/* now read the data from all files, one at a time                                                       */
/* this code based on: "Reading from Multiple Input Files"                           */
/* http://support.sas.com/documentation/cdl/en/lestmtsref/67175/HTML/default/viewer.htm#n1rill4udj0tfun... */
data alldata;
  set flist;
  infile dummy lrecl=1000 truncover filevar=this_file
    filename=myinfile end=done;

  /* DONE set to 1 when last input record read  */
  do while(not done);
    /* Read all input records from the currently  */
    /* opened input file, write to alldata       */
    input string $1000.;
    output;
  end;

  put 'Finished reading ' myinfile=;
run;

Ron_MacroMaven
Lapis Lazuli | Level 10

This SAS community wiki page

http://www.sascommunity.org/wiki/Making_Lists

has programs for making lists of both folders and files.

Ron Fehd  macro maven

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 9755 views
  • 7 likes
  • 5 in conversation