BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BlayLay
Obsidian | Level 7

Hi Everyone,

 

I've seen some posts about reading in txt files and excel files from a folder based on a date, so this may seem like a repeat, however, I'm not familiar enough with it to adapt to my current requirements, so I'm hoping to get a solid example that's applicable.

 

I'm looking for a data step that can scan a specific folder and pull in the csv with the newest date in the file name.

 

I've adapted my file names to read TestFile_20201230 as I've read on other threads that YYYYMMDD is the best approach (I can change this if someone has a better suggestion).

 

Example file path could be 'stage/documents/testreports/TestFile_20201230.csv'

 

Any and all help would be greatly appreciated.

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Do it in steps. 

 

First figure out how to get the list of filenames. 

If your SAS session allows you to run operating system commands then use the PIPE engine to read the output of the ls command.  Note that on Unix a path that does not start with the root node (that is a slash) is considered relative the the current working directory so for this example I have added the leading slash to your path.

data files ;
  input "ls /stage/documents/testreports/TestFile_*.csv" pipe truncover;
  input filename $256. ;
run;

Otherwise look for the method of using the DOPEN() and DREAD() statements to get the list of files in a directory in the SAS documentation.

 

Then you can figure out how to parse out the DATE from the name. 

So if the names always look exactly like your example simple SCAN() and INPUT() functions should do that. 

data file_dates;
  set files;
  date = input(scan(filename,-2,'._'),yymmdd10.);
  format date yymmdd10.;
run;

Note that if the names always use the same prefix and suffix then you can skip extracting the DATE and just sort by the NAME. That is the advantage of using YMD order for the date strings.

 

And then how to find the latest date. 

proc sort data=file_dates;
  by descending date ;
run;

Once you have done that then you can use that value to drive reading the file.

data want ;
  if _n_=1 then set file_dates(obs=1);
   infile csv filevar=filename dsd truncover firstobs=2;
  input var1 var2 .... ;
run;

View solution in original post

5 REPLIES 5
PhilC
Rhodochrosite | Level 12

This is where you the programmer interacts with the operating system  So this is for a UNIX shell (IDK which).  I assume by your slash orientation that you are using UNIX, am I correct?

 

Taking from Reading filenames, date and time from a directory - SAS Support Communities . I don't  use UNIX so I cant verify that this works for my system.  

 

%let subdir =/stage/documents/testreports/;
filename dir pipe "ls -1 &subdir | grep *.csv";

data Pfile;
  length filename $ 300 size $ 20 time $ 20;
  infile dir truncover expandtabs ;
  input filename size time$;
run;

 

But basically you are reading a text file that is the result of a UNIX command's output.  If one wants the file date of certain files, one uses the UNIX command LS.  The GREP command helps filter all lines in the output that contain "csv", hopefully that selects only ".csv" files.  This could be a WHERE statement, but that could get messy in SAS.  GREP is our friend here.

 

Once you read the text file and assign the columns of text to variables in your data step, then you need to select your filename and find the maximum date.  Since your files names have dates you can add a date variable .  You need to check and change my guess of "@10" in the INPUT statement works to get your dates.

data Pfile;
  length filename $ 300 size $ 20 time $ 20;
  infile dir truncover expandtabs ;
  input filename 
        @10 date yymmdd8.;
run;
Tom
Super User Tom
Super User

Do it in steps. 

 

First figure out how to get the list of filenames. 

If your SAS session allows you to run operating system commands then use the PIPE engine to read the output of the ls command.  Note that on Unix a path that does not start with the root node (that is a slash) is considered relative the the current working directory so for this example I have added the leading slash to your path.

data files ;
  input "ls /stage/documents/testreports/TestFile_*.csv" pipe truncover;
  input filename $256. ;
run;

Otherwise look for the method of using the DOPEN() and DREAD() statements to get the list of files in a directory in the SAS documentation.

 

Then you can figure out how to parse out the DATE from the name. 

So if the names always look exactly like your example simple SCAN() and INPUT() functions should do that. 

data file_dates;
  set files;
  date = input(scan(filename,-2,'._'),yymmdd10.);
  format date yymmdd10.;
run;

Note that if the names always use the same prefix and suffix then you can skip extracting the DATE and just sort by the NAME. That is the advantage of using YMD order for the date strings.

 

And then how to find the latest date. 

proc sort data=file_dates;
  by descending date ;
run;

Once you have done that then you can use that value to drive reading the file.

data want ;
  if _n_=1 then set file_dates(obs=1);
   infile csv filevar=filename dsd truncover firstobs=2;
  input var1 var2 .... ;
run;
BlayLay
Obsidian | Level 7

@Tom 

 

I was able to build a list of filenames from the directory, using the below code

 

data file(keep=files);
     rc=filename("mydir","/stage/documents/testreports";
     did = dopen("mydir");
     if did > 0 then
        do i = 1 to dnum(did);
        files=dread(did,i);
        output;
      end;
    rc=dclose(did);
run;

and then followed your remaining steps to parse out and find the latest date. The only remaining question I have is how to only open the file IF the prefix is TestFile_

Tom
Super User Tom
Super User

Just test if the beginning of the name matches that pattern.  For example you could only write the observation if the filename matches.

if files =: 'TestFile_' then output;

Watch out for the case of the letters used in the filenames. SAS string comparisons are case sensitive. And on a UNIX filesystems TestFile, TESTFILE and testfile are three different filenames.

andreas_lds
Jade | Level 19

@BlayLay wrote:

@Tom 

 

I was able to build a list of filenames from the directory, using the below code

 

data file(keep=files);
     rc=filename("mydir","/stage/documents/testreports";
     did = dopen("mydir");
     if did > 0 then
        do i = 1 to dnum(did);
        files=dread(did,i);
        output;
      end;
    rc=dclose(did);
run;

and then followed your remaining steps to parse out and find the latest date. The only remaining question I have is how to only open the file IF the prefix is TestFile_


Check the filename before writing it to the dataset.

Maybe like this:

data file(keep=files);
  rc = filename("mydir", "/stage/documents/testreports";
  did = dopen("mydir");
  
  if did > 0 then do;
    do i = 1 to dnum(did);
      files = dread(did, i);

      if files =: 'TestFile_' then output;
    end;
  end;
  
  rc = dclose(did);
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1325 views
  • 2 likes
  • 4 in conversation