proc import substr

Reply
Valued Guide
Posts: 854

proc import substr

I am pulling one field from a monthly file into a much larger flat file.  This files name changed every month. Looking at the history there is inconsistency in the naming convention of the month.  I told the end user to knock it off but it raised a question in my head.  I wonder if there is way to only pull a substring of a folder.  I'll put an example below:

v:\folder1\folder2\2-Feb\textfile.txt

v:\folder1\folder2\3-march\textfile.txt

is there a way to pull only the first charachter as a substring for month, formatted as month.?

I have the date macro for month formatted as month. but I haven't been able to pull the string in the filename.

Thanks,

Valued Guide
Posts: 2,173

Re: proc import substr

I wonder if there is way to only pull a substring of a folder.

v:\folder1\folder2\2-Feb\textfile.txt

v:\folder1\folder2\3-march\textfile.txt

%let monthpart= %scan( &pathandfile, -2, \ ) ;

part -1 would be the filename

and if you want that number that comes before the month abbr, (and assuming the number would become 10, 11 and 12)

%let num_part = %scan(&monthpart, 1, %str(-\) ) ;

I use %str() to protect that - from being mistaken for part of a formula

Super User
Super User
Posts: 6,304

Re: proc import substr

Is just the third folder name that is giving you problems?

Are you trying to convert the inconsistent date descriptions into dates?

Or are you trying to find the files based on a specific date request?

How are you getting the file names?  Are they passed to you as a macro variable? As data in a text file or existing dataset?

If you have the name already in a string then SCAN() is a good tool.  You can even truncate the month part at three characters using SUBSTR().

Let's play with your examples in a simple data step.

data dirs ;

  input filename $80.;

folder3=scan(filename,4,'\');

last_folder=scan(filename,-2,'\');

date = input(catx('-',scan(last_folder,1,'-')

                      ,substrn(scan(last_folder,2,'-'),1,3)

                      ,year(today())),date11.)

;

  format date yymmdd10.;

  put (_n_ _all_) (=/);

cards;

v:\folder1\folder2\2-Feb\textfile.txt

v:\folder1\folder2\3-march\textfile.txt

run;

_N_=1

filename=v:\folder1\folder2\2-Feb\textfile.txt

folder3=2-Feb

last_folder=2-Feb

date=2014-02-02

_N_=2

filename=v:\folder1\folder2\3-march\textfile.txt

folder3=3-march

last_folder=3-march

date=2014-03-03

NOTE: The data set WORK.DIRS has 2 observations and 4 variables.

Ask a Question
Discussion stats
  • 2 replies
  • 205 views
  • 0 likes
  • 3 in conversation