alepage
Barite | Level 11

Hello, 

 

I am trying to adapt some SAS code that I found on the web to download many S3 bucket objects.

But in his example, all the object names are saved in a text file and the objects all sit in the same S3 bucket folder, which is not my case.

 

In his example, the text file s3filestodownload.txt contains only object names, such as:

 

List20.csv

List55.csv

List100.csv

 

while mine also contains the path:

 

Path1/MasterList_1.csv

Path2/MasterList_25.csv

and so on.

 

The difficulty is that I don't fully understand the options he is using with the infile statement.

Here's the script:

 

 

%macro test(filelist=);
filename filelist "&filelist";

data buildgets;
  /* Read the records containing the names of the files to be retrieved from S3 */
  infile filelist length=linelen end=eof;
  input ObjectName $varying200. linelen;
run;
%mend test;
%test(filelist=/.../info/s3filestodownload.txt);

When I run his script, the dataset buildgets contains only one variable, ObjectName:

 

 

ObjectName

Path1/MasterList_1.csv

Path2/MasterList_25.csv

 

I would like to have:

Path       ObjectName
Path1      MasterList_1.csv
Path2      MasterList_25.csv

 

How can I adapt his script to obtain the two variables?

Please note that I am not familiar with most of the options he is using, so some explanation would be appreciated.

 

Accepted Solutions
ballardw
Super User

If you are running anything that looks like

data buildgets;
  /* Read the records containing the names of the files to be retrieved from S3 */
  infile filelist length=linelen end=eof;
  input ObjectName $varying200. linelen;
run;

with only one variable name, such as ObjectName, on the Input statement, then of course only one variable is populated.

If your data looks exactly like:

Path1/MasterList_1.csv

Path2/MasterList_25.csv

then you could use the Infile option dlm='/' and list two variables on the Input statement. Specify a length long enough to hold the longest expected path.
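A minimal sketch of that delimiter approach, assuming every line contains exactly one slash. The lengths of 200 and the truncover option are illustrative choices, and datalines stands in for the real file so the step runs on its own; replace datalines on the Infile statement with your fileref to read the actual list.

```sas
data buildgets;
  /* dlm='/' makes the slash the only delimiter for list input;
     truncover leaves ObjectName blank if a line has no slash */
  infile datalines dlm='/' truncover;
  /* declare lengths before INPUT so values are not cut at 8 characters */
  length Path ObjectName $ 200;
  input Path ObjectName;
datalines;
Path1/MasterList_1.csv
Path2/MasterList_25.csv
;
```

With more than one slash per line this would put only the first folder in Path and the second path component in ObjectName, so it suits only the simple layout shown above.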

But I bet you have edited the "path" to be much shorter, and possibly consolidated multiple / characters used as folder delimiters. If that is not the case, then say so.

If the real paths do contain several folder levels, you can use the automatic variable _infile_ and parse each line for the last /.

Something like:

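data example;
  input;                                 /* load the line into _infile_ without reading any variables */
  length path object $ 50 ;
  /* a negative start position makes FIND search right to left, so this locates the last / */
  position= find(_infile_,'/',-100);
  path = substr(_infile_,1,position-1);  /* everything before the last / */
  object= substr(_infile_,position+1);   /* everything after the last / */
  drop position; 
datalines;
some folder/subfoldername/something.txt
otherfolder/thisfolder/thatfolder/anotherfile.txt
;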

You could use code such as your current step to read the varying-length value, and use that variable where I have used _infile_.
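As a rough sketch of that combination, assuming the fileref filelist from the original macro is already assigned and no line is longer than 200 characters: the step below reads each line into a hypothetical variable FullName and splits it at the last slash. FullName, the lengths of 200, and the guard for lines without a slash are illustrative additions, not part of the original script.

```sas
data buildgets;
  /* length= stores the length of the current record in linelen;
     end= sets eof to 1 on the last record (not used further here) */
  infile filelist length=linelen end=eof;
  length FullName Path ObjectName $ 200;
  /* $varying200. reads linelen characters, at most 200 */
  input FullName $varying200. linelen;
  /* negative start position: search right to left for the last / */
  position = find(FullName, '/', -200);
  if position > 1 then do;
    Path       = substr(FullName, 1, position - 1);
    ObjectName = substr(FullName, position + 1);
  end;
  /* no slash found, or only a leading one: keep the line as is */
  else ObjectName = FullName;
  drop FullName position;
run;
```

The start position passed to FIND should be at least as large as the longest line; otherwise the search begins partway through the value and can miss a later slash.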

 

I would expect the use of $Varying and the Length= option to be accompanied by more code to parse the input line. The $Varying informat is used when you don't know how long a value might be, and the Length= option variable, or another variable, tells SAS how many characters to read. I see no reason the $varying informat was needed here. Most of the reasons for $varying disappeared with the advent of delimited data, but you might still find one in file formats where one field indicates how long a subsequent field is.

Here is a totally contrived use of $varying

data example2;
  input fwidth 1. var1 $varying20. fwidth fwidth2 2. var2 $varying20. fwidth2;
datalines;
3abc14this is longer
1z20yetanotherlongerbit
010abcdefghij
;

Note the absence of any likely delimiter between values. The first character must be a single digit because fwidth is read with the 1. informat. Naming fwidth again immediately after $varying20. tells SAS how many characters to read into var1. A two-digit number, fwidth2, then specifies how many characters to read into var2.

Please do note the 0 on the third line, which means a zero-length string is read, i.e. no value.

 

There is a similar $varying format that could be used to create such an admittedly obnoxious file construct.
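For illustration only, a small sketch of how the $varying format might be used on a Put statement to write one record in that length-prefixed layout. The variables len1 and len2 and the hard-coded values are assumptions made for the example, and the record is written to the log rather than to a file.

```sas
data _null_;
  length var1 var2 $ 20;
  var1 = 'abc';
  var2 = 'this is longer';
  /* lengthn ignores trailing blanks and returns 0 for a blank value */
  len1 = lengthn(var1);
  len2 = lengthn(var2);
  /* each length prefix is followed by exactly that many characters */
  put len1 1. var1 $varying20. len1 len2 z2. var2 $varying20. len2;
run;
```

Adding a File statement before the Put would direct the record to an external file instead of the log.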

 

 

If the lines in your source files don't exceed about 32K characters, you could use the _infile_ automatic variable, which holds the current line of the source file once INPUT executes (with or without variables on the Input statement).

 


