Hello,
I am trying to adapt a SAS program that I found on the web to download many S3 bucket objects.
In his example, all the object names are saved in a text file and all the objects are in the same S3 bucket folder, which is not my case.
In his example, the text file s3filestodownload.txt contains only object names, such as:
List20.csv
List55.csv
List100.csv
while mine also contains the path:
Path1/MasterList_1.csv
Path2/MasterList_25.csv
and so on.
The difficulty is that I don't fully understand the options he is using with the INFILE statement.
Here's the script:
%macro test(filelist=);
  filename filelist "&filelist";
  data buildgets;
    /* Read the records containing the name of files to be retrieved from S3 */
    infile filelist length=linelen end=eof;
    input ObjectName $varying200. linelen;
  run;
%mend test;
%test(filelist=/.../info/s3filestodownload.txt);
When I execute his script, the dataset buildgets contains only one variable, ObjectName:
ObjectName
Path1/MasterList_1.csv
Path2/MasterList_25.csv
I would like to have
Path ObjectName
Path1 MasterList_1.csv
Path2 MasterList_25.csv
How can I adapt his script to obtain the two variables?
Please note that I am not familiar with most of the options he is using, so some explanation would be appreciated.
If you are running anything that looks like
data buildgets;
  /* Read the records containing the name of files to be retrieved from S3 */
  infile filelist length=linelen end=eof;
  input ObjectName $varying200. linelen;
run;
with only one variable name, such as ObjectName, on the Input statement, then of course only one variable is populated.
If your data looks exactly like:
Path1/MasterList_1.csv
Path2/MasterList_25.csv
then you could use the Infile option dlm='/' and then have two variables on the Input statement. Specify a length long enough to hold the longest expected path.
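A minimal sketch of that dlm='/' approach, assuming every record contains exactly one / (the $200 lengths and the TRUNCOVER option are my choices for illustration, not something from the original script, and DATALINES stands in for the real text file):

```sas
data buildgets;
  /* With dlm='/' the slash is the only delimiter, so blanks inside a name are kept */
  infile datalines dlm='/' truncover;
  /* Assumed lengths: make them long enough for the longest expected path and name */
  length Path ObjectName $200;
  input Path $ ObjectName $;
datalines;
Path1/MasterList_1.csv
Path2/MasterList_25.csv
;
run;
```

If a record had more than one /, ObjectName would pick up only the second segment, so this only fits the simple one-folder layout.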
But I bet you have shortened the "path" for this post, and that the real paths may contain several / characters as folder delimiters.
If that is not the case, then say so.
If it is, you can use the automatic variable _infile_ and parse each line for the last /.
Something like:
data example;
  input;
  length path object $ 50;
  position = find(_infile_,'/',-100);
  path = substr(_infile_,1,position-1);
  object = substr(_infile_,position+1);
  drop position;
datalines;
some folder/subfoldername/something.txt
otherfolder/thisfolder/thatfolder/anotherfile.txt
;
You could use code such as your current Input statement to read the varying-length value, and then use that variable where I have used _infile_.
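As a rough sketch of that combination, assuming the list file looks like the two sample lines in the question: the temporary fileref below is a hypothetical stand-in for s3filestodownload.txt, and FullName is a name introduced here only for illustration.

```sas
/* Hypothetical stand-in for s3filestodownload.txt so the sketch runs on its own */
filename filelist temp;
data _null_;
  file filelist;
  put 'Path1/MasterList_1.csv';
  put 'Path2/MasterList_25.csv';
run;

data buildgets;
  infile filelist length=linelen end=eof;
  length Path ObjectName $200;
  /* linelen holds the length of the current record; $varying200. reads that many characters */
  input FullName $varying200. linelen;
  /* A negative start position makes FIND search from right to left, so this is the last / */
  position = find(FullName, '/', -200);
  if position > 0 then do;
    if position > 1 then Path = substr(FullName, 1, position - 1);
    ObjectName = substr(FullName, position + 1);
  end;
  else ObjectName = FullName;  /* no / on the line: treat the whole record as the object name */
  drop position FullName;
run;
```

For the sample lines this should give Path=Path1 with ObjectName=MasterList_1.csv, and Path=Path2 with ObjectName=MasterList_25.csv. Inside the original macro, only the FILENAME statement would change back to "&filelist".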
I would expect the use of $Varying and the Length= option to be accompanied by more code to parse the input line. The $Varying informat is used when you don't know how long a value might be, and the Length= option variable, or another variable, provides the additional information needed to input the value. I see no reason the $varying informat was needed here. Most of the reasons behind $varying disappeared with the advent of delimited data, but you might find a reason in some file formats where one variable indicates how long a subsequent field will be.
Here is a totally contrived use of $varying
data example2;
  input fwidth 1. var1 $varying20. fwidth fwidth2 2. var2 $varying20. fwidth2;
datalines;
3abc14this is longer
1z20yetanotherlongerbit
010abcdefghij
;
Note the absence of any likely delimiter between values. Because fwidth is read with the 1. informat, the first character must be a single digit; it tells SAS how many characters to read with the $varying informat (the variable named immediately after $varying20. on the Input statement supplies that count). A two-digit number then specifies how many characters to read for the second variable.
Please do note the 0 at the start of the third line, which means a zero-length string is read for var1, i.e. no value.
There is a similar $varying format that could be used to create such an admittedly obnoxious file construct.
If the lines in your source file don't exceed about 32K characters, you could use the automatic variable _infile_, which holds the current line of the source file once INPUT executes (with or without variables on the Input statement).