SAS Enterprise Guide

alepage
Barite | Level 11

Hello,

I am trying to adapt some SAS code I found on the web that downloads many S3 bucket objects.

In that example, all the object names are saved in a text file and all the objects are in the same S3 bucket folder, which is not my case.

 

In his example, the text file s3filestodownload.txt contains only file names, such as:

 

List20.csv

List55.csv

List100.csv

 

Mine also contains the path:

 

Path1/MasterList_1.csv

Path2/MasterList_25.csv

and so on.

 

The difficulty is that I don't fully understand the options he uses on the INFILE statement.

Here's the script:

%macro test(filelist=);
filename filelist "&filelist";

data buildgets ;

/* Read the records containing the name of files to be retrieved from S3                */

infile filelist length=linelen end=eof;
input ObjectName $varying200. linelen;
;
run;
%mend test;
%test(filelist=/.../info/s3filestodownload.txt);

When I run his script, the dataset buildgets contains only one variable, ObjectName:

ObjectName

Path1/MasterList_1.csv

Path2/MasterList_25.csv

 

I would like to have 

Path       ObjectName

Path1      MasterList_1.csv

Path2      MasterList_25.csv

 

How can I adapt his script to obtain the two variables?

Please note that I am not familiar with most of the options he is using, so some explanation would be appreciated.

Accepted Solution
ballardw
Super User

If you are running anything that looks like

data buildgets ;

/* Read the records containing the name of files to be retrieved from S3                */

infile filelist length=linelen end=eof;
input ObjectName $varying200. linelen;
;
run;

that has only one variable (ObjectName) on the INPUT statement, then of course only one variable is populated.

If your data looks exactly like:

Path1/MasterList_1.csv

Path2/MasterList_25.csv

then you could use the INFILE option DLM='/' and list two variables on the INPUT statement. Specify a length long enough to hold the longest expected path.
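A minimal sketch of the DLM='/' approach, assuming every line holds exactly one folder name and one file name separated by a single /. DATALINES stands in for the real file (with the posted fileref you would write infile filelist dlm='/' truncover;), and the dataset name example_dlm and the lengths are illustrative choices.

```sas
/* Split each record at the / delimiter into two variables.   */
/* Assumes exactly one / per line, i.e. folder/file.           */
data example_dlm;
  infile datalines dlm='/' truncover;
  length Path $ 100 ObjectName $ 200;  /* size to the longest expected values */
  input Path ObjectName;               /* list input: one value per delimited field */
datalines;
Path1/MasterList_1.csv
Path2/MasterList_25.csv
;
run;
```

If a line contains more than one /, only the first two fields are read, which is why the approach below, based on the last /, is safer for real S3 keys.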

But I suspect you shortened the "path" for this post and that the real paths contain several / characters used as folder delimiters. If that is not the case, please say so.

If they do, you can use the automatic variable _INFILE_ and split the line at the last /.

Something like:

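data example;
  input;                                 /* read the whole line into _INFILE_ */
  length path object $ 200;
  /* negative start position: search right to left from the last character */
  position = find(_infile_, '/', -length(_infile_));
  if position > 0 then do;
    path   = substr(_infile_, 1, position - 1);  /* everything before the last / */
    object = substr(_infile_, position + 1);     /* everything after it */
  end;
  else object = _infile_;                /* no folder part on this line */
  drop position;
datalines;
some folder/subfoldername/something.txt
otherfolder/thisfolder/thatfolder/anotherfile.txt
;
run;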

You could use code like your current step to read the varying-length value, and then use that variable where I have used _INFILE_.
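A sketch of that combination, assuming the posted INFILE/INPUT pair is kept and the last-/ split is applied to the value it reads. FullName is a hypothetical name for the full key, and DATALINES stands in for the posted filelist fileref so the step can run on its own.

```sas
/* Read each full line with $VARYING, then split it at the last /.  */
/* Replace DATALINES with the filelist fileref for the real file.    */
data buildgets;
  infile datalines length=linelen;       /* linelen = length of the current record */
  length Path ObjectName FullName $ 200;
  input FullName $varying200. linelen;   /* read exactly linelen characters */
  position = find(FullName, '/', -length(FullName));  /* last / (right to left) */
  if position > 0 then do;
    Path       = substr(FullName, 1, position - 1);
    ObjectName = substr(FullName, position + 1);
  end;
  else ObjectName = FullName;            /* no folder part on this line */
  drop position;
datalines;
Path1/MasterList_1.csv
Path2/MasterList_25.csv
otherfolder/thisfolder/thatfolder/anotherfile.txt
;
run;
```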

 

When I see $VARYING together with the LENGTH= option, I would expect them to be accompanied by more code that parses the input line. The $VARYING informat is used when you don't know in advance how long a value will be: the LENGTH= option variable (which holds the length of the current record) or another variable supplies that length so INPUT knows how many characters to read. I see no reason the $VARYING informat was needed here. Most of the reasons for $VARYING disappeared with the advent of delimited data, but you might still need it for file formats in which one field states how long a subsequent field is.
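To illustrate why $VARYING isn't needed for a one-value-per-line list, here is a sketch that reads each whole line with a plain $200. informat and the TRUNCOVER option, which stops INPUT from flowing over to the next record when a line is shorter than 200 characters. It also shows what END=eof does: the variable is set to 1 while the last record is being read. The dataset name buildgets_tc is illustrative, and DATALINES again stands in for the filelist fileref.

```sas
/* Read each whole line without $VARYING.                       */
/* TRUNCOVER: a short line is read as-is, not padded from the   */
/* next record.                                                 */
data buildgets_tc;
  infile datalines truncover end=eof;
  input ObjectName $200.;
  /* eof is 1 only when the current record is the last one;     */
  /* the posted step defines END=eof but never uses it.          */
  if eof then put 'Last record: ' ObjectName=;
datalines;
Path1/MasterList_1.csv
Path2/MasterList_25.csv
;
run;
```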

Here is a totally contrived use of $varying

data example2;
  input fwidth 1. var1 $varying20. fwidth fwidth2 2. var2 $varying20. fwidth2;
datalines;
3abc14this is longer
1z20yetanotherlongerbit
010abcdefghij
;

Note the absence of any likely delimiter between values. The first character must be a single digit because INPUT reads FWIDTH with the 1. informat; that value tells SAS how many characters to read into VAR1 with the $VARYING informat (FWIDTH appears a second time, right after $varying20., as the length variable). The next two characters are read into FWIDTH2 with the 2. informat and give the number of characters to read into VAR2.

Note the 0 on the third line, which means a zero-length string is read, i.e. no value.

 

There is a similar $varying format that could be used to create such an admittedly obnoxious file construct.
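A sketch of writing that layout with the $VARYINGw. format, assuming the values are available in a dataset. The dataset name pieces and the fileref out are hypothetical. Here the lengths are computed with LENGTHN (which returns 0 for a blank value) instead of being stored, and the second length is written with Z2. so it always occupies two digits.

```sas
/* Hypothetical input values, read as comma-separated pairs. */
data pieces;
  infile datalines dsd truncover;
  length var1 var2 $ 20;
  input var1 var2;
datalines;
abc,this is longer
z,yetanotherlongerbit
,abcdefghij
;
run;

filename out temp;                   /* temporary file for the written records */

data _null_;
  set pieces;
  file out;
  fwidth  = lengthn(var1);           /* 0 when var1 is blank */
  fwidth2 = lengthn(var2);
  /* each length is written just before the value it describes */
  put fwidth 1. var1 $varying20. fwidth fwidth2 z2. var2 $varying20. fwidth2;
run;
```

Reading the temporary file back with the example2 INPUT statement should return the same three values.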

 

 

If the lines in your source file don't exceed about 32K characters (32,767), you could use the automatic variable _INFILE_, which holds the current record of the input file after an INPUT statement executes (with or without variables on the INPUT statement).

 
