Hello,
I have listed all the folders and all the files under the root (/folder1/...) using two different find commands. After reviewing the file listing, I found that some files do not have a file extension.
Is there a Unix command that I can use in SAS to get the file extension?
Please provide an example.
libname dest1 base "/.../Data_Retention/data";
filename folderls pipe "find /folder1/sasdata/ -type d ";
data dest1.folderslisting;
length text $2000.;
infile folderls;
input;
text=_infile_;
run;
filename filelist pipe "find /folder1/sasdata/ -type f ";
data dest1.fileslisting;
length text $1000.;
infile filelist;
input;
text=_infile_;
run;
Do it in SAS:
filename filelist pipe "find /folder1/sasdata/ -type f ";
data dest1.fileslisting;
length text $1000. extension $10;
infile filelist;
input;
text=_infile_;
if countw(scan(text,-1,"/"),".") > 1 then extension = scan(text,-1,".");
run;
On the UNIX side, you would need to apply the cut filter command.
Hello Kurt,
I have used a script almost exactly like yours, but I found that some files have very long file extensions that I am not familiar with. What is the maximum length of a file extension?
I am scanning files that have been created since 2010, so I am getting some very long file extensions, and some file names do not contain a dot. When I scan those for a file extension, I get the complete file name. Any idea how to solve that issue?
Thanks in advance for your help.
If you don't have a dot, then you don't have an extension by definition.
The max filename length (space reserved in the directory file) is 255 for UNIX/Linux. You can modify my code to have an ELSE branch where the whole string after the last slash is taken.
File extensions are just a convention and not a requirement, even on Windows. Certainly not since the days of the original DOS filename limit of 8 characters before the period and 3 after.
And on Unix they have even less meaning/impact.
So you can have filenames with no period in them, or filenames with multiple periods in them.
And on Unix if you start the filename with a period it is a "hidden" file. Hidden in the sense that you have to add an option to the ls command if you want them to appear in the result.
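To see the hidden-file behaviour for yourself, here is a small sketch you could run at a shell prompt. The scratch directory and the file names are made up for illustration; nothing here touches /folder1.

```shell
#!/bin/sh
# Scratch directory and file names are hypothetical, for illustration only.
demo=$(mktemp -d)
touch "$demo/.profile" "$demo/notes.txt" "$demo/archive.tar.gz" "$demo/README"

echo "plain ls:"
ls "$demo"        # .profile is not listed

echo "ls -A:"
ls -A "$demo"     # .profile is listed along with the other three

rm -rf "$demo"
```

Note that find, unlike ls, returns names starting with a period without any extra option, so such files will be present in a listing built from find.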
Here is the code I use in SAS to extract the extension from a filename.
if index(filename,'.')>1 then extension=scan(filename,-1,'.');
The -printf action of the find command lets you select what information you want returned and in what structure.
I couldn't test the script below, but I have used similar code in the past.
With the syntax below, the find command returns one pipe-delimited string per folder or file, terminated by a line feed, which in turn makes it easy to read into SAS.
filename ls pipe "find / -printf %nrstr('%h|%f|%y\n')";
/*filename ls pipe "find / -printf %nrstr('%h|%f|%y\n') 2>/dev/null";*/
data work.listing;
infile ls lrecl=1024 truncover dlm='|' dsd;
input path:$900. file_name:$200. file_type:$1.;
run;
filename ls clear;
/**
-printf: Possible values for directive %y:
f: Regular file
d: Directory
l: Symbolic link
p: Named pipe (FIFO)
c: Character special file
b: Block special file
s: Socket
**/
The commented-out filename statement with 2>/dev/null is one you could use to suppress the messages find writes when it has insufficient permissions to list a directory's content.
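Before wiring the command into a SAS pipe, it can help to run it directly at the shell and look at the raw lines. The sketch below does that against a throwaway directory instead of /. It assumes GNU find, since -printf is a GNU extension and is not available in every Unix find; list_tree is a hypothetical helper name.

```shell
#!/bin/sh
# list_tree is a hypothetical helper; -printf assumes GNU find.
list_tree() {
    # %h = directory part, %f = base name, %y = file type letter
    # 2>/dev/null discards "Permission denied" messages
    find "$1" -printf '%h|%f|%y\n' 2>/dev/null
}

# Throwaway directory so the sketch does not scan the real file system
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/sub/a.txt" "$demo/README"

list_tree "$demo" | sort

rm -rf "$demo"
```

Each output line has the three fields that the INPUT statement above reads as path, file_name and file_type.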
Hello @alepage
Your question is "Is there a Unix command that I can use in SAS to get the file extension?"
Well, there is. Assuming the file has an extension, use the following shell parameter expansion to get it.
Note that filename is the shell variable holding the name of the file.
${filename##*.}
The following example shows the usage
Thanks. Note that this method will have the same trouble as SCAN() does with filenames that do not contain a period. So you would want to first test that the name actually has a period (and perhaps that it does not start with a period) before using that method to extract the characters after the period.
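A minimal sketch of that guarded approach in plain POSIX shell. file_ext is a hypothetical helper name, and treating a leading period as "hidden file marker, not an extension separator" is an assumption about what you want, not a rule.

```shell
#!/bin/sh
# file_ext is a hypothetical helper: prints the extension, or an empty
# line when the name has none.
file_ext() {
    name=${1##*/}                  # keep only the part after the last slash
    case $name in
        .*) name=${name#.} ;;      # assumption: ignore a hidden file's leading period
    esac
    case $name in
        *.*) printf '%s\n' "${name##*.}" ;;   # text after the last period
        *)   printf '\n' ;;                   # no period, so no extension
    esac
}

file_ext /folder1/sasdata/report.sas7bdat   # prints sas7bdat
file_ext /folder1/sasdata/archive.tar.gz    # prints gz
file_ext /folder1/sasdata/README            # prints an empty line
file_ext /folder1/sasdata/.profile          # prints an empty line
```

Without the two case tests, ${name##*.} would return README for the third call and profile for the fourth, which is the same symptom described earlier in the thread.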
You already have a lot of interesting answers but, just for fun, one more.
You could use BasePlus package's %dirsAndFiles() macro.
It's OS independent and provides data in wide or long format, with or without details on files and directories.
EXAMPLE 1. Get list of files and directories:
%dirsAndFiles(C:\SAS_WORK\,ODS=work.result1)
%dirsAndFiles(~/,ODS=work.result2,details=1)
Bart
P.S. To install and use the BasePlus package, do:
filename SPFinit url "https://raw.githubusercontent.com/yabwon/SAS_PACKAGES/main/SPF/SPFinit.sas";
%include SPFinit; /* enable the framework */
filename packages "</your/directory/for/packages/>";
%installPackage(SPFinit BasePlus)
filename packages "</your/directory/for/packages/>";
%include packages(SPFinit.sas);
%loadPackage(basePlus)