Hello,
I have made a listing of all the folders and all the files under the root (/folder1/...) using two different find commands. After reviewing the file listing, I found that some files do not have a file extension.
Is there a Unix command that I can use in SAS to get the file extension?
Please provide an example.
libname dest1 base "/.../Data_Retention/data";
filename folderls pipe "find /folder1/sasdata/ -type d ";
data dest1.folderslisting;
length text $2000.;
infile folderls;
input;
text=_infile_;
run;
filename filelist pipe "find /folder1/sasdata/ -type f ";
data dest1.fileslisting;
length text $1000.;
infile filelist;
input;
text=_infile_;
run;
Do it in SAS:
filename filelist pipe "find /folder1/sasdata/ -type f ";
data dest1.fileslisting;
length text $1000. extension $10;
infile filelist;
input;
text=_infile_;
if countw(scan(text,-1,"/"),".") > 1 then extension = scan(text,-1,".");
run;
On the UNIX side, you would need to apply the cut filter command.
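For anyone who wants to try the UNIX-side route, here is one possible sketch (not necessarily what was meant above). cut cannot select the last field directly, so this assumes the rev utility is available to reverse each line first. ext_with_cut is a made-up helper name and the paths are sample values.

```shell
# ext_with_cut is a hypothetical helper name, used only for illustration.
ext_with_cut() {
    # basename drops the directory part, so dots in folder names are ignored.
    # rev reverses the name, which puts the extension in the first "." field.
    # cut -s prints nothing at all when the name contains no "." delimiter.
    basename "$1" | rev | cut -s -d. -f1 | rev
}

ext_with_cut /folder1/sasdata/report.csv   # prints: csv
ext_with_cut /folder1/sasdata/README       # prints nothing
```

Note that a hidden file such as .bashrc would still come back as "bashrc" with this approach, so it needs the same care as the SAS version.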
Hello Kurt,
I have used a script very similar to yours, but I found that some files have a very long file extension that I am not familiar with. What is the maximum length of a file extension?
I am scanning files that have been created since 2010, so I am getting some very long file extensions, and some file names do not contain a dot at all. When I scan for the file extension in those cases, I get the complete file name. Any idea how to solve that issue?
Thanks in advance for your help.
If you don't have a dot, then you don't have an extension by definition.
The max filename length (space reserved in the directory file) is 255 for UNIX/Linux. You can modify my code to have an ELSE branch where the whole string after the last slash is taken.
File extensions are just a convention and not a requirement, even on Windows, and certainly not since the original DOS filename limit of 8 characters before the period and 3 after.
And on Unix they have even less meaning/impact.
So you can have filenames with no period in them. Or filenames with multiple periods in them.
And on Unix, if you start the filename with a period it is a "hidden" file. Hidden in the sense that you have to add an option (such as -a) to the ls command if you want it to appear in the result.
Here is the code I use in SAS to extract the extension from a filename.
if index(filename,'.')>1 then extension=scan(filename,-1,'.');
The printf action of the find command lets you select what information in what structure you want returned.
I couldn't test the script below, but I have used similar code in the past.
With the syntax below, the find command returns one pipe-delimited string per folder or file, terminated by a line feed, which makes it easy to read into SAS.
filename ls pipe "find / -printf %nrstr('%h|%f|%y\n')";
/*filename ls pipe "find / -printf %nrstr('%h|%f|%y\n') 2>/dev/null";*/
data work.listing;
infile ls lrecl=1024 truncover dlm='|' dsd;
input path:$900. file_name:$200. file_type:$1.;
run;
filename ls clear;
/**
-printf: Possible values for directive %y:
f: Regular file
d: Directory
l: Symbolic link
p: Named pipe (FIFO)
c: Character special file
b: Block special file
s: Socket
**/
The commented-out FILENAME statement with 2>/dev/null is something you could use to discard the error messages find writes when it lacks permission to list a directory's content.
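To see what the find command returns before wiring it into a FILENAME PIPE, the same call can be tried directly in a shell. This is a minimal sketch, assuming GNU find (-printf is a GNU extension) and using a throwaway directory from mktemp instead of /. The names demo_dir and listing and the sample files are hypothetical.

```shell
# Hypothetical scratch directory so the example does not scan the real root.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/sub/report.sas7bdat" "$demo_dir/README"

# %h = directory part, %f = base name, %y = type letter (f, d, l, ...).
# 2>/dev/null discards messages such as "Permission denied" sent to stderr.
listing=$(find "$demo_dir" -printf '%h|%f|%y\n' 2>/dev/null | sort)
printf '%s\n' "$listing"

# Remove the scratch directory again.
rm -rf "$demo_dir"
```

Each output line has the same three pipe-delimited fields that the INPUT statement above reads as path, file_name and file_type.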
Hello @alepage
Your question is "Is there a Unix command that I can use in SAS to get the file extension."
Well, there is. Assuming the file has an extension, use the following shell parameter expansion to get it.
Note that filename is the shell variable holding the name of the file.
${filename##*.}
The following example shows the usage
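A minimal sketch of how that expansion behaves, not the original poster's example. The sample file names are made up for illustration.

```shell
# Hypothetical sample names, chosen only for illustration.
for filename in report.csv archive.tar.gz README .bashrc; do
    # ${filename##*.} removes the longest leading match of "*.",
    # which leaves whatever follows the last period.
    printf '%s -> %s\n' "$filename" "${filename##*.}"
done
```

This prints csv, gz, README and bashrc in turn: when the name has no period the pattern does not match and the whole name is returned unchanged.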
Thanks. Note that this method will have the same trouble as SCAN() does with filenames that do not contain a period. So you would want to first test that the name actually has a period (and perhaps that it does not start with a period) before using that method to extract the characters after the period.
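One way to add that test on the shell side is sketched below. file_ext is a hypothetical helper name, not a standard command, and treating a single leading period as "hidden file, not an extension" is an assumption you may want to change.

```shell
# file_ext is a hypothetical helper, not a standard command.
file_ext() {
    name=${1##*/}     # keep only the part after the last slash
    name=${name#.}    # ignore one leading period (hidden file marker)
    case $name in
        *.*) printf '%s\n' "${name##*.}" ;;  # has a period: text after the last one
        *)   printf '\n' ;;                  # no period: no extension
    esac
}

file_ext /folder1/sasdata/report.csv   # prints: csv
file_ext /folder1/sasdata/README       # prints an empty line
file_ext /folder1/sasdata/.bashrc      # prints an empty line
```

Stripping the directory part first also avoids picking up a period that belongs to a folder name rather than to the file.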
You already have a lot of interesting answers but, just for fun, one more.
You could use BasePlus package's %dirsAndFiles() macro.
It's OS independent and provides data in wide or long format, with or without details on files and directories.
EXAMPLE 1. Get list of files and directories:
%dirsAndFiles(C:\SAS_WORK\,ODS=work.result1)
%dirsAndFiles(~/,ODS=work.result2,details=1)
Bart
P.S. To install and use basePlus package do:
filename SPFinit url "https://raw.githubusercontent.com/yabwon/SAS_PACKAGES/main/SPF/SPFinit.sas";
%include SPFinit; /* enable the framework */
filename packages "</your/directory/for/packages/>";
%installPackage(SPFinit BasePlus)
filename packages "</your/directory/for/packages/>";
%include packages(SPFinit.sas);
%loadPackage(basePlus)