04-24-2017 08:24 AM
I want a dataset with the two variables below, built from a directory (Unix) location that contains multiple text files. Is there any way to do it?
Name of File, Number of records
04-24-2017 08:35 AM
I suspect you want a FILENAME PIPE command. I have only done this from the Windows command prompt, but the same approach may work on Linux too (with ls in place of DIR).
Filename filelist pipe " DIR S:\MyFiles /S /B /A:-D ";  /* /A:-D lists files only; /A:D would list only directories */
Data ALLFOLDERS;
  Infile filelist truncover;
  Input filename $100.;
Run;
Then filter out any files that are not .txt.
Edit: revised to eliminate unnecessary code.
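The filtering step could look like the sketch below. It assumes the listing was read into ALLFOLDERS with the variable filename, as in the code above; the output dataset name TXTFILES is made up for illustration.

```sas
/* Keep only entries whose extension is .txt (case-insensitive). */
/* TXTFILES is a hypothetical name, not part of the original code. */
data txtfiles;
  set allfolders;
  if upcase(scan(filename, -1, '.')) = 'TXT';
run;
```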
04-24-2017 08:47 AM
Not that I know of in SAS, no. If it were me, I would consider creating a text file via another language (e.g. Python) and importing that text file into SAS. Sorry man!
04-24-2017 08:53 AM
If you can't access the OS from SAS, then how are you going to get the directory listing? If you have that, then you could do a basic read of each file and keep a count as it reads to get the output, but you need to feed it the list of files.
I would however also question why you have loads of files but don't know what they contain?
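For the "feed it the list of files" idea, here is a minimal sketch using the FILEVAR= option of the INFILE statement. It assumes a dataset, here called FILE_LIST, with one full path per row in a variable PATH; both names are hypothetical. It has not been tested against empty files, so check those separately.

```sas
/* Assumed input: dataset FILE_LIST with one full file path per row in PATH. */
data want (keep=path nrec);
  set file_list;
  length fpath $250;
  fpath = path;      /* separate copy, because the FILEVAR= variable is not written to the output */
  nrec = 0;
  /* FILEVAR= opens the file named in FPATH; END= flags its last record. */
  infile dummy filevar=fpath end=eof truncover;
  do while (not eof);
    input;           /* read one record without parsing it */
    nrec + 1;
  end;
  output;
run;
```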
04-24-2017 09:24 AM - edited 04-24-2017 09:33 AM
%let path=path_to_your_files;
filename oscmd pipe "cd &path.;wc -l *.txt";
data want;
  length filename $30 lines 8;
  infile oscmd;
  input lines filename;  /* wc -l prints the line count first, then the file name */
  if filename ne 'total';
run;
Of course, this needs XCMD enabled, which is one of the many reasons I consider disabling XCMD stupid.
Without XCMD, you can use a wildcard in the infile statement of a data step:
data want (keep=filename lines);
  length filnam filename $200;
  infile "&path./*.txt" filename=filnam end=done;
  retain filename;
  input;
  if filnam ne filename then do;
    if filename ne " " then output;
    lines = 0;
    filename = filnam;
  end;
  lines + 1;
  if done then output;
run;
See the first of these two examples as application of Maxims 14 and 15.
04-24-2017 09:45 AM - edited 04-24-2017 10:00 AM
my solution isn't that elegant
filename location "C:\TEMP";
data files;
  length name $250 nbRec 8;
  drop rc did i;
  did=dopen("location");
  if did > 0 then do;
    do i=1 to dnum(did);
      name=pathname('location')||'\'||dread(did,i);
      if scan(name,-1,'.') eq 'txt' then output;
    end;
    rc=dclose(did);
  end;
  else put 'Could not open directory';
run;
data _NULL_;
  set files;
  call execute(' data _null_; infile "'||strip(name)||'" end=eof; input; '||
               'if eof then call execute(" proc sql; update files set nbRec="||put(_N_,best32.)||'||
               '" where name eq ""'||strip(name)||'""; quit; "); run; ');
run;
- Cheers -
04-24-2017 10:33 AM
If you are running on Unix then use the wc command.
But you can use just a simple data step if you want.
Play with this code. Figuring out when SAS sets the EOV flag can be tricky so make sure to test it with some one record files and make sure it works.
data want;
  length fname $255 filename $100;
  infile '*.txt' filename=fname end=eof eov=eov;
  input ;
  n+eof;
  if eov or eof then do;
    filename=scan(fname,-1,'/');
    output;
    n=1;
    eov=0;
  end;
  else n+1;
run;
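To try this against one-record files, small test files can be generated from SAS itself. The sketch below is an assumption-laden illustration: the file names test1.txt to test3.txt are made up, and it assumes the current working directory is writable and is the same one the '*.txt' wildcard resolves against.

```sas
/* Write three small test files; test1.txt is the one-record case. */
data _null_;
  length fpath $100;
  do k = 1 to 3;
    fpath = cats('test', k, '.txt');   /* hypothetical file names */
    file dummy filevar=fpath;          /* FILEVAR= switches the output file */
    do j = 1 to k;                     /* test<k>.txt gets k records */
      put 'record ' j;
    end;
  end;
run;
```

The expected counts are then known in advance (1, 2 and 3), which makes it easy to see whether the EOV handling is off by one.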
04-24-2017 11:51 AM
There are a slew of SAS functions to interact with external files, starting with DOPEN to open an identified directory, then DINFO, DOPTNUM and DOPTNAME to return information about a directory, and then the file functions. If the operating system will return the number of records, then one of the results associated with FOPTNAME and FOPTNUM will have it.
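To see which information items a given host actually returns for a file, something like the following sketch can be used. The fileref ONEFILE and its path are placeholders; whether a record count appears in the list depends on the operating system.

```sas
filename onefile "path_to_one_file.txt";  /* placeholder path */

data _null_;
  length optname $100 optval $256;
  fid = fopen('onefile');
  if fid > 0 then do;
    do i = 1 to foptnum(fid);           /* number of information items available */
      optname = foptname(fid, i);       /* name of the i-th item */
      optval  = finfo(fid, optname);    /* its value */
      put optname= optval=;
    end;
    rc = fclose(fid);
  end;
  else put 'Could not open file';
run;
```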
I don't work on unix and would not presume to attempt to guess with different flavors of unix which option names you would need to access. The online help starting with DOPEN should lead you to a solution. Note that the examples will tend to show a macro and a data step approach. I recommend staying with the data step unless you are very comfortable with macro language.
Don't forget to FCLOSE and DCLOSE each file or directory opened.
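A minimal data step sketch of that approach, counting records by reading each directory member and closing everything afterwards. The macro variable PATH, the fileref DIR and the variable names are assumptions for illustration; the code has not been tested with very long records or on every host.

```sas
%let path=path_to_your_files;  /* placeholder */

data want (keep=fname lines);
  length fname $256 lines 8;
  rc  = filename('dir', "&path.");        /* assign fileref DIR to the directory */
  did = dopen('dir');
  if did > 0 then do;
    do i = 1 to dnum(did);
      fname = dread(did, i);
      if upcase(scan(fname, -1, '.')) = 'TXT' then do;
        lines = .;                        /* stays missing if the file cannot be opened */
        fid = mopen(did, fname, 'i');     /* open the directory member for input */
        if fid > 0 then do;
          lines = 0;
          do while (fread(fid) = 0);      /* FREAD returns 0 while a record was read */
            lines + 1;
          end;
          rc = fclose(fid);
        end;
        output;
      end;
    end;
    rc = dclose(did);
  end;
  else put 'Could not open directory';
  rc = filename('dir');                   /* clear the fileref */
run;
```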
04-25-2017 02:39 AM
"I don't work on unix and would not presume to attempt to guess with different flavors of unix which option names you would need to access."
While this is true for quite a lot of utilities used especially in the "commercial" UNIXen (AIX, HP-UX, Solaris), Linux stays with the GNU utilities, so the syntax of commandline programs is the same across different distributions. Notably IBM has made it a point to make AIX more Linux-compatible with every release from 4.3 on.
And utilities like the wordcount (wc) are so old (and have not changed their options for decades) that my example will work on all UNIX platforms. One can even get those utilities for Windows, increasing its usability.