Reading in multiple files saved in multiple folders....

Hi,
I'm having trouble finding a solution to the following problem:
I want to read in multiple files with standard names (.LOG) that are saved in multiple folders whose names follow no standard, and then move the files to another folder. This process must run automatically on a daily basis.

Example folder names:
1. steven
2. connectivity
3. Ali
4. Jack

So the folder names are the users' names in the system, but the file names follow this pattern:

SMS-YearMonthDay.LOG

For example:

SMS-20080102.LOG
SMS-20080103.LOG
SMS-20080107.LOG


So, how can these files be read?
Super User

Re: Reading in multiple files saved in multiple folders....

It depends on what kind of flexibility/automation you are looking for. Here are some suggested steps:

1. Find out the names of the different directories. You can use the functions DOPEN, DNUM and DREAD for this (after assigning a fileref to the parent directory with the FILENAME function).

2. Use the output from DREAD to generate a macro call with CALL EXECUTE.

3. The macro called in step 2 either assigns an aggregate fileref or loops through the files using the directory functions again. It then reads each file and moves it.
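A minimal sketch of these three steps in SAS follows. It assumes the root folder D:\SMSGatewayData2\USERS (the path used later in this thread) and a hypothetical macro %read_user. Instead of an aggregate fileref, it reads each user folder with a wildcard INFILE and stacks the results with PROC APPEND; moving the files is left out. It also assumes that every user folder contains at least one SMS-*.LOG file (otherwise the wildcard INFILE fails) and that user names contain no commas or macro-special characters.

```sas
%let root=D:\SMSGatewayData2\USERS;   /* assumed root folder */

/* start from an empty result (a warning is harmless if it does not exist) */
proc datasets lib=work nolist; delete all_logs; quit;

/* Step 3, hypothetical macro: read all SMS-*.LOG files of one user folder */
/* and append them to WORK.ALL_LOGS (moving the files is not shown).      */
%macro read_user(user);
  data work._one;
    length user $80 line $256;
    infile "&root\&user\SMS-*.LOG" truncover;
    input line $char256.;
    user = "&user";
  run;
  proc append base=work.all_logs data=work._one;
  run;
%mend read_user;

/* Steps 1 and 2: list the root folder with DOPEN/DNUM/DREAD and generate */
/* one %read_user call per subfolder with CALL EXECUTE.                   */
data _null_;
  length name $256;
  rc  = filename('usrroot', "&root");
  did = dopen('usrroot');
  if did <= 0 then stop;
  do i = 1 to dnum(did);
    name = dread(did, i);
    /* DNUM also counts files: keep only entries that open as directories */
    rc  = filename('usrdir', catx('\', "&root", name));
    sub = dopen('usrdir');
    if sub > 0 then do;
      rc = dclose(sub);
      /* %nrstr delays the macro call until after this DATA step ends */
      call execute(cats('%nrstr(%read_user)(', name, ')'));
    end;
    rc = filename('usrdir');
  end;
  rc = dclose(did);
  rc = filename('usrroot');
run;
```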

Hope this helps,
Linus
Data never sleeps

Re: Reading in multiple files saved in multiple folders....

Thanks, these functions helped a lot. Now I also need to use a macro to change the subfolder name automatically.
Super Contributor

Re: Reading in multiple files saved in multiple folders....

A macro is not necessarily required if you only want to generate your file "wanted" (from the example code in the recent reply). A DATA step can build this file's observations (pathfile_wanted) to drive your INPUT-processing DATA step. For example, one technique:

DATA wanted;
LENGTH folder pathfile_wanted $255;
INPUT folder $;
FORMAT dt date9. ; /* assign format for visual effect */
/* one observation per folder and day, covering the last seven days */
DO dt=TODAY()-7 TO TODAY()-1;
  /* the file names follow the pattern SMS-YYYYMMDD.LOG */
  pathfile_wanted = trim(folder) !! '\SMS-' !! put(dt,yymmddn8.) !! '.LOG';
  OUTPUT;
END;
RETURN;
* list of candidate file-folders listed instream below. ;
DATALINES;
folder1
folder2
folder3
;
RUN;


Scott Barry
SBBWorks, Inc.

Re: Reading in multiple files saved in multiple folders....

This sounds like the classic use case for the INFILE option FILEVAR=:

data wanted( keep= pathfile_wanted ) ;
* create a list of the paths/files required ;
run;

data needed ;
set wanted ;
thisone= pathfile_wanted ;
* allofit is only a placeholder fileref: FILEVAR= names the file actually read ;
infile allofit filevar= thisone end= eof ;
do while( not eof ) ;
input /* .... the columns you want */ ;
output ;
end;
eof = 0 ;
run;

Just that, plus whatever refinement you need for handling any special cases.
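To see FILEVAR= and END= at work in isolation, here is a small self-contained sketch. It writes two tiny test files (the names demo_a.log and demo_b.log are hypothetical) into the WORK library's folder, lists them in a dataset, and then reads every record of every listed file in a single DATA step, which is the same pattern as the code above.

```sas
/* hypothetical demo files, written to the WORK library's folder */
%let wdir=%sysfunc(pathname(work));

data _null_;
  file "&wdir/demo_a.log";
  put 'line a1' / 'line a2';
run;
data _null_;
  file "&wdir/demo_b.log";
  put 'line b1';
run;

/* one observation per file to read */
data demo_list;
  length fname $256;
  fname = "&wdir/demo_a.log"; output;
  fname = "&wdir/demo_b.log"; output;
run;

/* FILEVAR= switches to the file named in fname on each SET iteration; */
/* END= becomes true at the last record of the current file.            */
data demo_lines;
  set demo_list;
  length source $256 line $20;
  infile dummy filevar=fname end=done truncover;
  source = fname;  /* keep a copy: the FILEVAR= variable is not written out */
  do while (not done);
    input line $char20.;
    output;
  end;
run;
```

With the two demo files, demo_lines should end up with one observation per record, three in total.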

PeterC

Re: Reading in multiple files saved in multiple folders....

Hi,
I arrived at the following solution, but one problem remains: each time the loop reads a new user folder, the dataset is overwritten with that user's data, so the resulting dataset contains only the data from the last user's files. The code:

%macro etl(ds, ds2,path);
data &ds &ds2;
LENGTH DateTime 8
UserName $ 20
Submit $ 10
SentNumber $ 11
IP $ 15
MessageID $ 15
SendingMode $ 6
Contents $ 160 ;

%let filrf=mydir;
%let rc=%sysfunc(filename(filrf,"&path"));
%let did=%sysfunc(dopen(&filrf));
%let memcount=%sysfunc(dnum(&did));
%do i=1 %to &memcount;
AccountNum+1;
%let counter = AccountNum;
%let username&i=%sysfunc(dread(&did,&i));

%let filref=mydir2;
%let file=%sysfunc(filename(filref,"&path\&&username&i"));
%let op=%sysfunc(dopen(&filref));
%let flcount=%sysfunc(dnum(&op));

filename FT77F001 "D:\SMSGatewayData2\USERS\&&username&i\*.log";
%do j=1 %to &flcount;
%let trans&j=%sysfunc(dread(&op,&j));
%put '&&username&i = ' &&username&i '&&trans&j= ' &&trans&j '&flcount = ' &flcount '&filref = ' &filref '&filrf = ' &filrf;

infile FT77F001 filename=filename eov=eov end = done length=L DSD;
INPUT DateTime : ANYDTDTM19.
UserName $
Submit $
SentNumber $
IP $
MessageID $
SendingMode $
Contents $;

output;
%end;
%end;
run;
%mend;
%etl(sms2, sms,D:\SMSGatewayData2\USERS)


So, how can I solve the overwriting problem? I think it needs something like dynamic dataset creation, so that each newly created dataset can be appended to the previous one. Is that available in SAS? Or something like an append function? I tried to use one, but it didn't work.
Respected Advisor

Re: Reading in multiple files saved in multiple folders....

Hi
If this is only about moving external files, there is no need to read the data into a SAS table.
As this seems to be Windows, I think the best approach would be to generate some DOS commands and execute them either outside SAS or from within it.
Please let me know if it's really only about moving files, and I can send you some example code.
Regards
Patrick
Respected Advisor

Re: Reading in multiple files saved in multiple folders....

Hi
This would be a solution for moving all .log files from the directories in USERS to a common directory "DirWithAllLogsTogether".
Logs already in "DirWithAllLogsTogether" with the same name as a file being moved will be overwritten. This could be avoided with a little more coding, but first let me know if this is what you need.
HTH
Patrick

%let InDrivePath=D:\SMSGatewayData2\USERS;
%let OutDrivePath=D:\DirWithAllLogsTogether;

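/* list only the subdirectories of USERS (/AD), as bare names (/B) */
filename DIRLIST pipe "dir &InDrivePath /AD/B" ;

options noxwait noxsync;
data _null_;
length UserDirName $80;

infile DIRLIST length=reclen ;
input UserDirName $varying80. reclen ;

if reclen ne 0 then
do;
/* move the .log files of the current user directory, not of USERS itself */
call system(cats('move /Y "', "&InDrivePath\", UserDirName, '\*.log" "', "&OutDrivePath", '"'));
end;
run;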
options xwait xsync;
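Overwriting same-name logs can be avoided with a little more coding. One possible approach, sketched here under assumptions, is to copy each .LOG file individually and prefix the user folder name to the target file name. The <user>_<file> naming scheme is an assumption, the two paths are the ones from the code above, and the folders are listed with the SAS directory functions DOPEN, DNUM and DREAD rather than with DIR.

```sas
/* assumed paths, taken from the code above */
%let InDrivePath=D:\SMSGatewayData2\USERS;
%let OutDrivePath=D:\DirWithAllLogsTogether;

/* close the command window automatically and wait for each copy */
options noxwait xsync;
data _null_;
  length user fname $256 cmd $1024;
  rc  = filename('usrroot', "&InDrivePath");
  did = dopen('usrroot');
  if did <= 0 then stop;
  do i = 1 to dnum(did);
    user = dread(did, i);
    /* only entries that can be opened as directories are user folders */
    rc  = filename('usrdir', catx('\', "&InDrivePath", user));
    sub = dopen('usrdir');
    if sub > 0 then do;
      do j = 1 to dnum(sub);
        fname = dread(sub, j);
        if upcase(scan(fname, -1, '.')) = 'LOG' then do;
          /* e.g. ...\USERS\Ali\SMS-20080311.LOG -> ...\Ali_SMS-20080311.LOG */
          cmd = cats('copy /Y "', catx('\', "&InDrivePath", user, fname), '" "',
                     catx('\', "&OutDrivePath", catx('_', user, fname)), '"');
          call system(cmd);
        end;
      end;
      rc = dclose(sub);
    end;
    rc = filename('usrdir');
  end;
  rc = dclose(did);
  rc = filename('usrroot');
run;
options xwait;
```

Copying (rather than moving) keeps the originals in place; DOS MOVE could be used instead once the result has been checked.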

Re: Reading in multiple files saved in multiple folders....

Hi Patrick,

Thanks for your advice; it's almost the solution I want. Your suggestion is clear, and I have started working on it (moving the files, then reading them in one shot). But I think the code you wrote is for the OS/390 operating system, and I use Windows, so I will use DOS commands within SAS (using the X statement). I wrote the following code:

%macro etl(ds, path);
data &ds;

%let filrf=mydir;
%let rc=%sysfunc(filename(filrf,"&path"));
%let did=%sysfunc(dopen(&filrf));
%let memcount=%sysfunc(dnum(&did));
%do i=1 %to &memcount;

AccountNum+1;
%let username&i=%sysfunc(dread(&did,&i));

%let filref=mydir2;
%let file=%sysfunc(filename(filref,"&path\&&username&i"));
%let op=%sysfunc(dopen(&filref));
%let flcount=%sysfunc(dnum(&op));

/*filename FT77F001 "D:\SMSGatewayData2\USERS\&&username&i\*.log"; */
%do j=1 %to &flcount;

%let trans&j=%sysfunc(dread(&op,&j));
%put '&&username&i = ' &&username&i '&&trans&j= ' &&trans&j '&flcount = ' &flcount '&filref = ' &filref '&filrf = ' &filrf;
x 'copy "D:\SMSGatewayData2\USERS\ahmed12\SMS-20080609.LOG" "D:\to\SMS-20080609.LOG.log" ';

%end;

%end;

run;
%mend;
%etl(sms, D:\SMSGatewayData2\USERS)


Is that right?
Thanks again for your help.
Respected Advisor

Re: Reading in multiple files saved in multiple folders....

Hi Dara

The code is for SAS 9.1.3 under Windows XP. It runs on my laptop without errors.
Code for OS/390 (z/OS) would look different, also because the traditional z/OS file system (MVS data sets) has no directory structure.

I only had a brief look at your code. One thing I noticed:
%let memcount=%sysfunc(dnum(&did));
This gives you the total number of members in a directory, and these members can also be files, not only subdirectories.

The "DOS" command DIR with the switches "/AD/B" returns a list of directories only. Look it up in a command prompt using "dir /?".

I don't understand why you need a DATA step. Do you want to store how many directories you processed, or what's the idea?

I would recommend using "my" code, not because your code won't work, but because it's much more difficult to read and maintain.

HTH
Patrick

Re: Reading in multiple files saved in multiple folders....

Hi Patrick,

I'm really confused and keep going around in circles. I'll explain the problem again, and let's see how you could help.

We have a message system that saves users' transactions in log files. I want to read these files for future analysis, but the way these files are saved is somewhat complicated. They are saved as follows:

The main folder is USERS.
USERS has subfolders named after the users, e.g. if we have a user Patrick, his folder is named Patrick, and so on, so every folder has the user name as its name.
In each user's folder there are LOG files and other files, but we are only interested in the LOG files.
Moreover, every LOG file in an individual folder follows the naming structure SMS-20080311.LOG
General structure: SMS-DATE.LOG
So two users (e.g. Patrick and Dara) can have files with the same name, because they sent messages on the same date. This is what caused the problem when I copied the files from the subfolders into another folder: files with the same name were overwritten. I tried to rename the files (using a DOS command) based on a variable value before copying them, but the statement didn't execute correctly.
The goal of the code is to read these files and save the result in a SAS dataset for future analysis.

Here is the code I have arrived at. Could you please check the rename statement and tell me why it isn't working?


options noxwait;
%macro etl(ds, path);
data &ds;

%let filrf=mydir;
%let rc=%sysfunc(filename(filrf,"&path")); /* D:\SMSGatewayData2\USERS */
%let did=%sysfunc(dopen(&filrf)); /* Opens USERS */
%let memcount=%sysfunc(dnum(&did)); /* NUM of members in USERS */
%do i=1 %to &memcount; /* from 1 to memcount */
%let username&i = %sysfunc(dread(&did,&i)); /* subfolders names */
%let filref=mydir2;
%let file=%sysfunc(filename(filref,"&path\&&username&i")); /* paths of subfolders */
%let op=%sysfunc(dopen(&filref)); /* opens subfolders */
%let flcount=%sysfunc(dnum(&op)); /* Num of members in subfolders */
/*%do j=1 %to &flcount;*/
%let Path = &&username&i;
/* list all files names in new created text files */
/*call system( "dir C:\from\&&username&i >D:\AllSMSUsers\&&username&i");*/
%let current= %sysfunc(time());
call system ("ren C:\from\&&username&i\*.LOG &&username&i.LOG");

/*
call system ("copy C:\from\&&username&i\*.log C:\to" );
%end;
*/

%end;

run;
%mend;
%etl (sms, C:\from)
options xwait;



P.S. If there is another way to do the job, please let me know. Thanks a lot.
Respected Advisor

Re: Reading in multiple files saved in multiple folders....

Well, that's a somewhat different task from what I thought...

The code below scans all the directories and subdirectories (that's the /S switch) and looks for files that match the pattern "SMS-&LogDate..log" (the first dot ends the macro variable reference &LogDate).
The matching filenames (fully qualified; path & filename) are stored in the variable LogFiles and written to the SAS ds DIRLIST.

The second DATA step reads all the matching log files (as stored in the dataset DIRLIST) and writes the information to the dataset SMS_Logs_&LogDate, so there will be one dataset per LogDate.

You might not be familiar with the technique used. The critical part is the INFILE statement with FILEVAR= and END=.
Here is some more information:
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177201

Let me know if it works.

HTH
Patrick

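/* %let LogDate=%sysfunc(today(),yymmddN8.); */
%let LogDate=20080311;
/* "..": the first dot ends the macro variable reference &LogDate */
%let path=D:\SMSGatewayData2\USERS\SMS-&LogDate..log;

/* /A-D-H-S: files only (no directories, hidden or system files); /B: bare names; /S: include subdirectories */
filename DIRLIST pipe "dir ""&path""/A-D-h-s/B/S";
data dirlist ;
length LogFiles $256;
infile dirlist length=reclen ;
input LogFiles $varying256. reclen;
run ;
filename DIRLIST clear;

data SMS_Logs_&LogDate;
set dirlist;
/* dummy is only a placeholder fileref; FILEVAR= names the file to read */
infile dummy filevar=LogFiles end=done pad missover lrecl=128;
/* the FILEVAR= variable is not written to the output, so keep a copy */
LogFileWithPath=LogFiles;
User=scan(LogFiles,-2,'\');
LogFile=scan(LogFiles,-1,'\');

do while(not done);
input @1 LogLine $char128.;
output;
end;
run;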

Re: Reading in multiple files saved in multiple folders....

Thanks Patrick, that is exactly what I need!

I tested the code, and it works for reading files with my structure. I also found another way: using macros, you can create datasets dynamically, and then I merged them into a single dataset with all the data needed from the files.
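The "dynamic datasets" idea can be sketched roughly like this, assuming per-user datasets with a common prefix and a numeric suffix (the names usr_1 to usr_3, the demo data, and the macro %stack_datasets are all hypothetical). Note that stacking rows from several datasets is done by listing them in one SET statement; MERGE would instead combine them side by side.

```sas
/* hypothetical demo data: one small dataset per user */
data usr_1; length user $20; user='steven'; x=1; run;
data usr_2; length user $20; user='Ali';    x=2; run;
data usr_3; length user $20; user='Jack';   x=3; run;

/* hypothetical macro: stack &prefix.1 ... &prefix.&n into &out */
%macro stack_datasets(prefix=, n=, out=);
  %local i;
  data &out;
    /* SET with several datasets concatenates them (rows one after another) */
    set %do i = 1 %to &n; &prefix&i %end; ;
  run;
%mend stack_datasets;

%stack_datasets(prefix=usr_, n=3, out=all_users)
```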

Re: Reading in multiple files saved in multiple folders....

Hi, Patrick,

Thanks a lot for your support and suggestions. Actually, I'm a new SAS user and just started using it with this project; that's why I was somewhat confused about the solution. But your idea of copying all log files to one folder and then reading them into a dataset is perfect.
I wrote the code, and it worked!

Thanks again.