BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi,

I have given the task of importing many files merging them then combining these into a large database. Here is the setup.

I have 7 files from certain days and certain locations throughout a year in a folder titled with the respective year (2004-2008). Something like this: C:\Users\mherrin\Documents\Data\2005
Each of these folders contains 20,000+ files

I would like to import 3 of the 7 files from each day and location and merge them so I have one file per day per location. Then repeat this process for each day and location for all the files in the folder.

The next step will be to combine all of these into one data file for the entire year. Repeat this process in each folder then combine the yearly data files into a comprehensive database.




I have written the code to import and merge the files but now I think I need to make a macro to repeat this process.

Any help would be greatly appreciated. THANKS
23 REPLIES 23
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
The SAS Macro Language Guide has a section on the topic "Generating Repetitive Pieces of Text Using %DO Loops", which discusses using %DO/%END loops within a macro execution.

Scott Barry
SBBWorks, Inc.

http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_91/base_macro_6997.pdf
deleted_user
Not applicable
I am using the pipe command to read all of the file names out of my folder like so:


filename indata pipe 'dir C:\Users\mherrin\Documents\trial /b';
data file_list;
length fname $20;
infile indata truncover; /* infile statement for file names */
input fname $20.; /* read the file names from the directory */
call symput ('fname',fname);
call symput ('num_files',_n_); /* store the record number in a macro variable */
call symput ('name',substr(fname,1,11));
run;

But then how do I get my proc import command to read all of those file names?
This is what I have tried.

%macro work;
PROC IMPORT OUT= &fname
DATAFILE= "C:\Users\mherrin\Documents\trial\&fname"
DBMS=CSV REPLACE;
getnames=no;
RUN;
%mend work;



I am obviously missing the loop part of this process. I am sure this is something simple that I just cannot find. If I could get these to read in I think I would be on my way to figuring out the rest. Thanks!
data_null__
Jade | Level 19
I think reading 20,000+ files with PROC IMPORT will be too slow.

1) Can the files be read using the same input statements. Hopefully you don't have 20,000 file with vastly different record layout.

2) show directory and file name "structure" in more detail. You description was too abstract for me, I'm not very bright.

3) I didn't get that bit about 3 of 7 or something and how that is determined.

4) This call probably all be done with ONE data step. But I need to know more details. The "have" and "need" so to speak.
deleted_user
Not applicable
Sure I will explain more, anything for some help.

In each data folder there are 20,000+ files. 7 from each day and each location, these have different input layouts. Currently I need to merge only 3 of these 7 daily files. The paths and filenames are:

C:\Users\mherrin\Documents\trial\APX05132005.1
C:\Users\mherrin\Documents\trial\APX05132005.2
C:\Users\mherrin\Documents\trial\APX0513.DRF

for another day it would be like:

C:\Users\mherrin\Documents\trial\APX05142005.1
C:\Users\mherrin\Documents\trial\APX05142005.2
C:\Users\mherrin\Documents\trial\APX0514.DRF

Where 05 is the month 14 is the day 2005 is the year.
And for another location:

C:\Users\mherrin\Documents\trial\PEN05142005.1
C:\Users\mherrin\Documents\trial\PEN05142005.2
C:\Users\mherrin\Documents\trial\PEN0514.DRF



I need to import each of these sets of three and merge them into a daily file.

The files come without variable names, so I inefficiently named them with a data step using the rename function. But that part is done.

I have written code to input one specific file for all 7 types and the code to merge the 3 desired together.

We can focus on just the 3 desired files for now, so it will be more like 10,000 files per year. We bought a machine that is supposed to be FAST (dell precision T7400). I think what I need is a macro that will cycle through these files importing them into sas and merging them into a daily file. These will then be appended into one yearly data set.

I hope you can help. Thanks.
deleted_user
Not applicable
Hi,

Your input datastep from the pipe command will only ever give you the last line it processed, ie the last file. You need to give the macro variables you create some sort of unique identifier for each file. Just an iterative number will do and it might as well be the _n_ of the record being processed.

So when you do the call symput write (sorry I cant see the original post but it was something like this):

call symput(compress('fname'!!_n_),filename);

You will also need to store the total number of files so put a call symput in for that too (it will execute every time but when it's done it will have the value of the last one, you could make this more efficient if you wanted)....

call symput('tot_files',_n_);

Then you will end up with a macro variable for each file..

fname1
fname2
fname3 etc

then create a loop in the macro to run through each fname and import. Something like (I hope you understand double ampersands, if not see documentation)...

%do i %to &tot_files;
proc import out=&&fname&i datafile='..........
..........
%end;

Hopefully that points you in the right direction. Obviously the %d loop will need to be inside a macro.

Message was edited by: pznew Message was edited by: pznew
data_null__
Jade | Level 19
I'm still not sure exactly where the similarities are I will assume all file types have the same record structure. Therefore all type 1 files can be read with one data step. This example creates data(files) and reads them, into one data set.

Let me know your questions. All infile and file options are documented. And PIPE example are found in "Companion" and at support.sas.com

[pre]
dm 'clear log; clear output;';
* create test data(files to be read in next step);
data _null_;
length command $256;
infile cards truncover;
input name :$128.;
if _n_ eq 1 then do;
path = quote(substr(name,1,find(name,'\',-length(name))));
command = 'rmdir /S /Q ".\xUsers\"';
infile dummy1 pipe filevar=command;
command = catx(' ','mkdir',path);
put command=;
infile dummy1 pipe filevar=command;
rc = sleep(1); *small delay;
end;
file dummy filevar=name dsd dlm=',';
do _n_ = 1 to 10;
put _n_:z5. name;
end;
cards;
.\xUsers\mherrin\Documents\trial\APX05132005.1
.\xUsers\mherrin\Documents\trial\APX05132005.2
.\xUsers\mherrin\Documents\trial\APX0513.DRF
.\xUsers\mherrin\Documents\trial\APX05142005.1
.\xUsers\mherrin\Documents\trial\APX05142005.2
.\xUsers\mherrin\Documents\trial\APX0514.DRF
.\xUsers\mherrin\Documents\trial\PEN05142005.1
.\xUsers\mherrin\Documents\trial\PEN05142005.2
.\xUsers\mherrin\Documents\trial\PEN0514.DRF
;;;;
run;


*** Read all the type .1 records;
data Type1;
length command filename fname path $256.;
command = 'dir /s /b ".\xUsers"';
infile dummy1 pipe filevar=command truncover;
input path $256.;
if scan(path,-1,'.') eq '1';
put _infile_;
infile dummy2 filevar=path filename=fname end=eof dsd;
filename = scan(fname,-2,'.\');
length prefix $3;
prefix = filename; * assumes prefix is always first 3 chars;
date = input(substr(filename,anydigit(filename)),mmddyy10.);
do while(not eof);
input obs name:$64.; ** Fields for type 1 files?;
output;
end;
format date date9.;
run;
proc print;
run;
[/pre]
deleted_user
Not applicable
Thanks for the help.

I am having some success with the do loop in a macro I think but I get two errors;

1) ERROR: Library name is not assigned.
2)ERROR: Libname 'APX05132005' exceeds 8 characters.

I think this is due to the fact that &&fname&i converts to APX05132005.1 and sas thinks that the first part is the libname.

I would like to use an if statement as well in the macro. This is what I am thinking of doing. I commented it so you would have a better chance of understanding what I am "trying" to do. Please point out any mistakes. I appreciate all the help.


filename indata pipe 'dir C:\Users\mherrin\Documents\trial /b';
data file_list;
length fname $20;
infile indata truncover; /* infile statement for file names */
input fname $20.; /* read the file names from the directory */
call symput (compress('fname'!!_n_),fname);
call symput ('num_files',_n_); /* store the record number in a macro variable */
call symput (compress('name'!!_n_),substr(fname,1,11));
run;

%put _global_;


%macro work;
%do i=1 %to &num_files;
PROC IMPORT OUT= '&&fname&i'
DATAFILE= "C:\Users\mherrin\Documents\trial\&&fname&i"
DBMS=CSV REPLACE;
getnames=no;
RUN;
%if scan(&&fname&i,-1,'.') eq '1' %then %do; * for the .1 files;
data &&fname&i;
set &&fname&i(rename=(var1=... var99=);
run;
data &&name&i; *This will be the comprehensive daily file;
set &fname&i; *(i.e. APX05132005 for the first set of files);
run;;
%end;
%else %if scan(&&fname&i,-1,'.') eq '2' %then %do; /*for the .2 files*/
data &&fname&i;
set &&fname&i(rename=(var1=... var99=); /*different variable names*/
run;
data &&name&i; /*Merge the first two files together to begin the comprehensive file*/
merge &&name&i &&fname&i;
by raceNum;
if PostPosition=99 then delete; /*can you do this in the same data step?*/
run;
data &&name&i;
set &&name&i(drop=PostPos PostPosition); /*these must be removed for merge later*/
run;
%end;
%else %if scan(&&fname&i,-1,'.') eq 'D' %then %do; /*for the .drf files*/
data &&fname&i;
set &&fname&i(rename=(var1=... var1435=); /*different variable names*/
run;
data &&name&i-; /*can you do an i-1 type macro? Because the way I used name &&name&i would be APX0513.DRF in this case*/
merge &&name&i &&fname&i;
by raceNum ProgNum; /*I may have to sort these sets first*/
run;
%end;
%mend work;

%work;
deleted_user
Not applicable
It will certainly think the . is a seperator between the libref and the dataset name. You should change the dot to something else. Use the translate function to do that. Also you need to surround the macro variable with double quites " and not single ' or it will not resolve.

Do somethig like this....

out="%sysfuc(translate(&&fname&i, _ , . ))";

the sysfunc tells the macro language to use the sas function (remember the macro language and sas language are different things). And because we are not dealing with quoted strings in macro but just characters we do not put quotes round the characters on the translate function that we are translating to and from. ie to _ from .

Hoipe that helps in some way.
deleted_user
Not applicable
Thanks that was a great help. One funny thing is happening though. When I run the code the APX05132005_1 sas data file is not correct. It contains a list of path names.

I tried running this with if statements as well but I received an error reading the following:

ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand
is required. The condition was: scan(&&fname&i,-1,'.') eq '1'


I feel like we are getting closer. Thank you for helping.
deleted_user
Not applicable
Ok both of those errors are fixed.

Now I just cant read in the .drf files.

%if %scan(&&fname&i,-1,'.')='d' %then %do;

Is not working. Thanks Message was edited by: Bryce
deleted_user
Not applicable
I think I have it running now. Thanks, you all have been a great help.

One more question for now:
Is there anyway to prevent the window full popup from occuring. maybe set it to a clear all everytime?

Or should I add a clear window function inside the loop?

I think it will take 8 hours to import. :S I would "prefer" not to sit here clicking the clear button. I closed the log window now the message just sais "window full" not "log window full"

I would greatly appreciate it if anyone could answer tonight so I could get all this data imported over the weekend!
data_null__
Jade | Level 19
Run it in batch. Right click filename from winders explorer and select "Batch Submit with SAS 9.1" Good luck with 20,000 proc imports and all those other steps, I don't think it will ever finish.
deleted_user
Not applicable
Yeah, your right. It has been running over the weekend and it is only the the H's

what can I do differently? maybe I could import and combine all the same files then change the variable names instead of renaming them in each iteration? Or should I stay away from proc import all together?

Originally it was running faster about 8 seconds per loop so it should have taken 8 hours. It is running much slower now 15 seconds per loop. why is this?

Let me know if there is a more efficient way of doing this. Thank you!
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Ensure that you are current on SAS HOTFIX maintenance, as a general rule. Also, it may help with SAS WORK processing, if you can relocate your WORK allocation to another physical drive. There are technical papers on the SAS support website http://support.sas.com/ to address the SAS Windows performance topic - here is one for consideration (maybe too much detail, but useful info about SAS and I/O such as WORK):

http://support.sas.com/techsup/technote/ts684/ts684.html


Scott Barry
SBBWorks, Inc.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 7263 views
  • 0 likes
  • 3 in conversation