Import selected files

Frequent Contributor
Posts: 95

I'm trying to import files from a certain directory. There are many files in this directory, but I'm only interested in importing the ones whose names start with "Run2_08". The number of files changes, so I can't hard-code the pathnames. I'm trying the solution below, but I can't seem to actually import the data. I keep getting a LOST CARD note and only one file seems to get processed.

Any help?

[pre]
data tmp;
rc = filename("path", "C:\CA_TEMP");
did = dopen("path");
count = dnum(did);
do i = 1 to count;
name = dread(did,i);
if substr(name,1,7) = 'Run2_08' then do;
fc = filename("file", "C:\CA_TEMP\"||name);
do until (eof);
infile file dlm="," firstobs=2 obs=2 dsd missover end=eof;
input date :yymmdd6. Code :$6. Total :8. Neg :8. Pos :8.;
output;
end;
end;
end;
run;
[/pre]

This is my log:
[pre]
NOTE: The infile FILE is:
File Name=C:\CA_TEMP\Run2_080703.txt,
RECFM=V,LRECL=256

NOTE: LOST CARD.
rc=0 did=1 count=30 i=15 name=Run2_080626.txt fc=20036 eof=1 date=17716 Code=MT43 Total=227 Neg=0 Pos=227 _ERROR_=1 _N_=1
NOTE: 1 record was read from the infile FILE.
The minimum record length was 23.
The maximum record length was 23.
NOTE: The data set WORK.TMP has 1 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
[/pre]
Super Contributor
Posts: 260

Re: Import selected files

The way I would do that is to start the DATA step as you did, but add a FILEVAR= option to the INFILE statement to tell SAS exactly which file to read. And don't forget the final STOP statement, otherwise the step will go on forever, re-reading your data again and again.
PS: I replaced the SUBSTR(...)= test with the simpler =: operator (begins with), and used the SAS 9 CATS function.
[pre]
DATA work.test ;
rc = FILENAME("dir","c:\temp") ;
did = DOPEN("dir") ;
DO i = 1 TO DNUM(did) ;
name = CATS("c:\temp\",DREAD(did, i)) ;
IF DREAD(did, i) =: "Run2_08" THEN DO ;
INFILE dummy FILEVAR = name DLM="," FIRSTOBS=2 MISSOVER DSD END=eof ;
DO UNTIL(eof) ;
INPUT date :YYMMDD6. code :$6. Total :8. Neg :8. Pos :8.;
OUTPUT ;
END ;
END ;
END ;
rc = DCLOSE(did) ;
STOP ;
RUN ;
[/pre]
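As a quick illustration of the two shortcuts mentioned in the PS, here is a minimal sketch. The file name is simply the one shown in the log above, and the variables starts and path are made up for the example. The =: operator compares only as many characters as the shorter operand has, and CATS strips leading and trailing blanks before concatenating.
[pre]
DATA _NULL_ ;
  name = "Run2_080703.txt" ;
  /* 1 when name begins with Run2_08, 0 otherwise */
  starts = (name =: "Run2_08") ;
  /* build the full path; c:\temp is the folder used in the example above */
  path = CATS("c:\temp\", name) ;
  /* writes starts=1 path=c:\temp\Run2_080703.txt to the log */
  PUT starts= path= ;
RUN ;
[/pre]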
Regards.
Olivier
Valued Guide
Posts: 2,177

Re: Import selected files

I could explain the behaviour you experienced, but I think you just want a result.
So...
Because you are only interested in files whose names have a fixed shape (and because you are not running SAS on z/OS), you can use wildcards (the "global naming characters", as in *.txt) on your INFILE statement.
This makes your program much simpler, except for one thing: you want to ignore row 1 of each file. This example shows how:

[pre]
* change current directory to the folder of files starting "Run2_08" ;
x cd 'this directory' ;

data tmp ;
* to simplify the input statement, pre-define your vars in the order in which they appear in the file ;
length date 6 code $6 total neg pos 8 ;
attrib date informat= yymmdd6. format= date9. ;
* pre-defining lengths and informats removes any need to define widths on the input statement, which matters when CSV-style data makes these widths vary ;

* I assume you would like to know which file each data row came from, so define one var for the infile option and one to keep ;
length filen $250 ;
retain filename ;
infile "Run2_08*.*" dsd missover filename= filen ;
* load the input buffer, to discover which file is being read ;
input @ ;
* when the filename changes you are at the beginning of a file, so collect the name and drop the header ;
if lag(filen) ne filen then do;
filename = filen ;
delete ;
end;
* because your input columns are defined in order, you can use the -- style variable list syntax ;
input date -- Pos ;

run ;
[/pre]
hope that helps

PeterC
Super Contributor
Posts: 260

Re: Import selected files

Hi Peter.
The INFILE "*.txt" is a great trick. Thanks for sharing that.
When I use the FILENAME= option, I only get the directory, not the file name itself. And I thought of another way to detect the first record of each file without using the LAG function: the EOV= option in the INFILE statement.
[pre]
DATA work.import ;
INFILE "c:\temp\run2_08????.txt"
DLM=","
FIRSTOBS=2
MISSOVER
DSD
EOV=beginNewFile ;
INPUT @ ;
IF beginNewFile=1 THEN DO ;
beginNewFile=0 ;
DELETE ;
END ;
ELSE INPUT date :yymmdd6. code :$6. (total neg pos) (:8.) ;
FORMAT date DDMMYY10. ;
RUN ;
[/pre]
As the SAS doc says, you have to reset the EOV= variable to 0 yourself, because SAS only ever sets it to 1, when it reads the first record of a new file.
Regards & thanks again.
Olivier
N/A
Posts: 0

Re: Import selected files

I'm surprised you were not able to make my method work!
You need to pre-set the length for the filename= variable or it defaults to $8.
You need to "load the buffer" before the variable is filled, hence the
[pre] input @ ;[/pre]
I don't use EOV only because I expect to use the filename= option value, and because EOV is not set on the first file... although I guess you could use
[pre] RETAIN beginNewFile 1 ;[/pre]
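For anyone finding this thread later, the two ideas discussed here (FILENAME= to capture the source file, EOV= to spot file boundaries) can be combined. What follows is an untested sketch, assuming comma-delimited files in c:\temp with one header row each and the column order from the original post. The variable srcfile is a made-up name, and the sketch tests _N_ = 1 rather than using the RETAIN idea to catch the header of the first file.
[pre]
DATA work.import ;
  /* pre-set the lengths, otherwise the FILENAME= variable defaults to $8 */
  LENGTH filen srcfile $250 ;
  INFILE "c:\temp\Run2_08*.txt" DLM="," DSD MISSOVER
         EOV=beginNewFile FILENAME=filen ;
  /* load the buffer so that filen and beginNewFile are filled */
  INPUT @ ;
  /* the first record of every file is a header: the first iteration covers */
  /* the first file, and EOV= flags each file after that                    */
  IF _N_ = 1 OR beginNewFile = 1 THEN DO ;
    beginNewFile = 0 ;   /* SAS never resets the EOV= variable itself */
    DELETE ;
  END ;
  /* the FILENAME= variable is not written to the data set, so keep a copy */
  srcfile = filen ;
  INPUT date :yymmdd6. code :$6. total neg pos ;
  FORMAT date DATE9. ;
RUN ;
[/pre]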

Glad you've got a solution that suits you..

PeterC
Frequent Contributor
Posts: 95

Re: Import selected files

Posted in reply to deleted_user
These are all great suggestions. Thanks for your help.