Dear All,
I need to import many text files into sas, i found a sas code at support SAS: http://support.sas.com/kb/41/880.html and modified it. But i still have some problems getting my desired result.
(I changed the dirctory in the code into my dir)
1.
filename DIRLIST pipe 'dir "C:\Users\*.txt" ';
data dirlist ;
infile dirlist lrecl=200 truncover;
input line $200.;
if input(substr(line,1,10), ?? mmddyy10.) = . then delete;
length file_name $ 100;
file_name="c:\Users\"||scan(line,-1," ");
keep file_name;
data _null_;
set dirlist end=end;
count+1;
call symput('read'||left(count),file_name);
call symput('dset'||left(count),substr(file_name,118,7));
if end then call symput('max',count);
run;
%put &read;
log: WARNING: Apparent symbolic reference READ not resolved.
%put &dset;
log: WARNING: Apparent symbolic reference DSET not resolved.
%put &max;
I got the correct dirlist data file, but the two macro variables are always empty.
2. I manully typed in the drictory of last data file and its data name into macro variables read and dset and tested for the following macro:
%macro readin;
%local i;
%do i=1 %to 1;
proc import datafile="&&read&i"
out=%scan(&dset,&i) dbms=dlm replace;
delimiter='|';
getnames=yes;
run;
%end;
%mend readin;
%readin;
I think there is something wrong with the satement: datafile="&&read&i" , cause, SAS read the first data file in that folder, while i assigned info of the last dat file to the macro variabels.
Thank you for your help!!
Best wishes.
1)
In your code you wrote:
call symput('read'||left(count),file_name);
call symput('dset'||left(count),substr(file_name,118,7));
it means that your macro variables will be:
&read1 , &read2 , ...
&dset1 , &dset2 , ...
Yout wrote:
%put &read;
%put &dset
but there are no macro variables &read &dset but &read1 &dset1 etc.
2)
there is no error in: proc import datafile="&&read&i"
I do see error in: out=%scan(&dset,&i) dbms=dlm replace;
I suppose you meant to write: out=%scan(&&dset.&i) dbms=dlm replace;
If your datasets have the same format you can use the method outline here:
1)
In your code you wrote:
call symput('read'||left(count),file_name);
call symput('dset'||left(count),substr(file_name,118,7));
it means that your macro variables will be:
&read1 , &read2 , ...
&dset1 , &dset2 , ...
Yout wrote:
%put &read;
%put &dset
but there are no macro variables &read &dset but &read1 &dset1 etc.
2)
there is no error in: proc import datafile="&&read&i"
I do see error in: out=%scan(&dset,&i) dbms=dlm replace;
I suppose you meant to write: out=%scan(&&dset.&i) dbms=dlm replace;
The initial part of the code creates multiple macro variables: read1,read2.. and dset1,dset2.... and max.
So for the %put statement, you should rather list all read<n> and dset<n> macro variables:
%put &read1; %put &dset1; %put &max;
Or you can use this to list all user defined macro variables:
%put _user_;
For the macro readin, it is meant to read multiple files. But it seems like you have limited it to read only one file, so it is reading only one file. You can change the macro readin as below:
%do i=1 to &max.;
instead of
%do i=1 to 1;
This should make the code read all the .txt files in the folder.
You should include a large GUESSINGROWS if you must use proc import. If a column has mixed values that might be considered numeric such as Zip code or account numbers that may contain other characters such as dashes or letters then the type of value, character or numeric may be guessed at incorrectly. This may become much more of a problem if you expect Zip to be character but is numeric in some and character in others.
Additionally if any of the files are supposed to be of the same layout and content you may end up with different lengths for character varaibles and the potential mismatches of variable types means that when you go to combine any of them for other purposes you will get errors or possibly truncated data.
%macro readin;
%local i;
%do i=1 %to &max;
proc import datafile="&&read&i"
out=&&dset&i dbms=dlm replace;
delimiter='|';
getnames=yes;
guessingrows=20000;
run;
%end;
%mend readin;
%readin;
In general it's better to write your own data step to read the data. Issues with methodology include:
Will you continue to require macro processing to process each file? Or do you plan to combine this in the future?
Will you ever need to know the source file for the records after combining, that isn't being stored anywhere right now.
If you use proc import you can get different types for variables (numeric vs character) which means you can't append data until its fixed.
If someone else with a different version of SAS runs this, they may not get the same results as Proc Import changes a bit over time.
My nickel of opinion.
Since you apparently stuck all of this output in your WORK directory I provide any quick code but you really should examine at least a subset of the created data sets for mismatched lengths and variable types.
If you had put them all in one location it would be easy to check using SASHelp.vcolumns and limited a summary to that library. Since there are very likely other sets in WORK it wouldn't be easy to exclude them.
I'll put a small stack of quarters up that at least one variable has different lengths in different data sets.
I would not be surprised to find a different variable name in one and finding several with different types would not surprise me.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.