Hello
From a SAS tutorial I found code to import multiple .csv files at the same time and it works well (code is below). I will admit that I don't fully understand the code but I am wondering what code I would use to:
1. firstobs = 6 instead of the 2 that it is currently set at (I don't see in the code where this setting is)
2. each file name is the same format i.e. facility_trend_888.csv, facility_trend_999.csv etc. and I would like to create a column in the dataset of "facility" to equal the last 3 digits i.e. 888 for EACH row of data in the dataset.
3. Is there a way to exclude empty rows upon import or do I just do that after the fact?
4. Is there code to then combine all datasets into one dataset?
Thanks in advance for any and all assistance.
The code below works as it should by importing the datasets with each file being imported with the naming convention of dsn1, dsn2 etc. to equal the number of csv files in the identified folder.
%macro drive(dir,ext);
%local cnt filrf rc did memcnt name;
%let cnt=0;
%let filrf=mydir;
%let rc=%sysfunc(filename(filrf,&dir));
%let did=%sysfunc(dopen(&filrf));
%if &did ne 0 %then %do;
%let memcnt=%sysfunc(dnum(&did));
%do i=1 %to &memcnt;
%let name=%qscan(%qsysfunc(dread(&did,&i)),-1,.);
%if %qupcase(%qsysfunc(dread(&did,&i))) ne %qupcase(&name) %then %do;
%if %superq(ext) = %superq(name) %then %do;
%let cnt=%eval(&cnt+1);
%put %qsysfunc(dread(&did,&i));
proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt
dbms=csv replace;
run;
%end;
%end;
%end;
%end;
%else %put &dir cannot be open.;
%let rc=%sysfunc(dclose(&did));
%mend drive;
%drive(E:\HIMS\TEST\Strokes\2019 20\HIT_Tool,csv)
Do you not know what is in the CSV files? Why are you using PROC IMPORT instead of just writing your own data step to read the file?
Once you know how to read one file you should be able to read them all using a wildcard in the input filename. You can use the FILENAME= option on the INFILE statement to get access to which file you are reading. You can use the change in input filename to know when you need to skip header rows.
data want;
length fname $256 facility $3 ;
infile 'E:\HIMS\TEST\Strokes\2019 20\HIT_Tool\facility_trend_*.csv'
dsd filename=fname truncover
;
input @;
if fname ne lag(fname) then row=0;
row+1;
if row<6 then delete;
length var1 8 var2 $20 .... var_last $4 ;
input var1--var_last ;
facility=substr(fname,length(fname)-6,3);
run;
So to adjust that program to your data just change these two lines to reflect your actual variables.
length var1 8 var2 $20 .... var_last $4 ;
input var1--var_last ;
This is how I'd do it:
* generate sample csv files ;
%macro create_csv(where);
%let num=%eval(&num+123); %* some random number ;
filename csv "%sysfunc(pathname(work))\facility_trend_&num..csv";
proc export data=sashelp.class(where=(&where)) file=csv dbms=csv replace;
run;
%mend;
%let num=0;
%create_csv(%str(age=12))
%create_csv(%str(age=13))
%create_csv(%str(age=14))
%create_csv(%str(age=15))
%create_csv(%str(age=16))
* get a list of the files in the directory ;
%dirlist(dir=%sysfunc(pathname(work)),filter=basename=:'facility_trend_' and ext='csv')
* import each file, naming the dataset dsn#, where # is the row number from dirlist ;
%macro code;
%let fullname=%trim(&fullname);
%let num=&__iter__;
proc import file="&fullname" out=work.dsn&num dbms=csv replace;
getnames=yes;
run;
%mend;
%loop_control(control=dirlist)
https://github.com/scottbass/SAS/blob/master/Macro/loop_control.sas
https://github.com/scottbass/SAS/blob/master/Macro/dirlist.sas
https://github.com/scottbass/SAS/blob/master/Macro/parmv.sas
So I read your requirements more carefully...
PROC IMPORT of a CSV does some data checking under the covers, determining the data type and length from the input data, then generates a data step.
This is problematic when you want to append the data. Even though your CSV file may have the same column structure, the data itself can mean the datasets won't append due to different structure.
For example, this does not work:
* generate sample csv files ;
%macro create_csv(where);
%let num=%eval(&num+123); %* some random number ;
filename csv "%sysfunc(pathname(work))\facility_trend_&num..csv";
proc export data=sashelp.zipcode(where=(&where)) file=csv dbms=csv replace;
run;
%mend;
%let num=0;
%create_csv(%str(statecode='CA'))
%create_csv(%str(statecode='NY'))
%create_csv(%str(statecode='FL'))
%create_csv(%str(statecode='NC'))
%create_csv(%str(statecode='AZ'))
* get a list of the files in the directory ;
%dirlist(dir=%sysfunc(pathname(work)),filter=basename=:'facility_trend_' and ext='csv')
* import each file, naming the dataset dsn#, where # is the row number from dirlist ;
%macro code;
%let fullname=%trim(&fullname);
%let num=&__iter__;
proc import file="&fullname" out=work.dsn&num dbms=csv replace;
getnames=yes;
run;
data &syslast;
set &syslast (firstobs=6);
if missing(cats(of _all_)) then delete;
run;
proc append data=&syslast out=final;
run;
%mend;
proc delete data=final;
run;
%loop_control(control=dirlist)
I'd advise just letting PROC IMPORT do its thing for a single file, then cut-and-paste the code from the log and edit as required.
For example, this does work (I've purposely over-allocated the problem columns from above):
* generate sample csv files ;
%macro create_csv(where);
%let num=%eval(&num+123); %* some random number ;
filename csv "%sysfunc(pathname(work))\facility_trend_&num..csv";
proc export data=sashelp.zipcode(where=(&where)) file=csv dbms=csv replace;
run;
%mend;
%let num=0;
%create_csv(%str(statecode='CA'))
%create_csv(%str(statecode='NY'))
%create_csv(%str(statecode='FL'))
%create_csv(%str(statecode='NC'))
%create_csv(%str(statecode='AZ'))
* get a list of the files in the directory ;
%dirlist(dir=%sysfunc(pathname(work)),filter=basename=:'facility_trend_' and ext='csv')
* import each file, naming the dataset dsn#, where # is the row number from dirlist ;
%macro code;
%let fullname=%trim(&fullname);
%let num=&__iter__;
data DSN#
infile "&fullname" delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=6;
informat ZIP best32. ;
informat Y best32. ;
informat X best32. ;
informat ZIP_CLASS $1. ;
informat CITY $1000. ;
informat STATE best32. ;
informat STATECODE $2. ;
informat STATENAME $1000. ;
informat COUNTY best32. ;
informat COUNTYNM $1000. ;
informat MSA best32. ;
informat AREACODE best32. ;
informat AREACODES $1000. ;
informat TIMEZONE $1000. ;
informat GMTOFFSET best32. ;
informat DST $1. ;
informat PONAME $1000. ;
informat ALIAS_CITY $1000. ;
informat ALIAS_CITYN $1000. ;
input
ZIP
Y
X
ZIP_CLASS $
CITY $
STATE
STATECODE $
STATENAME $
COUNTY
COUNTYNM $
MSA
AREACODE
AREACODES $
TIMEZONE $
GMTOFFSET
DST $
PONAME $
ALIAS_CITY $
ALIAS_CITYN $
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
if missing(cats(of _all_)) then delete;
run;
proc append data=&syslast out=final;
run;
%mend;
proc datasets lib=work nowarn nolist;
delete final;
quit;
%loop_control(control=dirlist)
If you want read all files in a directory with the same defined structure, you do not need ANY macro coding at all, as the infile statement accepts wildcards:
data _null_;
set sashelp.class;
file '$HOME/classes/class1.txt' dlm=';' dsd;
if _n_ = 1
then do;
put;
put;
put 'name;sex;age';
end;
put name sex age;
run;
data _null_;
set sashelp.class;
file '$HOME/classes/class2.txt' dlm=';' dsd;
if _n_ = 1
then do;
put;
put;
put 'name;sex;age';
end;
put name sex age;
run;
data want;
length fname inf $200;
infile "$HOME/classes/*.txt" dlm=';' dsd filename=inf truncover;
informat
name $8.
sex $1.
;
fname = inf;
input name@;
if name not in ('','name');
input sex age;
run;
And if you need to be more selective, you can use a compound file reference:
filename in ("$HOME/classes/class1.txt" "$HOME/classes/class2.txt");
data want;
length fname inf $200;
infile in dlm=';' dsd filename=inf truncover;
informat
name $8.
sex $1.
;
fname = inf;
input name@;
if name not in ('','name');
input sex age;
run;
so all you have to do is build the list of filenames for the filename statement. The read is then done in one single step.
@Kurt_Bremser 's approach is better than mine. In particular, it will perform better than my approach.
In another post yesterday, I said "It is just as important to know when NOT to use a macro". I should follow my own advice.
However, the OP stated:
1. firstobs = 6 instead of the 2 that it is currently set at (I don't see in the code where this setting is)
I've changed your code to this (since I'm on EG on Windows):
%let root=%sysfunc(pathname(work));
%macro create_file(filename);
data _null_;
set sashelp.class;
file "&root\&filename" dlm=';' dsd;
if _n_ = 1
then do;
put 'name;sex;age';
end;
put name sex age;
run;
%mend;
%create_file(class1.txt)
%create_file(class2.txt)
%create_file(class3.txt)
%dirlist(dir=&root,filter=basename=:'class' and ext='txt');
data class;set sashelp.class;run; * just so it shows in EG ;
data want;
set dirlist (keep=fullname);
end=0;
do until (end);
infile dummy delimiter = ';' dsd missover lrecl=32767 firstobs=6 end=end filevar=fullname;
informat
name $8.
sex $1.
;
input name sex age;
output;
end;
run;
This works, but is there a better approach (esp. one that doesn't require %dirlist)? I tried filevar= + filename= but couldn't get it to work (and lost patience 😉 )
I use this condition
if name not in ('','name');
to jump over my artificially included "header" lines. This condition needs to be adapted to the actual content (or non-content) of the 5 lines in every file that need to be skipped.
Solutions could be:
%macro drive(dir,ext);
%local cnt filrf rc did memcnt name;
%let cnt=0;
%let filrf=mydir;
%let rc=%sysfunc(filename(filrf,&dir));
%let did=%sysfunc(dopen(&filrf));
%if &did ne 0 %then %do;
%let memcnt=%sysfunc(dnum(&did));
%do i=1 %to &memcnt;
%let name=%qscan(%qsysfunc(dread(&did,&i)),-1,.);
%if %qupcase(%qsysfunc(dread(&did,&i))) ne %qupcase(&name) %then %do;
%if %superq(ext) = %superq(name) %then %do;
%let cnt=%eval(&cnt+1);
%put %qsysfunc(dread(&did,&i));
proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt(where=(X is not NULL))
dbms=csv replace datarow=6;
run;
data dsn&cnt;
set dsn&cnt;
length file_ID $10;
retain File_ID "&cnt"; /* Or whatever you want to use to identify the file */
run;
proc append base=Facility force;
run;
%end;
%end;
%end;
%end;
%else %put &dir cannot be open.;
%let rc=%sysfunc(dclose(&did));
%mend drive;
%drive(E:\HIMS\TEST\Strokes\2019 20\HIT_Tool,csv)
PROC APPEND will not work if SAS interprets data from different files differently, and you may have problems with the data being truncated, if columns in the first file are shorter than in some of the following files. In that case, you may have to use the data step generated by PROC IMPORT, with modifications, instead of PROC IMPORT itself - press F4 (RECALL) after submitting a PROC IMPORT of one file, and then modify that code and use it instead of PROC IMPORT.
Do you not know what is in the CSV files? Why are you using PROC IMPORT instead of just writing your own data step to read the file?
Once you know how to read one file you should be able to read them all using a wildcard in the input filename. You can use the FILENAME= option on the INFILE statement to get access to which file you are reading. You can use the change in input filename to know when you need to skip header rows.
data want;
length fname $256 facility $3 ;
infile 'E:\HIMS\TEST\Strokes\2019 20\HIT_Tool\facility_trend_*.csv'
dsd filename=fname truncover
;
input @;
if fname ne lag(fname) then row=0;
row+1;
if row<6 then delete;
length var1 8 var2 $20 .... var_last $4 ;
input var1--var_last ;
facility=substr(fname,length(fname)-6,3);
run;
So to adjust that program to your data just change these two lines to reflect your actual variables.
length var1 8 var2 $20 .... var_last $4 ;
input var1--var_last ;
Thank you so much to all who responded, I really appreciate it. Tom, I did not know that you could use a wildcard for the infile statement so that will be life changing for future code! I also liked the simplicity of the code you provided and it worked like a charm.
Have a great day everyone!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.