Good afternoon,
I am currently working with several hundred excel spreadsheets in a folder. They currently have the variables Height and Width. The IDs and Time variables are in the excel file names. but not in the spreadsheets themselves. I am looking to import all of the files into one dataset with the variables ID, Time, Height, and Width. How can I go about importing them into one dataset with the headings, and using the file names to create variables? The first four characters of the file name would create the ID variable and the fifth would create the time variable. I have attached example datafiles and the final dataset I am hoping to achieve.
I am looking for guidance on how to achieve this.
Thank you for your help.
Mark
This can start you off:
This part should change:
proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt
dbms=csv replace;
run;
To this, which add the file name into the file.
proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt
dbms=XLSX replace;
run;
data dsn&cnt;
set dsn&cnt;
*creates file name;
sourceFile = "%qsysfunc(dread(&did,&i))";
*insert rest of functions from yesterdays post;
run;
EDIT: I think you also need to change it slightly to check for XLSX not CSV files but that should be fairly straightforward.
Then once the macro completes, this will append all the files together (any data set named with the prefix DSN)
data bigFile;
set dsn:;
run;
However, if your Excel files are not formatted consistently one common issue is that it will not read the types correctly. In that case you'll get errors about a mismatch of types (some are character, some are numeric). If that happens you'll need to either rejig your process to ensure your data types are correct for each file or convert them all to CSV and import those which do allow you to specify the types as you import the data.
This code converts all XLSX files in a folder to CSV
https://gist.github.com/statgeek/878e585102c14e01581f55dbe972d27e
@marksanter wrote:
Good afternoon,
I am currently working with several hundred excel spreadsheets in a folder. They currently have the variables Height and Width. The IDs and Time variables are in the excel file names. but not in the spreadsheets themselves. I am looking to import all of the files into one dataset with the variables ID, Time, Height, and Width. How can I go about importing them into one dataset with the headings, and using the file names to create variables? The first four characters of the file name would create the ID variable and the fifth would create the time variable. I have attached example datafiles and the final dataset I am hoping to achieve.
I am looking for guidance on how to achieve this.
Thank you for your help.
Mark
Hello,
Thank you for your help with this. I have tried to run the code and it is telling me that Work.dsn does not exist. Everything appears to be okay though. Any thoughts?
Mark
If those files were stored in csv or another text format, you would need only one data step:
data final;
length
fname f_name $200
id $4
time $1
;
infile "/path/*.csv" filename=fname dlm="," dsd truncover;
f_name = scan(fname,-1,"/"); /* extract filename from complete path */
id = substr(f_name,1,4);
time = substr(f_name,5,1);
input @;
if index(upcase(_infile_),'HEIGHT') = 0; /* discard header */
input height width;
drop f_name;
run;
Untested.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.