Good afternoon,
I am currently working with several hundred excel spreadsheets in a folder. They currently have the variables Height and Width. The IDs and Time variables are in the excel file names. but not in the spreadsheets themselves. I am looking to import all of the files into one dataset with the variables ID, Time, Height, and Width. How can I go about importing them into one dataset with the headings, and using the file names to create variables? The first four characters of the file name would create the ID variable and the fifth would create the time variable. I have attached example datafiles and the final dataset I am hoping to achieve.
I am looking for guidance on how to achieve this.
Thank you for your help.
Mark
This can start you off:
This part should change:
proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt
dbms=csv replace;
run;
To this, which add the file name into the file.
proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt
dbms=XLSX replace;
run;
data dsn&cnt;
set dsn&cnt;
*creates file name;
sourceFile = "%qsysfunc(dread(&did,&i))";
*insert rest of functions from yesterdays post;
run;
EDIT: I think you also need to change it slightly to check for XLSX not CSV files but that should be fairly straightforward.
Then once the macro completes, this will append all the files together (any data set named with the prefix DSN)
data bigFile;
set dsn:;
run;
However, if your Excel files are not formatted consistently one common issue is that it will not read the types correctly. In that case you'll get errors about a mismatch of types (some are character, some are numeric). If that happens you'll need to either rejig your process to ensure your data types are correct for each file or convert them all to CSV and import those which do allow you to specify the types as you import the data.
This code converts all XLSX files in a folder to CSV
https://gist.github.com/statgeek/878e585102c14e01581f55dbe972d27e
@marksanter wrote:
Good afternoon,
I am currently working with several hundred excel spreadsheets in a folder. They currently have the variables Height and Width. The IDs and Time variables are in the excel file names. but not in the spreadsheets themselves. I am looking to import all of the files into one dataset with the variables ID, Time, Height, and Width. How can I go about importing them into one dataset with the headings, and using the file names to create variables? The first four characters of the file name would create the ID variable and the fifth would create the time variable. I have attached example datafiles and the final dataset I am hoping to achieve.
I am looking for guidance on how to achieve this.
Thank you for your help.
Mark
Hello,
Thank you for your help with this. I have tried to run the code and it is telling me that Work.dsn does not exist. Everything appears to be okay though. Any thoughts?
Mark
Thank you, your suggestion works for me here is the combined code I used.
If you have a question start a new thread.
Note that you do not need to work so hard to read in multiple TEXT files that all have the same set of variables. By writing your own data step you can use the INFILE statement to access multiple files and tell you the name of the current file also. Plus then you can define the variables yourself instead of risking that PROC IMPORT will make the wrong guesses about what variable names to use and what types of variable they are.
For example:
data bigFile;
length fname $256 ticker $5 ;
infile "c:\Users\Test\*.csv" dsd truncover filename=fname;
input @;
if lag(fname) ne fname then delete;
ticker=scan(scan(fname,-1,'/\'),1,'.');
length var1 $10 var2 8 .... varlast $20 ;
input var1 -- varlast;
run;
Thanks, I will surely try it.
If those files were stored in csv or another text format, you would need only one data step:
data final;
length
fname f_name $200
id $4
time $1
;
infile "/path/*.csv" filename=fname dlm="," dsd truncover;
f_name = scan(fname,-1,"/"); /* extract filename from complete path */
id = substr(f_name,1,4);
time = substr(f_name,5,1);
input @;
if index(upcase(_infile_),'HEIGHT') = 0; /* discard header */
input height width;
drop f_name;
run;
Untested.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.