BookmarkSubscribeRSS Feed
marksanter
Obsidian | Level 7

Good afternoon,

 

I am currently working with several hundred excel spreadsheets in a folder. They currently have the variables Height and Width. The IDs and Time variables are in the excel file names. but not in the spreadsheets themselves. I am looking to import all of the files into one dataset with the variables ID, Time, Height, and Width. How can I go about importing them into one dataset with the headings, and using the file names to create variables? The first four characters of the file name would create the ID variable and the fifth would create the time variable. I have attached example datafiles and the final dataset I am hoping to achieve. 

 

I am looking for guidance on how to achieve this.

 

Thank you for your help.

 

Mark 

6 REPLIES 6
Reeza
Super User

This can start you off:

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=mcrolref&docsetTarget=n0c...

 

This part should change:

proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt 
             dbms=csv replace;            
          run;      

To this, which add the file name into the file. 

proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt 
             dbms=XLSX replace;            
          run;      

data dsn&cnt;
set dsn&cnt;

*creates file name;
sourceFile = "%qsysfunc(dread(&did,&i))";

*insert rest of functions from yesterdays post;

run;

 

EDIT: I think you also need to change it slightly to check for XLSX not CSV files but that should be fairly straightforward.

 

Then once the macro completes, this will append all the files together (any data set named with the prefix DSN)

data bigFile;
set dsn:;
run;

 

However, if your Excel files are not formatted consistently one common issue is that it will not read the types correctly. In that case you'll get errors about a mismatch of types (some are character, some are numeric). If that happens you'll need to either rejig your process to ensure your data types are correct for each file or convert them all to CSV and import those which do allow you to specify the types as you import the data.

 

This code converts all XLSX files in a folder to CSV

https://gist.github.com/statgeek/878e585102c14e01581f55dbe972d27e

 


@marksanter wrote:

Good afternoon,

 

I am currently working with several hundred excel spreadsheets in a folder. They currently have the variables Height and Width. The IDs and Time variables are in the excel file names. but not in the spreadsheets themselves. I am looking to import all of the files into one dataset with the variables ID, Time, Height, and Width. How can I go about importing them into one dataset with the headings, and using the file names to create variables? The first four characters of the file name would create the ID variable and the fifth would create the time variable. I have attached example datafiles and the final dataset I am hoping to achieve. 

 

I am looking for guidance on how to achieve this.

 

Thank you for your help.

 

Mark 


 

marksanter
Obsidian | Level 7

Hello, 

 

Thank you for your help with this. I have tried to run the code and it is telling me that Work.dsn does not exist. Everything appears to be okay though. Any thoughts?

 

Mark

Maud1
Calcite | Level 5

Thank you, your suggestion works for me here is the combined code I used. 

 

/*this code download all the csv or excel files in the same directory, they don’t need to have the same file name or patterns, but they are output as dsn1 dsn2 …*/
 
%macro drive(dir,ext); 
   %local cnt filrf rc did memcnt name;
 
   %let cnt=0;
   %let filrf=mydir;
 
   %let rc=%sysfunc(filename(filrf,&dir));
   %let did=%sysfunc(dopen(&filrf));
 
   %if &did ne 0 %then %do;
      %let memcnt=%sysfunc(dnum(&did));
      %do i=1 %to &memcnt;
         %let name=%qscan(%qsysfunc(dread(&did,&i)),-1,.);
         %if %qupcase(%qsysfunc(dread(&did,&i))) ne %qupcase(&name) %then %do;
            %if %superq(ext) = %superq(name) %then %do;
               %let cnt=%eval(&cnt+1);
               %put %qsysfunc(dread(&did,&i));
               proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=dsn&cnt 
                  dbms=csv replace;
                  guessingrows=max;
  GETNAMES=NO;/*whether excel or csv has the header or not, put no is easier to combine then change
  variable's name later*/
               run;
   data dsn&cnt;
   length /*maximum length of the file name*/
  Ticker $5;
    set dsn&cnt;
Ticker="%qscan(%qsysfunc(dread(&did,&i)),1,.)";
run;
            %end;
         %end;
       %end;
    %end;
  %else %put &dir cannot be opened.;
 
  %let rc=%sysfunc(dclose(&did));
%mend drive;
/*change your folder path below where the files are located, keep ,csv at the end though*/
%drive(C:\Users\Test, csv)
 
/*combine all the files*/
data bigFile;
set dsn:;
run;
Tom
Super User Tom
Super User

If you have a question start a new thread.

Note that you do not need to work so hard to read in multiple TEXT files that all have the same set of variables. By writing your own data step you can use the INFILE statement to access multiple files and tell you the name of the current file also.  Plus then you can define the variables yourself instead of risking that PROC IMPORT will make the wrong guesses about what variable names to use and what types of variable they are.

For example:

data bigFile;
  length fname $256 ticker $5 ;
  infile "c:\Users\Test\*.csv" dsd truncover filename=fname;
  input @;
  if lag(fname) ne fname then delete;
  ticker=scan(scan(fname,-1,'/\'),1,'.');
  length var1 $10 var2 8 .... varlast $20 ;
  input var1 -- varlast;
run;

 

 

Kurt_Bremser
Super User

If those files were stored in csv or another text format, you would need only one data step:

data final;
length
  fname f_name $200
  id $4
  time $1
;
infile "/path/*.csv" filename=fname dlm="," dsd truncover;
f_name = scan(fname,-1,"/"); /* extract filename from complete path */
id = substr(f_name,1,4);
time = substr(f_name,5,1);
input @;
if index(upcase(_infile_),'HEIGHT') = 0; /* discard header */
input height width;
drop f_name;
run;

Untested.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 12145 views
  • 3 likes
  • 5 in conversation