BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
novinosrin
Tourmaline | Level 20

Requesting ideas/help to read multiple excel files xlsx from a folder in server efficiently?

 

Folder is in server

 

Number of cols: 130

Row 1 has  some junk and has to be ignored

Variable names begin from Row 2

Values begin from Row 3

Also some columns do not have variable names and is blank

 

Any simple convenient yet efficient way to handle this?An example will help. Thank you in advance

 

PS My mind isn't thinking well. Sorry for the bother

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

It is a pain in ***.

If you just use PROC IMPORT it will use the first row as the names and include the names in the data.

If you tell it the data starts on line 3 it will not read the names as data, but it still tries to use the title row as the names.

Instead use the RANGE option to tell it start reading on A2.  But then you NEED to know the sheetname.

Example file: 

image.png

Example Code:

%let filename="c:\downloads\testa.xlsx";

proc import datafile=&filename dbms=xlsx
  out=test1 replace
;
run;

proc print; title1 'TEST1';
run;

proc import datafile=&filename dbms=xlsx
  out=test2 replace
;
datarow=3;
run;

proc print; title1 'TEST2';
run;

proc import datafile=&filename dbms=xlsx
  out=test3 replace
;
range="Sheet1$A2:0";
run;

proc print; title1 'TEST3';
run;

title1;

Example Results:

TEST1

       Title_
Obs     line     B       C

 1       ID      Name    Age
 2       1       Sam     23
 3       2       Fred    33

TEST2

       Title_
Obs     line     B       C

 1       1       Sam     23
 2       2       Fred    33

TEST3

Obs    ID    Name    Age

 1     1     Sam     23
 2     2     Fred    33

 

View solution in original post

18 REPLIES 18
Reeza
Super User

1. Do all the files have similar structure?

2. What type of server, Windows, Unix? If you're on Unix, you likely need to use PCFILES. 

3. What happens if you write a libname reference to the file and use PROC COPY? 

4. If they are similar structure have you tried using a basic PROC IMPORT with DBMS = XLSX and specifying the RANGE from A2:END? There's a way to specify a dynamic range so if this works, I can look up how to do that. 

  

Documentation reference for Excel files:

https://documentation.sas.com/?docsetId=acpcref&docsetTarget=n0msy4hy1so0ren1acm90iijxn8j.htm&docset...

 

libname myFile XLSX "path to xlsx file";

*you need to ensure pcfiles is set up and running;
libname myTest PCFILES Path = "path to xlsx file" server= port=;


 

Vince_SAS
Rhodochrosite | Level 12

The XLSX LIBNAME engine and the XLSX PROC IMPORT engine work on Unix, so the SAS PC Files Server might not be needed.

 

Vince DelGobbo

SAS R&D

novinosrin
Tourmaline | Level 20

It's a windows server.

 

This seems to work

 

proc import 
DATAFILE = "\\corp\sites\RIB1001\HLSCreditRisk\RMBS Collateral Data\RAW DATA FOR NAVEEN\12646wah7.xlsx" 
DBMS=XLSX   OUT =test REPLACE;

getnames=no;

datarow=3;
run;

But I do not understand the datarow as the doc says  Starts reading data from the specified row number in the delimited text file. So what's the delimiter here?

Reeza
Super User
DATAROW should only work with delimited text files, not an XLSX files as far as I know. The SAS documentation is starting to drive me nuts lately....seems like it's constantly missing things or out of date.
novinosrin
Tourmaline | Level 20

+1 to your thoughts.

 

I'm very unimpressed too as I was embarrassed 😞

Reeza
Super User
You shouldn't ever feel embarrassed at not knowing something, especially if its your first time dealing with that situation. If it's the third time you've made that mistake then you should feel a bit embarrassed 🙂
Tom
Super User Tom
Super User

It is a pain in ***.

If you just use PROC IMPORT it will use the first row as the names and include the names in the data.

If you tell it the data starts on line 3 it will not read the names as data, but it still tries to use the title row as the names.

Instead use the RANGE option to tell it start reading on A2.  But then you NEED to know the sheetname.

Example file: 

image.png

Example Code:

%let filename="c:\downloads\testa.xlsx";

proc import datafile=&filename dbms=xlsx
  out=test1 replace
;
run;

proc print; title1 'TEST1';
run;

proc import datafile=&filename dbms=xlsx
  out=test2 replace
;
datarow=3;
run;

proc print; title1 'TEST2';
run;

proc import datafile=&filename dbms=xlsx
  out=test3 replace
;
range="Sheet1$A2:0";
run;

proc print; title1 'TEST3';
run;

title1;

Example Results:

TEST1

       Title_
Obs     line     B       C

 1       ID      Name    Age
 2       1       Sam     23
 3       2       Fred    33

TEST2

       Title_
Obs     line     B       C

 1       1       Sam     23
 2       2       Fred    33

TEST3

Obs    ID    Name    Age

 1     1     Sam     23
 2     2     Fred    33

 

Reeza
Super User

@Tom where in the documentation does it say that data row applies to XLSX DBMS? In the documentation, it only shows that RANGE and SHEET apply to XLSX DBMS? 

 

That's primarily the issue here, I'm guessing that it's under a different section than the link above?

Tom
Super User Tom
Super User

Since DATAROW doesn't really work right unless you want GETNAMES=NO not sure it matters that much.

On this page:

https://documentation.sas.com/?docsetId=acpcref&docsetTarget=n0msy4hy1so0ren1acm90iijxn8j.htm&docset...

It mentions DATAROW under the section on RANGE.  It also does not seem to properly describe the limitation on the RANGE option that it MUST include a sheetname if you are not using a named range.

Vince_SAS
Rhodochrosite | Level 12

In the simple case of reading the first worksheet, I don't think that you need to specify the worksheet name in the range.

 

proc import datafile=&filename dbms=xlsx
  out=test3 replace;
  range='$A2:0';
run; quit;

 

Vince DelGobbo

SAS R&D

Tom
Super User Tom
Super User

I thought I tried that.  But looking at the log I had tried just '$A2'.  

Interestingly you don't need the trailing zero. So just '$A2:' works.

novinosrin
Tourmaline | Level 20

Marvelous Sir @Tom  . First off, thank you for your time and detailing the example. let me try that solution, and then come back to you should i seek any clarification and of course to mark the answer. 

 

 

novinosrin
Tourmaline | Level 20

Sir @Tom , @Reeza  , @Vince_SAS 

 

Sorry for the bother yet again. May I request a favor in helping me with some kind of a tool that would convert all the xlsx files in a folder to CSV. I know this request is perhaps not appropriate in a SAS forum but I am just taking a chance. The reason that led to me to this is I am not quite happy with Proc import making guesses incorrectly for some files and perhaps correctly for others. Could i get lucky plz?

 

 

Reeza
Super User

https://gist.github.com/statgeek/c51f58a009f8d315a200f34912e494b1

 

This goes from xml to xlsx but it's easy to modify, change the 51 to 6 and the extension in the code from xlsx to csv

https://docs.microsoft.com/en-us/office/vba/api/excel.xlfileformat

 

I don't think this deals with multiple sheets in each file, is that something you need to deal with?

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1643 views
  • 11 likes
  • 4 in conversation