Hi all,
i am facing a problem while importing data from excel 2010 in SAS Datasets. The data is present in different sheets and in each sheet headers are there till some rows i.e the original data starts from different rows in each sheet. i can import the data by specifying RANGE option. But without RANGE option i wanna import each sheet into a single dataset. i tried DATAROW option, FRISTOBS option but they didn't worked. Please help me in this.
Thanks
This is a popular question. I'm consolidating some of the best responses from this and other threads to provide some guidance.
First, with SAS 9.4 and later, SAS recommends using DBMS=XLSX for the most flexibility. It works on all operating systems without the need for additional components like the PC Files Server. Your Excel file must be in the Excel 2007-or-later format (XLSX). You do need a licence for SAS/ACCESS to PC Files.
If your Excel data does not begin in cell A1 (the default start point for an import process), then you can specify a RANGE= value that includes the specific cells. The easiest method is to use a Named Range in Excel to define the exact boundaries of the data.
To define a named range in Excel, highlight the range of cells to include and simply type the new name of the range in the Name Box:
Then save the Excel file.
Then to import into SAS, specify that range name in the RANGE= option:
proc import datafile="/myprojects/myfile.xlsx"
out=mydata
replace;
range="myspecialrange";
run;
The original question asked for a method that didn't use RANGE=. You can use PROC IMPORT to read the entire sheet, but the result will not have the column headers and types you want. Consider a sheet like this:
This code will read it:
proc import datafile="/myprojects/middle.xlsx"
out=mid dbms=xlsx
replace;
run;
But the result will contain many empty cells, and the values will be read as all character types:
With additional coding, you can "fix" this result in another pass using DATA step. Or, if you're willing to add the RANGE option, you can read it properly in the first pass:
proc import datafile="/myprojects/middle.xlsx"
out=mid dbms=xlsx
replace;
range="Sheet1$E7:K17" ;
run;
You can also use LIBNAME XLSX to read entire sheets from Excel, or simply as a discovery step to see what sheets the Excel file contains before you run PROC IMPORT. However, LIBNAME XLSX does not show the Excel named ranges.
On SAS for Windows systems, you can use LIBNAME EXCEL (32-bit) or LIBNAME PCFILES (64-bit) to reveal a little more information about the Excel file.
libname d pcfiles path=c:\myprojects\middle.xlsx";
proc datasets lib=d; quit;
/* always clear the libname, as it locks the file */
libname d clear;
Note that does DBMS=XLSX not support some of the options we see in the legacy DBMS=XLS (which supports only old-format XLS files), such as STARTROW and NAMEROW. DBMS=XLSX does support GETNAMES (treats the first record of the sheet or range as the variable names). See the full reference for Excel file import/export in the SAS documentation.
Hi,
could you post your code please?
Did you try sheet=sheetname instead of range=range?
Hi
proc import datafile="C:\Users\data.xlsx"
DBMS=excel replace
out=Report;
mixed= yes;
getnames= yes;
DBDSOPTS= 'FIRSTOBS=9';
sheet='Compliance Report';
run;
When i used the above code i am getting the data into dataset but the variable names become F1, F2, F3..........
proc import datafile="C:\Users\data.xlsx"
DBMS=excel replace
out=Report;
mixed= yes;
getnames= yes;
datarow= 9;
sheet='Compliance Report';
run;
When i am using the above code i am getting an error stating that "ERROR 180-322: Statement is not valid or it is used out of proper order."
Your problem is that you are using Excel. Excel is not a strucuted dataset as database/SAS etc. software are. SAS datasets are created as tables, so each variable - column in a dataset has a fixed length/format. Excel does not impose this, each cell is different. Also, cells could be anywhere on the spreadsheet, as you have said there are heard rows and such like.
The second point to note on this is that proc import is a guessing procedure, it will attempt to do a best guess on your data. Not ideal really.
So as always, you should use the technology best suited to the task at hand. If you are transferring data, then a data transfer format would be best suited - CSV, XML etc. I understand that sometimes this is not possible, and that is why so manmy of these issues occur. In your case, you will need to look at each sheet you want to import, pull out the data you want into a dataset type structure, i.e. if your data starts in column D, at row 6, then copy from there, put the datainto a grid of its own. If you need titles then put them into a grid themselves. You can then import that data, although as always, I recommend saving that to CSV and writing a datastep import program to read the data in as specified by you, the person who knows the data best.
An alternative, if you know VBA, is to write a macro to do this and export the data to CSV (or if your feeling really brave, head directly in the Open Office Document format).
DATAROW option is not available for dbms=excel try STARTROW=
This is a popular question. I'm consolidating some of the best responses from this and other threads to provide some guidance.
First, with SAS 9.4 and later, SAS recommends using DBMS=XLSX for the most flexibility. It works on all operating systems without the need for additional components like the PC Files Server. Your Excel file must be in the Excel 2007-or-later format (XLSX). You do need a licence for SAS/ACCESS to PC Files.
If your Excel data does not begin in cell A1 (the default start point for an import process), then you can specify a RANGE= value that includes the specific cells. The easiest method is to use a Named Range in Excel to define the exact boundaries of the data.
To define a named range in Excel, highlight the range of cells to include and simply type the new name of the range in the Name Box:
Then save the Excel file.
Then to import into SAS, specify that range name in the RANGE= option:
proc import datafile="/myprojects/myfile.xlsx"
out=mydata
replace;
range="myspecialrange";
run;
The original question asked for a method that didn't use RANGE=. You can use PROC IMPORT to read the entire sheet, but the result will not have the column headers and types you want. Consider a sheet like this:
This code will read it:
proc import datafile="/myprojects/middle.xlsx"
out=mid dbms=xlsx
replace;
run;
But the result will contain many empty cells, and the values will be read as all character types:
With additional coding, you can "fix" this result in another pass using DATA step. Or, if you're willing to add the RANGE option, you can read it properly in the first pass:
proc import datafile="/myprojects/middle.xlsx"
out=mid dbms=xlsx
replace;
range="Sheet1$E7:K17" ;
run;
You can also use LIBNAME XLSX to read entire sheets from Excel, or simply as a discovery step to see what sheets the Excel file contains before you run PROC IMPORT. However, LIBNAME XLSX does not show the Excel named ranges.
On SAS for Windows systems, you can use LIBNAME EXCEL (32-bit) or LIBNAME PCFILES (64-bit) to reveal a little more information about the Excel file.
libname d pcfiles path=c:\myprojects\middle.xlsx";
proc datasets lib=d; quit;
/* always clear the libname, as it locks the file */
libname d clear;
Note that does DBMS=XLSX not support some of the options we see in the legacy DBMS=XLS (which supports only old-format XLS files), such as STARTROW and NAMEROW. DBMS=XLSX does support GETNAMES (treats the first record of the sheet or range as the variable names). See the full reference for Excel file import/export in the SAS documentation.
Thank you Chris.
Just as what you suggest, when put DBMS = XLSX, only a few options are supported, according to SAS documentation.
Available Statements for Importing and Exporting Excel Files Using DBMS=XLS and DBMS=XLSX | |||||
DBMS= Identifier | Option | Valid Value | Default Value | PROC | PROC |
IMPORT | EXPORT | ||||
XLS | ENDCOL | Last column for data | Last column that contains data | Yes | No |
ENDNAMEROW | Last row for variable names | Same as NAMEROW | Yes | No | |
ENDROW | Last row for data | Last row that contains data | Yes | No | |
GETNAMES | Yes | No | Yes | Yes | No | |
NAMEROW | First row for variable names | First row that contains variable names | Yes | No | |
NEWFILE | Yes | No | No | No | Yes | |
PUTNAMES | Yes | No | Yes | No | Yes | |
RANGE | name | sheet$ul:lr | First row | Yes | No | |
SHEET | Sheet name | First sheet | Yes | Yes | |
STARTCOL | First column for data | Last column that contains data | Yes | No | |
STARTROW | First row for data | First row that contains data | Yes | No | |
XLSX | GETNAMES | Yes | No | Yes | Yes | No |
RANGE | name | sheet$ul:lr | First row | Yes | No | |
SHEET | Sheet name | First sheet | Yes | Yes |
Yes, fewer (and different) options are supported with PROC IMPORT and DBMS=XLSX -- but the most important options are there. DBMS=XLS is a very old method that can work only with legacy XLS files.
True. looking at the SAS code generated by SAS when importing Excel files by using the wizard, SAS put DBMS=Excel by default, with several options that resemble those of DBMS=XLS.
It works fine with me (DBMS=Excel ) until I import Excel Sheets that have more than 255 column. In fact columns beyond 255 column are ignored by the program without any warning sign.
The only choice is to use DBMS=XLSX.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.