Dear SAS community,
In your opinion what is the best method to read an excel file into SAS? Initially I used Proc Import to read in an xlsx, but it does not retain the column formats for many of my variables, even with the guessing rows option (i.e. column is empty, but still need to be in numeric format). Then I thought I would do an infile statement with the data converted into a csv file. This was so I could add in the length/format statements. However, I cannot read in multiple "sheets" with CSV files and the data I am reading comes with multiple tabs. Thoughts?
Thanks!
How many files and how often?
I usually just pull them in with file > import data >sas import wizard.
You can have it save the steps and then maybe use that generated import statement as a base for a macro if you have a lot to do.
It depends. Are you trying to combine the data after? If so, you do have a problem because you have to make sure SAS reads the files the same so you can operationalize this. Otherwise you run into one variable is character and one is numeric and one is length 20 in one file and length 50 in another file.
You have the most control using CSV's and you can read multiple CSVs in a single data step, especially if they all have the same format.
@sophia_SAS wrote:
You are correct in that I am trying to combine the data afterwards. I would like to use CSV but am struggling with reading in multiple tabs within 1 csv file. Here's my infile code for the first tab. I have 5 other 'tabs in this mydata csv file. (I know CSV doesn't do sheets.)
DATA want;
FORMAT ID $7. STATE $2. DOB MMDDYY10. CLASSOTH $25.;
INFILE "mydata.csv" missover dsd firstobs=2;
INPUT IT $ STATE $ DOB GEO2010FIPS CLASS1 CLASS2 CLASSOTH $;
run;
Obviously 5 source files will require file separate CSV files since they are just simple text files.
If they have the same structure you can either read them all in data step, or read them separately and then combine.
Do if you had this code to read one file.
data want ;
length ID $7 STATE $2 DOB GEO2010 FIPS CLASS1 CLASS2 8 CLASSOTH $25;
informat dob mmddyy. ;
format dob date9. ;
INFILE "mydata.csv" dsd firstobs=2 truncover;
INPUT ID -- CLASSOTH ;
run;
You call it five with different input files and dataset names to read in five files.
Or you could possible adopt it to read in all five files in one pass if you wanted.
One way would be to use wildcard in the INFILE name. But if each file has a header row then you need to add code to skip the extra headers. Perhaps by using the EOV variable that SAS will set to 1 when you start a new physical file. So if you had files named mydata1.csv, mydate2.csv .... then you might be able to use a step like this.
data want ;
length ID $7 STATE $2 DOB GEO2010 FIPS CLASS1 CLASS2 8 CLASSOTH $25;
informat dob mmddyy. ;
format dob date9. ;
INFILE "mydata*.csv" dsd firstobs=2 truncover eov=eov;
input @;
if _n_=1 or eov then input;
INPUT ID -- CLASSOTH ;
eov=0;
run;
I normally use .NET to handle Excel. That said, I am wondering if you have Access to ODBC? If so, consider using Excel as an ODBC database. Set your libname then query it using SQL:
Driver={Microsoft Excel Driver (*.xls)};DriverId=790;Dbq=C:\MyExcel.xls;DefaultDir=c:\mypath;
SQL syntax "SELECT [Column Name One], [Column Name Two] FROM [Sheet One$]". I.e. excel worksheet name followed by a "$" and wrapped in "[" "]" brackets.
See https://www.connectionstrings.com/excel/
Either PROC IMPORT or the XLSX libname engine will do a good job of reading an Excel file. It is easier to read multiple tabs using the libname engine.
libname in xlsx 'path to file.xlsx';
libname out 'directoryname';
proc copy inlib=in outlib=out;
run;
The problems are usually caused by the fact that Excel is a spreadsheet and NOT a database. That is each individual cell can hold whatever you want to put into it. A database consists of multiple observations, each of which has the same set of variables. So unless you take care to properly format your columns so that all cells use the same data type then when your data changes you can end up with differently defined variables in SAS.
You will have the similar problem with text files like CSV files when using PROC IMPORT because it will have to guess at what type of variable to create based on the data you provide it. At least with Excel you can set metadata on the cells in Excel and therefore SAS can know what type of value you have, but most users do not do this. So characters strings that look like numbers might confuse PROC IMPORT ( or Excel for that matter) when read from a CSV file, but if you format the cells properly in Excel then it will know that the field is characters and not a number.
If you use a text file for your transfer method then you can write a simple data step that reads the file and so enforce a consistent structure for the SAS dataset. You just need to make sure the the supplier of the data uses the format consistently.
If you have multiple sheets in your Excel file then you would want multiple individual text files to transfer that data (whether you ever put it into excel or not). If they all have the same structure then you can read them in one data step if you want. Or if if create seprate SAS dataset but use a consistent data step then you can easily combine the separate dataset and not have to worry about things like wrong data type for some variables or varying lengths for character variables.
All good ways for importing data into SAS do not involve the Excel file format. Use a textual format and a data step.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.