I have a large excel file that I am importing to SAS; Row 1 has the variable names but rows 2-4 of the file have additional descriptive information (about the variables) that I do not want in my SAS dataset. Is there a way to exclude those 3 rows while importing or after importing the data onto SAS?
@rsuresh wrote:
I have a large excel file that I am importing to SAS; Row 1 has the variable names but rows 2-4 of the file have additional descriptive information (about the variables) that I do not want in my SAS dataset. Is there a way to exclude those 3 rows while importing or after importing the data onto SAS?
You are in luck. The quirky way the DATAROW= statement works in PROC IMPORT with the XLSX DBMS engine will do exactly that: you can tell it which row to start reading data from, but it always uses the first row to guess the variable names.
proc import datafile='large.xlsx' dbms=xlsx
  out=want replace;
  datarow=5;
run;
If the extra information were in rows 1 to 3 and row 4 had the column headers, you would have to work harder.
Example program:
proc import datafile='c:\downloads\skip.xlsx' dbms=xlsx
  out=skip replace;
  datarow=3;
run;
Result:
Obs    X    Y             Z
  1    1    12/31/2021    Height
(Picture of the XLSX file omitted.)
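For the harder layout mentioned above (extra information in rows 1 to 3, column headers in row 4), one possible sketch uses the RANGE= option of PROC IMPORT with DBMS=XLSX so that reading starts at the header row. The sheet name and cell range here are assumptions; adjust them to your file:

```sas
/* Sketch: junk in rows 1-3, headers in row 4, data from row 5 on.      */
/* "Sheet1" and A4:D100 are assumptions -- change to match your file.   */
proc import datafile='c:\downloads\skip.xlsx' dbms=xlsx
  out=want replace;
  range="Sheet1$A4:D100";  /* headers are read from row 4 */
run;
```

With GETNAMES=YES (the default), the first row of the range supplies the variable names.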
Best thing is to remove these rows in Excel.
If you are going to have multiple files with the same structure, one approach is to save each file in CSV format and write a data step to read it.
The INFILE statement used by the data step to identify the file and its characteristics, such as delimiters and logical record length, includes a FIRSTOBS= option that tells SAS to skip to that row of the file before attempting to read anything.
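A minimal sketch of such a data step, assuming a CSV whose headers are in row 1 and whose data start in row 5 (the file name, variable names, and informats are assumptions, not taken from your file):

```sas
/* Sketch: skip the header row plus three descriptive rows (rows 1-4). */
data want;
  infile 'c:\downloads\large.csv' dlm=',' dsd truncover
         firstobs=5 lrecl=32767;
  length x 8 y 8 z $20;          /* you control every type and length */
  input x y :mmddyy10. z $;
  format y mmddyy10.;
run;
```

Because the data step names the columns in order, every file with the same layout is read identically.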
One big reason to use the CSV approach with multiple files is that you control the type and length of every variable. PROC IMPORT, or any wizard that uses it, makes separate decisions for each file imported, which means that variables of the same name may change type or length from file to file; in a poor data-source environment the variable names themselves may even change. Data step code that reads the CSV treats the columns, in order, the same way for every file.
If you save your file to CSV and run PROC IMPORT on that file (the GUESSINGROWS=MAX; option is recommended), SAS will generate data step code that appears in the log. You can copy that code to the editor, clean it up, save it, and reuse it.
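That PROC IMPORT call might look like the following sketch (the file name is an assumption):

```sas
/* Sketch: import the CSV once so SAS writes data step code to the log. */
proc import datafile='c:\downloads\large.csv' dbms=csv
  out=want replace;
  guessingrows=max;  /* scan the whole file before picking types/lengths */
run;
```

After it runs, look in the log for the generated DATA step and adapt that code for reuse.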
This works better with a file that does not contain the extra information you describe. Once you have a clean program that reads the file, you can add the FIRSTOBS= option to skip the header rows for the next file.