Hi All
I have an Excel file that I need to import from another team to SAS. Usually I would save it as a CSV and import using a data step with an infile statement. But for various reasons i don't want to do that for this particular excersice - I need to import the Excel file directly. Please see attached an example of the file structure. The first row is a heading that I want to skip, with the variable names in the second row and the data from there on.
I looked on the forum and for some reason the code that people suggest to do this, gives me an error. When I try and run the code below
proc import datafile= "C:\temp\book3b.xlsx"
out= Test_Data replace
DBMS= Excel;
sheet= "Test Data";
/* startrow=3;*/
/* namerow=2;*/
getnames= yes;
/* mixed=yes;*/
/* scantext=yes;*/
/* usedate=yes;*/
datarow=3;
run;
I get "ERROR 180-322: Statement is not valid or it is used out of proper order." on the 'datarow' line.
1) How can I ignore the first row?
2) is there a way i can tell the import procedure I want to exclude certain columns? The 'Comments' column has a high potential to give errors and inconsistancies, and as such i would just like to ignore it.
Thnx
GD
Helpful little note in the documentation, which did work as expected.
Specify RANGE="Sheetname$A#:0"; where # is the first data row. Thus, RANGE="sheet1$A3:0";
starts to read the data at row 3. If you use RANGE= for this purpose, do not specify the DATAROW= statement.
proc import out=test datafile='/folders/myfolders/input.xlsx' dbms=xlsx replace; getnames=yes;
range="Sheet1$A2:0";
run;
Try DBMS=XLSX and DATAROW=3.
I couldn't get it to work with GETNAMES and DATAROW though.
Helpful little note in the documentation, which did work as expected.
Specify RANGE="Sheetname$A#:0"; where # is the first data row. Thus, RANGE="sheet1$A3:0";
starts to read the data at row 3. If you use RANGE= for this purpose, do not specify the DATAROW= statement.
proc import out=test datafile='/folders/myfolders/input.xlsx' dbms=xlsx replace; getnames=yes;
range="Sheet1$A2:0";
run;
Nope, its still not working. It doesn't matter what I do, it keeps on taking the first row (the one I want to ignore) as the variable names. Log window gives no errors or notes. The range statement makes no difference. If I comment out the range stament with the datarow statement in, I can either get the variable names in as the first row, or exclude it.
proc import datafile= "C:\temp\book3b.xlsx"
out= Test_Data replace
DBMS= XLSX;
sheet= "Test Data";
getnames= yes;
/* datarow=2;*/
range="Test Data$A2:0";
run;
So I can go the log way around and import it like this with the variable names in the first row, and then somehow create a new dataset from this....
GD
ah...the sheet statement with the range statement confuses the thing. Silly that it doesn't give you an error or a note on this.
proc import datafile= "C:\temp\book3b.xlsx"
out= Test_Data replace
DBMS= XLSX;
/* sheet= "Test Data";*/
getnames= yes;
/* datarow=2;*/
range="Test Data$A2:0";
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.