Hi All
I have an Excel file that I need to import from another team to SAS. Usually I would save it as a CSV and import using a data step with an infile statement. But for various reasons i don't want to do that for this particular excersice - I need to import the Excel file directly. Please see attached an example of the file structure. The first row is a heading that I want to skip, with the variable names in the second row and the data from there on.
I looked on the forum and for some reason the code that people suggest to do this, gives me an error. When I try and run the code below
proc import datafile= "C:\temp\book3b.xlsx"
	out= Test_Data replace
	DBMS= Excel;
	sheet= "Test Data";
/*	startrow=3;*/
/*	namerow=2;*/
	getnames= yes;
/*	mixed=yes;*/
/*	scantext=yes;*/
/*	usedate=yes;*/
	datarow=3;
 run;I get "ERROR 180-322: Statement is not valid or it is used out of proper order." on the 'datarow' line.
1) How can I ignore the first row?
2) is there a way i can tell the import procedure I want to exclude certain columns? The 'Comments' column has a high potential to give errors and inconsistancies, and as such i would just like to ignore it.
Thnx
GD
Helpful little note in the documentation, which did work as expected.
Specify RANGE="Sheetname$A#:0"; where # is the first data row. Thus, RANGE="sheet1$A3:0"; starts to read the data at row 3. If you use RANGE= for this purpose, do not specify the DATAROW= statement.
proc import out=test datafile='/folders/myfolders/input.xlsx' dbms=xlsx replace; getnames=yes;
range="Sheet1$A2:0";
run;Try DBMS=XLSX and DATAROW=3.
I couldn't get it to work with GETNAMES and DATAROW though.
Helpful little note in the documentation, which did work as expected.
Specify RANGE="Sheetname$A#:0"; where # is the first data row. Thus, RANGE="sheet1$A3:0"; starts to read the data at row 3. If you use RANGE= for this purpose, do not specify the DATAROW= statement.
proc import out=test datafile='/folders/myfolders/input.xlsx' dbms=xlsx replace; getnames=yes;
range="Sheet1$A2:0";
run;Nope, its still not working. It doesn't matter what I do, it keeps on taking the first row (the one I want to ignore) as the variable names. Log window gives no errors or notes. The range statement makes no difference. If I comment out the range stament with the datarow statement in, I can either get the variable names in as the first row, or exclude it.
proc import datafile= "C:\temp\book3b.xlsx"
out= Test_Data replace
DBMS= XLSX;
sheet= "Test Data";
getnames= yes;
/* datarow=2;*/
range="Test Data$A2:0";
run;
So I can go the log way around and import it like this with the variable names in the first row, and then somehow create a new dataset from this....
GD
ah...the sheet statement with the range statement confuses the thing. Silly that it doesn't give you an error or a note on this.
proc import datafile= "C:\temp\book3b.xlsx"
out= Test_Data replace
DBMS= XLSX;
/* sheet= "Test Data";*/
getnames= yes;
/* datarow=2;*/
range="Test Data$A2:0";
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
