10-09-2015 07:30 AM
I want to transfer a table from Excel to SAS (my SAS version is 9.2 and the Excel file is in XLSM, i.e. macro-enabled, format). The column names are in row 3 starting at cell B3, and the data starts at cell B4, like below:
       A   B      C      D   E   F   G  ...
   1
   2
   3       Col1   Col2
   4       15     20
   5       16     21
   6       ...    ...
The problem is that the last row number is unknown, because the table can be 200 rows long today and 350 rows tomorrow. So how can I import this table from Excel (XLSM) into a SAS table?
I read somewhere that we can use PROC IMPORT with DBMS=EXCEL, like below:
proc import datafile = "!datafile" out=Table1 DBMS = EXCEL REPLACE;
  SHEET = "Sheet1";
  GETNAMES=YES;
  MIXED=YES;
  USEDATE=YES;
  SCANTIME=YES;
  NAMEROW=3;
  DATAROW=4;
run;
However, SAS does not recognize the DATAROW statement and gives the error

ERROR 180-322: Statement is not valid or it is used out of proper order.

There is another way of importing a table from Excel, like this:
PROC SQL;
  CONNECT TO EXCEL (PATH='C:\thepath\excelfile.xlsm');
  CREATE TABLE Table1 AS
    SELECT * FROM CONNECTION TO EXCEL
      (SELECT * FROM [Sheet1$]);
  DISCONNECT FROM EXCEL;
QUIT;
So, does anyone know how to import a table with an unknown number of rows from XLSM into SAS? Thanks in advance...
10-09-2015 08:40 AM
To be honest, going from Excel (an unstructured mess) to SAS (a structured format) is going to be difficult. You could try specifying the range as A:Z and dropping the blank records. However, do all the columns have the same format in every observation? Why can you not just remove those two rows from the start? If it were me, I would write a small VBA macro that runs over the range, i.e. from A4 using xlToRight/xlDown, and loops over the data writing out a CSV file.
Then you can simply import the CSV into SAS using a DATA step and INFILE.
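A minimal sketch of that DATA step, assuming the (hypothetical) VBA macro writes a header line `Col1,Col2` followed by one comma-separated line per data row. The fileref `demo` and its contents are made-up stand-ins for the macro's output; point a FILENAME at the real CSV instead. Col1 and Col2 are read as numeric here; add a `$` informat for character columns.

```sas
/* Stand-in for the CSV the VBA macro would write (made-up sample rows). */
filename demo temp;
data _null_;
  file demo;
  put 'Col1,Col2';
  put '15,20';
  put '16,21';
run;

/* Read every data line after the header. The step stops at end of file, */
/* so the number of rows never has to be known in advance.               */
data work.Table1;
  infile demo dsd truncover firstobs=2;  /* DSD: comma-delimited, empty fields become missing */
  input Col1 Col2;
run;
```

Whether the macro writes 200 or 350 rows, the DATA step does not change; it simply reads until the file runs out.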
10-09-2015 08:49 AM
First of all, save the file in a suitable file transfer format (which XLSM is NOT), i.e. CSV.
The DATAROW statement is valid only for delimited files, so it cannot be used with crappy formats like the XLSX family.
(This is explicitly explained in the PROC IMPORT documentation.)
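For illustration, a sketch of PROC IMPORT with DBMS=CSV and DATAROW=, assuming the saved CSV keeps the sheet layout: two empty lines, the header on line 3, data from line 4, and an empty first field for column A. The fileref `sheet` and its contents are a made-up stand-in for the saved file. GETNAMES reads names only from the first line of a delimited file, so this sketch uses GETNAMES=NO and renames the default VAR1-VAR3 columns afterwards.

```sas
/* Stand-in for Sheet1 saved as CSV (made-up sample content). */
filename sheet temp;
data _null_;
  file sheet;
  put ',,';          /* row 1 of the sheet: empty */
  put ',,';          /* row 2: empty */
  put ',Col1,Col2';  /* row 3: header starting in column B */
  put ',15,20';      /* row 4: first data row */
  put ',16,21';
run;

/* DATAROW= is accepted for delimited files; reading stops at end of file. */
proc import datafile=sheet out=work.raw dbms=csv replace;
  getnames=no;
  datarow=4;
run;

/* Drop the empty column A (VAR1) and restore the header names from row 3. */
data work.Table1;
  set work.raw(keep=VAR2 VAR3 rename=(VAR2=Col1 VAR3=Col2));
run;
```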
10-09-2015 09:39 AM
I found an "inefficient" alternative solution that reads a fixed, generous range of rows in Excel (50,000 rows) and at the same time keeps only the rows where column Col1 has a value.
It takes 7-8 seconds and it works, but as I wrote, it feels inefficient to read all 50,000 rows.
PROC SQL;
  CONNECT TO EXCEL (PATH='C:\thepath\excelfile.xlsm');
  CREATE TABLE Table1 AS
    SELECT * FROM CONNECTION TO EXCEL
      (SELECT * FROM [Sheet1$B3:C50000] WHERE Col1 IS NOT NULL);
  DISCONNECT FROM EXCEL;
QUIT;