- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How can i import excel having .XLSX Extension using proc import where header start in 2 row and data start in 4 row.I know keywords namerow and startrow used in xls but same is not working in xlsx.i dont want to use range option because i have diffrent excel and want to make code generic.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Gosh, I hate Excel. I understand why the documentation for R strongly discourages the use of Excel for data import.
Anyway, to solve your problem, in Excel, delete rows 1 and 3, and then use SAS PROC IMPORT.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you but i dont want to delete anythng manually . I need a keyword that makes any row as header and data row.For ex-->for xls extension in proc import we use namerow=2 ,datarow=4 to make 2 row as header and 4 row from where data starts.
In my case there are many files and data is jumbled up.so i need keyword that i can use.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry to inform you that I am not aware of a simple solution that will do what you ask.
Perhaps the good news is that some expert here has a solution and I will learn the solution as well as you.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"- you can use import procedure with GETNAMES=NO, DATAROW=3, GUESSINGROWS= (default 20 - you may increase it depending on your file size) and after importing it use a data step or proc datasets to change the variables name."
So, to sum:
- Create two sas macro variables, &namerow = 2 and &datarow=4
- Import the data WITHOUT GETNAMES, so that the variable names will consistently be temp names
- Create an ID on your table using monotonic() or an incrementing data step
- Write a macro, or an elegant SAS INTO: statement, to rename the variables based on &namerow
- Write a data step that overwrites the current dataset, where the ID is less than &datarow
- Profit
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In other words, exactly what I said, there is no easy way to do this.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Perhaps you are right, Paige.
Tarun, if you find that this is not an easy solution and you would like some assistance I would be happy to assist! I have conducted similar work before and it would be a pleasure for me to help you out.
Patrick
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For your perusal. I am sure there may be some efficiencies you could add but I only took about 20 min coding time with it.
Patrick
/*IMPORT YOUR DATA, WITH GETNAMES = NO AND MIXED=YES*/
PROC IMPORT OUT= WORK.HAVE
DATAFILE= "C:\Users\USERNAME\Documents\have.xls"
DBMS=EXCEL REPLACE;
RANGE="Sheet1$";
GETNAMES=NO;
MIXED=YES;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
/*SET YOUR MACRO VARIABLES*/
%LET NAMEROW = 2;
%LET DATAROW = 4;
/*INSERT THE ROWID*/
PROC SQL;
CREATE TABLE HAVE_NEW AS
SELECT MONOTONIC() AS ID, * FROM HAVE;
QUIT;
/*GET ALL OF THE COLUMN NAMES IN THE HAVE SET*/
PROC SQL;
CREATE TABLE COLNAMES AS
SELECT NAME FROM DICTIONARY.COLUMNS WHERE MEMNAME = 'HAVE_NEW' AND NAME NOT IN ('ID');
QUIT;
/*SET THE LENGTH OF THE BLANK NEWNAME COLUMN*/
DATA COLNAMES;
SET COLNAMES;
LENGTH NEWNAME $ 80;
FORMAT NEWNAME $8.;
RUN;
/*CREATE A MACRO THAT WE WILL USE TO UPDATE NEWNAME WITH THE SPECIFIED CRITERIA*/
%MACRO ROWNAMES(COLNAME);
PROC SQL;
UPDATE COLNAMES SET NEWNAME = (SELECT &COLNAME FROM HAVE_NEW WHERE ID = &NAMEROW) WHERE NAME = '%STR(&COLNAME)';
QUIT;
%MEND;
/*EXECUTE THE MACRO FOR EACH OF THE OLD NAMES*/
DATA COLNAMES;
SET COLNAMES;
CALL EXECUTE("%ROWNAMES("||TRIM(NAME)||")");
RUN;
/*CREATE A MACRO VARIABLE WITH A LIST OF EACH OF THE OLD AND NEW COLUMN NAMES
, WHICH WE WILL USE IN THE RENAME STATEMENT*/
PROC SQL;
SELECT TRIM(NAME)||"="||TRIM(NEWNAME) INTO :NEWVAR SEPARATED BY " " FROM COLNAMES;
QUIT;
/*CREATE A MACRO VARIABLE WITH A LIST OF EACH OF THE NEW COLUMN NAMES
, WHICH WE WILL USE IN THE LABEL STATEMENT*/
PROC SQL;
SELECT TRIM(NEWNAME)||"="||TRIM(NEWNAME) INTO :NEWLABEL SEPARATED BY " " FROM COLNAMES;
QUIT;
/*CHANGE THE NAMES, AND THE LABELS, AND ONLY SELECT DATA THAT ARE GE OUR DATAROW*/
DATA HAVE_NEW;
SET HAVE_NEW(RENAME=(&NEWVAR));
WHERE ID >= &DATAROW;
LABEL &NEWLABEL;
RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How about .
proc import datafile='/folders/myfolders/have.xlsx' out=have1 dbms=xlsx replace;
run;
proc import datafile='/folders/myfolders/have.xlsx' out=want dbms=xlsx replace;
datarow=4;
run;
proc transpose data=have1(obs=1) out=temp;
var _all_;
run;
data _null_;
set temp end=last;
if _n_=1 then call execute('proc datasets library=work nolist nodetails;modify want;rename ');
call execute(catx('=',_name_,col1));
if last then call execute(';quit;');
run;