04-25-2017 11:27 AM
How can i import excel having .XLSX Extension using proc import where header start in 2 row and data start in 4 row.I know keywords namerow and startrow used in xls but same is not working in xlsx.i dont want to use range option because i have diffrent excel and want to make code generic.
04-25-2017 11:30 AM
Gosh, I hate Excel. I understand why the documentation for R strongly discourages the use of Excel for data import.
Anyway, to solve your problem, in Excel, delete rows 1 and 3, and then use SAS PROC IMPORT.
04-25-2017 11:34 AM
Thank you but i dont want to delete anythng manually . I need a keyword that makes any row as header and data row.For ex-->for xls extension in proc import we use namerow=2 ,datarow=4 to make 2 row as header and 4 row from where data starts.
In my case there are many files and data is jumbled up.so i need keyword that i can use.
04-25-2017 11:38 AM
Sorry to inform you that I am not aware of a simple solution that will do what you ask.
Perhaps the good news is that some expert here has a solution and I will learn the solution as well as you.
04-25-2017 02:45 PM
"- you can use import procedure with GETNAMES=NO, DATAROW=3, GUESSINGROWS= (default 20 - you may increase it depending on your file size) and after importing it use a data step or proc datasets to change the variables name."
So, to sum:
04-25-2017 03:03 PM
Perhaps you are right, Paige.
Tarun, if you find that this is not an easy solution and you would like some assistance I would be happy to assist! I have conducted similar work before and it would be a pleasure for me to help you out.
04-26-2017 09:32 AM
For your perusal. I am sure there may be some efficiencies you could add but I only took about 20 min coding time with it.
/*IMPORT YOUR DATA, WITH GETNAMES = NO AND MIXED=YES*/ PROC IMPORT OUT= WORK.HAVE DATAFILE= "C:\Users\USERNAME\Documents\have.xls" DBMS=EXCEL REPLACE; RANGE="Sheet1$"; GETNAMES=NO; MIXED=YES; SCANTEXT=YES; USEDATE=YES; SCANTIME=YES; RUN; /*SET YOUR MACRO VARIABLES*/ %LET NAMEROW = 2; %LET DATAROW = 4; /*INSERT THE ROWID*/ PROC SQL; CREATE TABLE HAVE_NEW AS SELECT MONOTONIC() AS ID, * FROM HAVE; QUIT; /*GET ALL OF THE COLUMN NAMES IN THE HAVE SET*/ PROC SQL; CREATE TABLE COLNAMES AS SELECT NAME FROM DICTIONARY.COLUMNS WHERE MEMNAME = 'HAVE_NEW' AND NAME NOT IN ('ID'); QUIT; /*SET THE LENGTH OF THE BLANK NEWNAME COLUMN*/ DATA COLNAMES; SET COLNAMES; LENGTH NEWNAME $ 80; FORMAT NEWNAME $8.; RUN; /*CREATE A MACRO THAT WE WILL USE TO UPDATE NEWNAME WITH THE SPECIFIED CRITERIA*/ %MACRO ROWNAMES(COLNAME); PROC SQL; UPDATE COLNAMES SET NEWNAME = (SELECT &COLNAME FROM HAVE_NEW WHERE ID = &NAMEROW) WHERE NAME = '%STR(&COLNAME)'; QUIT; %MEND; /*EXECUTE THE MACRO FOR EACH OF THE OLD NAMES*/ DATA COLNAMES; SET COLNAMES; CALL EXECUTE("%ROWNAMES("||TRIM(NAME)||")"); RUN; /*CREATE A MACRO VARIABLE WITH A LIST OF EACH OF THE OLD AND NEW COLUMN NAMES , WHICH WE WILL USE IN THE RENAME STATEMENT*/ PROC SQL; SELECT TRIM(NAME)||"="||TRIM(NEWNAME) INTO :NEWVAR SEPARATED BY " " FROM COLNAMES; QUIT; /*CREATE A MACRO VARIABLE WITH A LIST OF EACH OF THE NEW COLUMN NAMES , WHICH WE WILL USE IN THE LABEL STATEMENT*/ PROC SQL; SELECT TRIM(NEWNAME)||"="||TRIM(NEWNAME) INTO :NEWLABEL SEPARATED BY " " FROM COLNAMES; QUIT; /*CHANGE THE NAMES, AND THE LABELS, AND ONLY SELECT DATA THAT ARE GE OUR DATAROW*/ DATA HAVE_NEW; SET HAVE_NEW(RENAME=(&NEWVAR)); WHERE ID >= &DATAROW; LABEL &NEWLABEL; RUN;
04-26-2017 10:34 AM
How about .
proc import datafile='/folders/myfolders/have.xlsx' out=have1 dbms=xlsx replace; run; proc import datafile='/folders/myfolders/have.xlsx' out=want dbms=xlsx replace; datarow=4; run; proc transpose data=have1(obs=1) out=temp; var _all_; run; data _null_; set temp end=last; if _n_=1 then call execute('proc datasets library=work nolist nodetails;modify want;rename '); call execute(catx('=',_name_,col1)); if last then call execute(';quit;'); run;