- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For the purpose of having control over how my data is imported (specifying formats and data types), I have typically used the following form:
data DSN;
%let _EFIERR_=0;
infile "&inputpath.\&filename..csv" delimiter=',' MISSOVER DSD lrecl=32767 firstobs=2;
informat &infmtstatement.;
format &fmtstatement.;
input &inputstatement.;
label &labelstatement.;
if _ERROR_ then call symputx('_EFIERR_',1);
run;
Recently, I have started work at a new company that strictly uses PROC IMPORT (see below). This has worked great for me until recently. I have noticed that sometimes the data is imported in different variable types and formats than I want.
PROC IMPORT OUT=DSN DATAFILE= "&inputpath\&filename..xlsx" DBMS=EXCELCS REPLACE;
RANGE="Check 2 Visit Windows$";
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
My old method works great for importing CSV files when I want to specify data types and formats. My new method works great if I want to read in XLSX files with multiple worksheets. Is there a way I can combine the two methods such that I can read in XLSX files with multiple worksheets and still specify the variable types and formats?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Only if you use the EXCELCS database engine. With that database engine you can use the DBDSOPTS setting to pass in settings for the DBTYPE option.
DBDSOPTS= "DBTYPE=(BilledTo='CHAR(8)')";
BUT to use that database engie either your SAS is running on Windows and you have a compatible version of Excel that it can call or you are using a SAS PCFILES server that is running compatilble version of Excel.
Normally it is not a big issue with XLSX files since SAS can read the metadata on the cells an compute the data type. Unlike a CSV file where the only available metadata is the column heading. But if you are trying to read multiple fies that SHOULD have similar structure you might still get some characters variables that have different lengths defined based on the data that is in the sheet. In that case you might want to add a step after your PROC IMPORT to impose your standard structure. It could look similar to your data step code you posted. I would add step to REMOVE formats that PROC IMPORT might have attached to variables and then just add back on the formats for the variables, like dates and times, that need them.
PROC IMPORT OUT=DSN_temp DATAFILE= "&inputpath\&filename..xlsx" DBMS=XLSX REPLACE;
RANGE="Check 2 Visit Windows$";
RUN;
%let varlist=Id 8 Name $50 Age 8 DOB 8 ;
%let formatlist= dob date9.;
data dsn;
length &varlist ;
set dsn_temp;
format _all_;
format &formatlist;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
At this web page, scroll down to the DBDSOPTS option
An example is given where a column is forced to be char(8), or character variable with length 8.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Only if you use the EXCELCS database engine. With that database engine you can use the DBDSOPTS setting to pass in settings for the DBTYPE option.
DBDSOPTS= "DBTYPE=(BilledTo='CHAR(8)')";
BUT to use that database engie either your SAS is running on Windows and you have a compatible version of Excel that it can call or you are using a SAS PCFILES server that is running compatilble version of Excel.
Normally it is not a big issue with XLSX files since SAS can read the metadata on the cells an compute the data type. Unlike a CSV file where the only available metadata is the column heading. But if you are trying to read multiple fies that SHOULD have similar structure you might still get some characters variables that have different lengths defined based on the data that is in the sheet. In that case you might want to add a step after your PROC IMPORT to impose your standard structure. It could look similar to your data step code you posted. I would add step to REMOVE formats that PROC IMPORT might have attached to variables and then just add back on the formats for the variables, like dates and times, that need them.
PROC IMPORT OUT=DSN_temp DATAFILE= "&inputpath\&filename..xlsx" DBMS=XLSX REPLACE;
RANGE="Check 2 Visit Windows$";
RUN;
%let varlist=Id 8 Name $50 Age 8 DOB 8 ;
%let formatlist= dob date9.;
data dsn;
length &varlist ;
set dsn_temp;
format _all_;
format &formatlist;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Tom wrote:
Only if you use the EXCELCS database engine.
The page I linked to disagrees with you. It says it works with EXCEL database engine, and my own personal experience confirms that it works with EXCEL.
Paige Miller