SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How to handle null for column date from excel to load to dataset

Reply
N/A
Posts: 1

How to handle null for column date from excel to load to dataset

Hi everyone,

I am new with SAS DI. I want to extract the data from excel to sas dataset. I am using user written code to extract data from excel to sas dataset.

I create the table and set format DATETIME16. for column date. However, I face a problem for date column form excel null and it become error and fail to load the data.

Anyone can help me on this problem?..

Regards,

Noomie

Occasional Contributor
Posts: 18

Re: How to handle null for column date from excel to load to dataset

Can you please send a sample of a couple of obs and the code you have written for people to understand and give you a precise solution.

Thanks,

Naveen

Respected Advisor
Posts: 4,173

Re: How to handle null for column date from excel to load to dataset

You don't give us enough information so below is based on some assumptions. If I understand your problem then you're using Proc Import and in cases where all rows for the datetime column in your Excel source are missing you end up with a character variable. As to my knowledge there is no way to tell Proc Import to apply a specific variable type for a source column. It's based on scanning the source data - and if it's all missing then Proc Import defaults to a character variable.

Excel is just a bad source for data integration and should be avoided. But I know that it's sometimes not possible.

If the Excel source is rectangular then you could try and register it as metadata table object. I've done this in the past with mixed results. SAS(R) Data Integration Studio 4.4: User's Guide

Another way which I've already used is to first read the Excel source using Proc Import, then write the resulting table as text delimited file to disk and then define a External File metadata object for this csv. After that it's "out-of-the-box" ETL.

Such an approach adds some I/0 but it let's you much better control how you read the data for further processing.

In the past my thinking for such an approach was also that may be somewhere in the future I will be getting a "reasonable" extracts like a csv. So what I've done is to implement a separate DI job which only reads the Excel and writes it to disk as csv. The next job then reads this csv using a External File Metadata object. And if once in the future data is directly provided as csv one would only need to turn of the first conversion job.

Below a code example which uses such an approach. The csv gets written to a temporary file and no external file metadata object is used. You might decide to modify the code and split it up into 2 jobs using above approach.

/* create target table */
data target;
  attrib charvar length=$20 datetimevar length=8 informat=datetime. format=datetime16. numvar length=8;
  charvar='target';
  datetimevar=datetime();
  numvar=1;
run;

/* create table as it could look like after Proc Import with all values in datetime column missing */
data after_import_from_excel;
  attrib charvar length=$20 datetimevar length=$10 numvar length=8;
  charvar='source,test';
  call missing(datetimevar);
  numvar=2;
  output;
  output;
run;

/* create temporary file */
filename csvtmp temp;

/* write imported data to temporary file as csv */
ods csv file=csvtmp;

proc print data=after_import_from_excel noobs;
run;

ods csv close;

/* import csv with variable types as required for loading into target */
data source;
  attrib charvar length=$20 datetimevar length=8 informat=datetime. format=datetime16. numvar length=8;
  infile csvtmp dsd dlm="," firstobs=2 truncover;
  input charvar datetimevar numvar;

run;

/* release temporary file */
filename csvtmp clear;

/* load source into target */
proc append base=target data=source force nowarn;
run;

Super User
Posts: 5,434

Re: How to handle null for column date from excel to load to dataset

Totally agree with Patrick on usage of Excel filen in a data warehouse environment.

Having that said, I also think that doing a rond robin using temporary CSV-files seems un-necessary complicated, since it add some substantial user written code, espacially if there are several/many Excel-files to ne imported.

Even with it flaws, using meta data objects for Excel files should be seen as best practice. The main negative side of this is that you need to register a database server and a schema for each Excel file (as I recall).

Still, if we assume that PROC IMPORT is used in Naomis user written code, ther eshouls be ways to make this work (depending in the nature of the error - very little information here). Have you tried adjusting the MIXED option and the TypeGuessRows registry value?

Data never sleeps
Respected Advisor
Posts: 4,173

Re: How to handle null for column date from excel to load to dataset

at

Are you sure that there won't be a type mismatch during execution in cases where the metadata table expects a numeric variable (a SAS date) but the whole source column is Null?

I agree that conversion to cvs is overhead - it's not a lot of extra coding though and you can implement with a rather simple user transformation with path, workbook and sheet as prompt.

Ask a Question
Discussion stats
  • 4 replies
  • 1557 views
  • 3 likes
  • 4 in conversation