Import data from Excel, while changing the format?

Reply
New Contributor
Posts: 4

Import data from Excel, while changing the format?

Hi Everybody,

Long time reader, first time poster!

I'm trying to load data from an excel file. I know what you're going to tell me: "Why don't you use .csv?" Well I can't because it's a whole mess of files that a colleague sends after it's exported out of a platform. I'm supposed to take these files, merge them into one dataset, as automatic as possible. So what do I do? I use proc import, get all the data and stack that mess into usable information. Until I ran into a snag. The thing is some columns in some of the files are empty; so when the data is read by the proc import then it is automatically turned into text.
The data looks very simple:

Date               Unit

mm/dd/yy       10000

And when it's:

Date    Unit

(blank) (blank)

Then I can't append the different datasets on the data step because of the different formats.

Is there any way specify a format using proc import?

Is there any way to detect the file is blank and it will just return zero and won't get merged with the rest?

Any help is greatly appreciated!

Thanks,

Frank

Grand Advisor
Posts: 10,210

Re: Import data from Excel, while changing the format?

How are you combining the data sets after being read into SAS?

Are these files supposedly in the same format?

Are the affected columns sort of random or generally the same ones?

Is there any way specify a format using proc import?

Don't believe so yet.

Is there any way to detect the file is blank and it will just return zero and won't get merged with the rest?

File or column, you mentioned what appeared to be specific columns above. The answer is likely to depend on how you are merging and response to the other two questions above.

There are several ways to examine the characteristics of SAS datasets and variables so the answer is likely to be yes. The issue may be how complex the solution is depending on the actual contents and needs.

Esteemed Advisor
Posts: 6,646

Re: Import data from Excel, while changing the format?

If the files are supposed to be of the same structure, or at least same-name columns always have the same attributes across the datasets, then you do best by taking the data steps created by proc import as a base and set the correct formats there.

For automating things, I would never use proc import, as it reacts to changes in the input data on its own and makes it harder to detect unwanted or unexpected changes in the input data.

Use data steps that throw messages/errors/warnings when something unexpected comes along.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: Import data from Excel, while changing the format?

I am going to ask it anyways, why not get the data in CSV.  Sounds to me like the main problem is the platform you are using.  However, if you have Excel files, there is nothing stopping *you* from creating the CSV files.  A simple VBA macro, open each file in a given directory and save as csv, you could even load each file up into Excel and then save that as a CSV.  Then you have some proper data you can work with and create proper import procedures for. 

However, I would still say, if my DB was giving me some rubbish documents which meant more work to use, then I would change the DB.

New Contributor
Posts: 4

Re: Import data from Excel, while changing the format?

Thank you everyone for your responses! Thank you ballardw, KurtBremser, and RW9!

KurtBremser said something really interesting, about using the data step to do the format corrections. So I multiplied by 1 the columns that were turned into string format. Once they were on the same format the error was eliminated; it allowed me to stack the files using the data step. (That's how I was merging them, to answer your question ballardw). I think I'll write the code so that every variable that comes in through proc import is turned into the same format. Since it's just those two columns then it won't be a problem.

I also pushed for the files to be sent in CSV. I got some friction back but who cares.

Thanks everyone! All answers were helpful, I wish I could give more!

Thanks again!

Frank

Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: Import data from Excel, while changing the format?

"I also pushed for the files to be sent in CSV. I got some friction back but who cares." - Trust me, one day you will.

As for your solution, I think you misunderstand the datastep inport principal.  Your problem arises because you are letting proc import guess what you want to do.  You are the one with the knowledge of the data, so you need to be the one telling the software what to do, not the other way round.  Now, it wasn't clear from your post whether you were importing the data via datastep e.g:

data imp;

     infile "xyz.csv";

     input a $ b d;

run;

If so then your input statement needs to be changed for the numeric variable so that it reads in as numeric, no need to convert it from character then.

If however,  as I suspect you are doing, something like this:

proc import datafile="xyz.xls" out=tmp;run;

data tmp;

     set tmp;    

     new_numeric=char_variable * 1;

run;

Then there are two things to remember.  Firstly, and most importantly, this may not solve the problem in every situation.  Excel is completely strcutureles (which is why its never recommended as data transfer/base), so what happens *when* they put characters ofr special characters in the data, your program will fail or it might work but you will end up with odd results - for example 1e3, does this refer to 1000 because in your example this would be missing.

Secondly, if you are going to do character to numeric (or vice versa) then I would recommend that you do this explicitly.  With that multiplication, behind the scenes SAS is converting the character to number, again, your letting the software guess for you.  Use a put() function to convert to numeric (and input for to char).

new_numeric=put(char_variable,best.);

Note you can specify the format, and hence provides more error trapping. 

New Contributor
Posts: 4

Re: Import data from Excel, while changing the format?

Thanks for your response RW9!

The way that I'm importing the data is with:

proc import out=data

datafile = "file path etc."

dbms=excel replace

getnames=yes;

mixed=no;

scantext=yes

usedate=yes;

scantime=yes;

sheet="Sheet Name";

run;

Since for now they users are sending various tabs, this code is a godsend.

What I'm planing to do is exactly what you said. Multiply by one to turn the character into number format. Once that's' done I can stack them like so:

data all;

set

data1

data2

;

run;

But you're right I'll use the put function, no chance for errors if I tell the software exactly the format right?

Is there a way to use the data step to import a specific sheet in an excel file?

Esteemed Advisor
Esteemed Advisor
Posts: 7,203

Re: Import data from Excel, while changing the format?

No, the point is to not use Excel.  That proc import code you have may seem wonderful, but I can guarantee after 1 or two runs you will be changing your code.  Excel's biggest selling point is that you can doing anything you want in any cell, this is exactly the reason why, anyone who deals with data will hate the use of it.  Several examples come to mind, changing column orders, multiple formats in cells, special characters, blank rows/headers.  In fact until recently you could get differing results importing the same file twice. 

Again I would strongly recommend changing the file format to CSV (which can also be read in Excel) and then writing a datastep import rather than a guessing procedure.

New Contributor
Posts: 3

Re: Import data from Excel, while changing the format?

We also have users making excel tables available to SAS and we push the responsibility to the end user to comply with the agreed format.

As such we have a stored process which the user can use to upload the excel (csv) and upon receipt we check the format (e.g. date is correct, correct delimiter....)

If OK, then we process it, if not then we return a html to the user telling him that the format is not according to the agreed format.

Very simple and effective to ensure that the data is correctly delivered.

Maybe not an answer to your question, but it might give another view on the problem.

New Contributor
Posts: 4

Re: Import data from Excel, while changing the format?

I AM going to use that thanks!

Ask a Question
Discussion stats
  • 9 replies
  • 1461 views
  • 7 likes
  • 5 in conversation