Help reading in Excel / csv problem

Accepted Solution Solved
Reply
Super Contributor
Posts: 418
Accepted Solution

Help reading in Excel / csv problem

I have an excel file that is 30,000 + rows. For some reason, when I try to import it through proc import, the data types are not correctly formatting (characters are being read in as numeric), even when I specify the "mixed=yes" and "guessingrows=32000" option.

Aka this fails.

proc import datafile="c:\mydrive\datafile.xlsx" out=testdata replace;

guessingrows=32000;

mixed=yes;

run;

This by itself is extremely frustrating, however it is what it is. What I would like to do instead is save the file as CSV and then read it in, however when I try this the "validvarname=ANY" option does not work. Column with names like P&I or Borrower SSN#  have their values translated to an _. If I use validvarname with proc import on excel this does not happen.

Does anyone know why mixed is not working, or why validvarname=any is not working with an import from CSV?

Thanks so much!

Brandon


Accepted Solutions
Solution
‎04-21-2014 11:04 PM
Respected Advisor
Posts: 4,930

Re: Help reading in Excel / csv problem

Posted in reply to Anotherdream

One simple, albeit imperfect, way to read your whole Excel table as characters if your data table has headers is to tell Excel that the headers are data. Consider this example using the attached workbook:

libname xl Excel "&sasforum.\Datasets\Mixed.xlsx" header=no mixed=yes;

proc print data=xl.DATA(firstobs=2) noobs; run;

libname xl clear;

You can then use a datastep to rename/convert your columns.

PG

PG

View solution in original post

Attachment

All Replies
Respected Advisor
Posts: 4,930

Re: Help reading in Excel / csv problem

Posted in reply to Anotherdream

About reading Excel files, Proc Import documentation states:

Interaction:The GUESSINGROWS statement is valid only for delimited files.

That does not include Excel files, but is valid for CSV files.

The type of a column in Excel is not determined by SAS, but by MS-Office. The problem is that it scans only 8 rows of data. You can force Excel to scan all rows by changing a Windows Registry key. See

SAS/ACCESS(R) 9.3 Interface to PC Files: Reference

I made the proposed changes and had much less trouble thereafter to import Excel data.

Conversely, MIXED=YES is for Excel files only, not for CSV.

PG

PG
Super User
Posts: 11,343

Re: Help reading in Excel / csv problem

Posted in reply to Anotherdream

There are VERY good reasons not to have & in variable names. Generally when importing I pick simple variable names if the original column headis has something obnoxious such as

 

"Use of Emergency Services in past six months"

I assign that to a label. EmServices is a lot easier to type in code that that long string.


Super User
Posts: 19,855

Re: Help reading in Excel / csv problem

Posted in reply to Anotherdream

Consider using the libname method with the DBSASTYPE option.

SAS/ACCESS(R) 9.2 Interface to PC Files: Reference, Second Edition

Super Contributor
Posts: 418

Re: Help reading in Excel / csv problem

Yeah I was hoping that the answer wouldn't be use the libname statement or change the registry... it seems strange to me that proc import basically guesses the incorrect type of variable, even if you specify mixed option.

On a related question, I can't save the files as CSV's because there are variables that have values such as 01 and values of 1. They are distinct and if I save the file as a csv, they both get converted to a value of 1 making them incorrect.

Does anyone have a solution to the saving as csv problem? (This is why I avoided saving it as a csv...) ..

Is the dbsastype a new option? It doesn't seem to work in my code as shown in the examples I've found.

Super User
Posts: 19,855

Re: Help reading in Excel / csv problem

Posted in reply to Anotherdream

Post your code if you're having issues with DBSASTYPE.

Excel isn't reading the saved CSV properly, if you open the CSV file with a text editor or import with SAS it will be correct.

Super Contributor
Posts: 418

Re: Help reading in Excel / csv problem

Hey Reeza, as always thank you for your help.  I believe you misread my statement (or it wasn't very clear. Likely the second!).

Anyways I have excel files (I got them in excel format) and in the files there are fields with leading zeros and without in the same column. Aka column "Pool" has a value of 01, and a value of 1 in two rows.  However the value of 1 and the value of 01 are distinctly different. If I open excel and save it as a csv, excel naturally strips the leading zeros... Because excel is a terrible format to move data in I imagine.

Is there a way to get around this besides manually coding every column as "text" (aka open and save the excel file after changing every column type to text?).  If this is the case, is there code anywhere that has automated this by chance? I'm not sure where i'd even begin with a task like that.

Thanks again!

Brandon

Super User
Posts: 19,855

Re: Help reading in Excel / csv problem

Posted in reply to Anotherdream

Perhaps your situation is different than mine.

I created an excel file with leading 0's and non leading zeros and saved it as CSV. Opening the CSV file in a TEXT EDITOR such as Notepad shows me that the leading zeroes are there.

If I open the CSV file in Excel again, Excel interprets the data on its own and strips the leading zeros. 

Solution
‎04-21-2014 11:04 PM
Respected Advisor
Posts: 4,930

Re: Help reading in Excel / csv problem

Posted in reply to Anotherdream

One simple, albeit imperfect, way to read your whole Excel table as characters if your data table has headers is to tell Excel that the headers are data. Consider this example using the attached workbook:

libname xl Excel "&sasforum.\Datasets\Mixed.xlsx" header=no mixed=yes;

proc print data=xl.DATA(firstobs=2) noobs; run;

libname xl clear;

You can then use a datastep to rename/convert your columns.

PG

PG
Attachment
Super Contributor
Posts: 418

Re: Help reading in Excel / csv problem

Thanks for your response again everyone.

Pg I went with your method and then dynamic naming, and that appeared to be the best way I could find to get it to work!

THank you

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 1201 views
  • 0 likes
  • 4 in conversation