BookmarkSubscribeRSS Feed
Reeza
Super User

I think the integration with Microsoft Office usually refers to the SAS Product know as Add In for Microsoft Office. 

You can integrate, the answer is it's not easy, and if you expect it to be you'll be disappointed.  If you have different specific questions, such as renaming, changing character types those are more specific and you can probably find more efficient ways of doing things. But for such a generic question the answer is probably avoid Excel - which I think is impossible, personally.

data_null__
Jade | Level 19

I'm saying that PROC IMPORT is not suited to your problem of importing disparate EXCEL files.  Using PROC IMPORT you have to react to the result leaving a patchwork program to fix this issues caused by guessing.  You need a proactive approach, my example is one.

shailey
Obsidian | Level 7

Thanks data_null_; I will think on it more. I do still think that some basic data cleaning procedures are overly convoluted in SAS. I believe there might be good reasons for why it's set up this way, and it's just frustrating for intermediate_level users.

data_null__
Jade | Level 19

What data cleaning procedures?  How would SAS know what is clean?  Isn't that a job for people?

shailey
Obsidian | Level 7

Yes!! It's absolutely a job for people, and I want to do it, but I find it very difficult to do so in SAS. I don't need SAS to know what clean is, I want to tell it exactly what to do but there seems to be no efficient way to do so. Of course I can go into Excel and clean up the data by hand, change the variable names, etc. But that is error-prone and not reproducible.

n my ideal world, I would like to be able to process an excel file in SAS without touching it so that my procedure is totally explicit and documented. I was able to do this in R with no problems. With SAS by the time I'm done writing 80 renames and 80 formats and 80 informats and studying macros etc. etc. etc. I'm like "*$%^ I'll just rename the variables in Excel!"


I'm not saying that SAS has to be like R, or even that SAS is intended to import Excel files. Maybe the SAS community never has to deal with Excel files and the people who store data with them, and that's great! That doesn't change the fact that sometimes people need to import excel files into SAS. So my question is intended to ask people what their strategies are, not debate whether it should be done at all.

data_null__
Jade | Level 19

I sounds like you are still looking for a magic bullet what will fix a program that actually exist in the data source.  I don't think that exists in SAS or R or any language-flavor--of-the-day.

Another powerful tool that can be use to convert data types and "rename" is PROC TRANSPOSE.  If you transpose your variables to observations then you can write simpler programs to change the name or type of variables then transpose back to re-create the data with new names and or type.  Post an example of your data and I'll show you.

ballardw
Super User

Data transfer in any form should, at least in my opinion, start with communication between source and recipient of: 1) what will be transferred, 2) names/ descriptions of the data (variable names NOT but Content YES, I may pick my own name based on some standard) and 3) the actual content: numeric, date, character of up to XXX characters.

You could inherit  data previous to some date that needs fixing but at some point that agreement must be made OR you will fight with this forever. You may have to get management involved to discuss the costs of constant adjustments, timeliness or accuracy resulting in undocumented changes.

And Excel is a poor, not impossible, choice if PEOPLE are entering data because people do things differently, hide columns/rows, add summaries or notes not actual data to be interchanged as added rows or columns and then there are the Excel features that when entering some text it will do arithmetic or data conversions that not everyone catches (example: product code of 456-123 that mysteriously becomes 333 or 5-1 that comes May 1 of the current year).

Bill_in_Toronto
Obsidian | Level 7

My solution:

1. Import Excel worksheet with all columns as character by using the MIXED and GETNAMES options. For example:

PROC IMPORT DATAFILE='...'

OUT=rawdata_character

DBMS=EXCEL REPLACE;

SHEET="..."

MIXED=YES;

GETNAMES=NO;

RUN;

2.  Assuming that the row of column headers is within the guessingrows range this technique will force all columns to be character. The SAS variable names will be F1, F2, F3, ...

     Note that the formatted value in each cell will be imported. e.g., value=5.1324234, formatted cell=5.13, SAS value=5.13.

3. In data steps find the row containing the column headers (hard code it if it's always the same row), read the headers into SAS macro variables, and loop through the macro variables to create a SAS numeric variable for each column.  When reading the row of headers remove special characters in order to produce a valid SAS name. When converting character to numeric (e.g., numeric_name = INPUT(character_name,BEST.) check if the conversion was successful (if character_name NE '' and numeric_name = . then conversion failed) and output unsuccessful conversions to a data file for further investigation.

You can enhance the data steps to handle issues such as inconsistent column header names by using a crosswalk mapping of each name to a standard name.

Bill.

shailey
Obsidian | Level 7

Hi Bill, thanks for these thoughts. Can you show me an example of what you mean by reading the headers into a macro variable?

jakarman
Barite | Level 11

bypasses bypasses assumptions conversions, not seeing something that is a really an easier way

---->-- ja karman --<-----
ballardw
Super User

Beat the users into submission or have them pay for every time they change file layout/content is the only way to make this "easier".

BTW I have had some of the same issues with data exported from MS SQL Server in CSV format to be read by SQL server because the folks providing the data kept changing schema and such without referring to the documentation. They one time asked why we had a "programming" cost line on our billing for services. We sent them examples of 5 files generated over a period of 10 days that were supposed to have identical layout and content types that had changes in column orders and whether character variables were fully quote qualified or not or "numeric" fields with additional codes or text.

jwillis
Quartz | Level 8

I solved a similar problem by converting the EXCEL files to ACCESS database tables then reading the ACCESS tables with SAS.  It was not a pretty solution however the users wanted flexibility to do their own thing in their spreadsheets and I needed to have structured data.  Microsoft handled the conversion of the data from EXCEL to predefined and formatted ACCESS tables and SAS read the ACCESS table structures better than reading the spreadsheet columns.

SAS/ACCESS(R) 9.2 Interface to PC Files: Reference, Second Edition

jakarman
Barite | Level 11

shailey do not expect that sliver bullet. We (I suppose not only me) understand your frustration of all surprises of data delivery not according to some agreement causing issues.
That is what I am used for a very looong time and do not expect it that will be changed.   This was existent a long time before Excel even exist (flat files) becoming into troubles with csv with NLS and encodings (traumatized). Getting more challenging with RDBMS connections being involved with every one have his own standard (not a standard at all).  No, we don't discuss that data can be delivered by use of Excel, that is common business. I (we?) made the note the data should be delivered according to some agreement. When a failure is there it is the supplier of the data fault / responsibility to correct.   Just preventing for  becoming crazy of all issues.         

What are the issue to handle Excel data :

- With SAS you are needing to transform data into a RDBMS like approach. (4 GL). That requirement does not exist with every language (3 GL) getting data from Excel.
- http://www.r-bloggers.com/read-excel-files-from-r/ No with R the same challenge and same kind of approaches.

  May be in that experience of yours the data was delivered better according an agreement.    

- SAS/ACCESS(R) 9.4 Interface to PC Files: Reference, Third Edition (excel XLSX) you find that one already. It is the most easy and rather recent one.
this one is older but still not very often used: SAS/ACCESS(R) 9.4 Interface to PC Files: Reference, Third Edition with all options SAS/ACCESS(R) 9.4 Interface to PC Files: Reference, Third Edition

The sheet-names at Excel are the tables/datasets in SAS.

- http://www2.sas.com/proceedings/sugi31/024-31.pdf is having the explanation what is behind  (excel ranges can be usefull as mixed is yes -char only- and dblabel=yes avoiding weird long names-)    

The most advanced interface  using SAS is AMO SAS(R) Add-In 7.1 for Microsoft Office: Getting Started in Microsoft Excel, Microsoft Word, and Micr...

It avoids the user problems resolving by another person as the user is getting direct access to SAS with Excel. They have to solve the data-entry theirself 

---->-- ja karman --<-----