02-03-2015 04:55 PM
I get a excel workbook with 15 tabs from another department in our company. The header rows for tabs 2-14 are linked to the header row in tab 1. This way, if a column name is changed, the person only has to change the 1st tab, which will then change the other tabs. When I import the 2nd tab, the resultant sas dataset column names are all "A". Is there a way for me to be able to import the tabs and still get the coluimn names? Thanks!
02-04-2015 03:58 AM
SAS should be automatically using the current value of the formula in a cell when it imports the file. Which means you should be getting names.
Does the excel file have macros or only compute on update on?
Can you post a sample workbook that doesn't work for you and the code you use to import it? If your info is confidential you can mock one up.
02-04-2015 04:45 AM
You are quite correct Reeza, Excel stores in the underlying XML both the formula and the current value returned by the formula:
-<row r="1" x14ac:dyDescent="0.25" spans="1:3">-<c r="A1" t="str"><f>Sheet1!A1</f><v>abc</v>
So it should read in fine (and does after a small test), so I assume something else is going on with the Spreadsheet (maybe older version?)
02-04-2015 05:31 AM
I will need to work on a mockup of the file. I apologize and should have stated that the file is an .xlsx file (Excel 2007). I finally learned yesterday how to use PROC IMPORT to import .xlsx file. Additionally, the eventual plan is that there will be 40+ of these files (each with 15 tabs). So to have to save each file as a .csv each time is something I prefer not to do. The request is for me to import every tab of each .xlsx workbook into one big dataset for analysis. Here is the code I use to import the file: Proc Import Datafile='/home/UKHeatmap 2015.xlsx' Out=agnostic DBMS=xlsx Replace; Sheet= 'Heat Map Channel Agnostic'; Run; I cannot figure out why the column headers from the other tabs come through.
02-04-2015 05:42 AM
Well, you don't need dbms=xlsx on that statement, it worked for me without it (from an xlsx). I would strongly advise that you go back to source and ask them to provide a proper data transfer. Excel is not a data transfer format, and you will just be building yourself trouble up for the future if you accept it. Me, unless there is an absolute, positively cannot do anything about scenario and management accept that each time this data is handled there will be time/money costs, then I would refuse Excel data. Even something simple like, down the line a number could be stored as text this can completely ruin your programming and give you hours of fun trying to figure out what has gone wrong, and with Excel, things will go wrong. Go back to the source and get the data from there.
02-04-2015 06:59 AM
I am operating SAS from UNIX and am told by SAS that I must use the XLSX engine. Once I did, then I was able to import. So yes, I do need the dbms=xlsx As for requesting the source for proper data transfer, I have done this. However, I am not the only person using this source and others' work are dependent on the xlsx format. It's a LARGE corporation so, yes it is difficult from a logistic standpoint, and not from cooperation perspective, I appreciate your feedback/suggestion and fully agree. HOWEVER, it is unrealistic for me to say I absolutely refuse to do this until you give me the file in X-format. I have to work with what I am given and not expend energy or time on things out of my control.
02-04-2015 08:39 AM
Another approach that I've had success with is the EXCEL or PCFILES libname engine. Sometimes you can work a two-step approach, where you use:
libname myexcel PCFILES 'path-to-my-xlsx/excel.xlsx';
and then PROC DATASETS to get the sheet names and column names.
Then feed that information into your PROC IMPORT and any post-process DATA step that you might need to munge the data columns into the proper names/types.
02-04-2015 09:12 AM
Chris - Thanks for the advice. Following your instructions, I ran this code: Libname myexcel PCFILES'/home//G2C/Heatmap 2015.xlsx'; I got this error message: ERROR: Incorrect syntax for this LIBNAME statement. This engine does not accept a physical name. ERROR: Error in the LIBNAME statement. Also, is there an example of the PROC DATASETS to get the sheet and column names? Thanks!
02-04-2015 09:30 AM
I missed the fact that you're on Unix. The PCFILES library engine requires the PC Files Server to be available on a Windows machine somewhere. On SAS for Windows, this is automatically invoked with no additional setup. On Unix, you have to specify the node name/port within the libname syntax in order to "delegate" the work of reading the Excel file to that PC Files Server machine. If that's not possible for you to set up, I apologize for leading you down a primrose path.
Here's an example macro that I've used in the past to collect all of the sheet names from an Excel "data set" using the PCFILES engine.
02-04-2015 09:45 AM
It doesn't sound like have an issue importing your sheets but only with the column names?
When I import the 2nd tab, the resultant sas dataset column names are all "A".
Run a proc contents on the data set imported and verify the column names. It's not possible to have variable names repeat so having all the variables named A isn't possible. Look at the column name vs the column labels.
proc contents data=imported; run;
02-04-2015 10:27 AM
Hi reeza, My mistake - I incorrectly typed what I meant. When I import the file, the column with formula (vlookup/link), rather than giving me the text for that row/observation, it gives me only "A". So for example, in the excel file column "L5CodeName" is a vlookup to another tab in the workbook. When I import the file into SAS, the data under column LSCodeName are all "A". Hope I made sense this time.