- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Does anyone know of a quick way to import data from an Excel file that contains rows of merged cells? Something like this:
Ideally, I would like the data in the merged cells to become a variable in the SAS dataset. And if possible, I would like to avoid SQL or macros.
Thanks in advance!
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There isn't very much control when importing from Excel.
I'd do a default import and then fix it in SAS.
If this is a one time job, I'd fix it in Excel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
HAVE AN EXCEL SHEET and I need to input merged headers and merged data rows
* I think you need full SAS for this because
only 'proc export' works in the reduced SAS enviroment.
This solution reads merged headers and merged row cells.
Merged headers within the data portion that do not start in column 1
are an issue. You need to know the starting cell of
the interior merged header and read the ranges above and below
the interior merged header. You may be able to use
DBSASTYPE to change a interior column type to a long character and
avoid multiple range reads.
HAVE
* create a sheet with two merged headers and one merged row og data;
%utlfkil(d:/xls/mrgcls.xlsx); * delete if exists;
ods excel file="d:/xls/mrgcls.xlsx" ;
ods excel options(sheet_name="mrgcls" sheet_interval="none");
proc report data=sashelp.class missing;
COLUMN ("This is a merged cell" NAME AGE ("Lab values" HEIGHT WEIGHT));
define name / group ;
break before name / skip;
compute before name ;
txt='MERGED';
if name='Carol' then len=10;
else len=0;
line txt $varying. len;
endcomp;
run;quit;
ods excel close;
* merged cells are tricky especially if they occur randomly;
* however if you know where they are you can extract them.
/*HAVE
This is a merged cell
Lab values
NAME AGE HEIGHT WEIGHT
Alfred 14 69 112.5
Alice 13 56.5 84
Barbara 13 65.3 98
MERGED
Carol 14 62.8 102.5
Henry 14 63.5 102.5
James 12 57.3 83
Jane 12 59.8 84.5
Janet 15 62.5 112.5
*/
WANT
HEADER
GET Merged Headers
Up to 40 obs WORK.HEADER total obs=2
Obs F1 F2 F3 F4
1 This is a merged cell
2
Lab values
...
DATA
Get Merged data
Up to 40 obs WORK.DATA total obs=20
Obs NAME AGE HEIGHT WEIGHT
1 Alfred 14 69.0 112.5
2 Alice 13 56.5 84.0
3 Barbara 13 65.3 98.0
4 MERGED . . .
5 Carol 14 62.8 102.5
6 Henry 14 63.5 102.5
7 James 12 57.3 83.0
8 Jane 12 59.8 84.5
9 Janet 15 62.5 112.5
...
SOLUTION
libname xls 'd:/xls/mrgcls.xlsx' header=no scantext=no;
data header;
set xls.'mrgcls$A1:E2'n;
;run;quit;
libname close;
libname xls 'd:/xls/mrgcls.xlsx';
data data;
set xls.'mrgcls$A3:E9999'n; /* over specify */
;run;quit;
libname xls clear;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for this, but it seems to be a very specific solution. What I need is a general solution that will work on a variety of Excel files.
Perhaps it cannot be done in SAS.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
SAS tables like any "normal" database table are rectangular AND each column has a data type (=it's columnar and not cell based).
There is almost always a way to read an Excel and if logically possible transform it into a rectangular structure but given "the mess" one can make in an Excel there won't be a generic solution for any possible structure.
IF you can define a pattern to code for then there might be ways. And IF you can define the pattern then please also attach some sample Excels with these patterns (many people won't open Excel attachments - but others do).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Good points.
If I save the mrgcls Excel file (created by the code above) in CSV format, then open it with Notepad++, the top of the file looks like this:
So could a new variable be created that flags if a row might contain merged cells, based on one or more missing data (the commas)? I could then verfity whether those rows actually contain merged cells, rather than having to look through the entire Excel file.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Norman21 wrote:
Good points.
If I save the mrgcls Excel file (created by the code above) in CSV format, then open it with Notepad++, the top of the file looks like this:
So could a new variable be created that flags if a row might contain merged cells, based on one or more missing data (the commas)? I could then verfity whether those rows actually contain merged cells, rather than having to look through the entire Excel file.
Does that example show vertically merged or horizontally merged cells?
If counting the number of missing values on a line helps then you could use the CMISS() function. The CMISS() function will count the number of missing values in its arguments and it works with both numeric an character arguments. So you could do something like:
data want ;
set have ;
number_missing = cmiss(of _all_) - 1;
run;
The minus one is to account for the fact the NUMBER_MISSING variable will be included in the _ALL_ variable list and will be missing when the CMISS() function executes.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What do you expect as result exactly?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you refer to the first post of this thread, I think it would be useful to have a new file output with the first variable as Continent (the merged cells). Something like this:
Continent | Country | 1981 | 1982 | 1983 | 1984 |
Australia | Australia | 3.6 | - | - | |
Australia | New Zealand | - | - | - | |
Europe | Austria | 1.9 | - | - | |
Europe | Belgium | 3.4 | 3.7 | 3.7 | |
Europe | Czech Republic | - | - | - | |
Europe | Denmark | 2.7 | 2.9 | 3.1 | |
Europe | Finland | - | - | 3.9 | |
Europe | France | 3.9 | 4.1 | 4.2 | |
...etc... |
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
For that particular example it is a trivial exercise. Something that is done all the time when reading in text files with similar type reports.
data want;
length continent $15 ;
set have;
if n(of _numeric_)=0 then continent=country;
else output;
retain continent;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
But for some countries like New Zealand, the numeric vars are all missing as well.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@jon11779 wrote:
But for some countries like New Zealand, the numeric vars are all missing as well.
Then use a different test. For example you could just test if COUNTRY is any of the valid CONTINENT values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
But for Australia, continent happens to be the same as country. Of course you can deal with it specifically, but my point is that it was not a universal solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, this is a specific, trivial example. I'm looking for a generic solution.
Perhaps a "two pass" process, where the first pass lists the "likely" merged cells. These can then be edited as required, to feed into the second pass that produces a clean result.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation