SAS Programming

Norman21 · Posted 06-22-2016 12:51 PM

Does anyone know of a quick way to import data from an Excel file that contains rows of merged cells? Something like this:

Ideally, I would like the data in the merged cells to become a variable in the SAS dataset. And if possible, I would like to avoid SQL or macros.

Thanks in advance!

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Reeza · Posted 06-22-2016 02:14 PM

There isn't very much control when importing from Excel.

I'd do a default import and then fix it in SAS.

If this is a one time job, I'd fix it in Excel.

rogerjdeangelis · Posted 06-23-2016 11:33 AM

HAVE AN EXCEL SHEET and I need to input merged headers and merged data rows

* I think you need full SAS for this because
  only 'proc export' works in the reduced SAS enviroment.

This solution reads merged headers and merged row cells.
Merged headers within the data portion  that do not start in column 1
are an issue. You need to know the starting cell of
the interior merged header and read the ranges above and below
the interior merged header. You may be able to use
DBSASTYPE to change a interior column type to a long character and
avoid multiple range reads.

HAVE

* create a sheet with two merged headers and one merged row og data;
%utlfkil(d:/xls/mrgcls.xlsx); * delete if exists;
ods excel file="d:/xls/mrgcls.xlsx" ;
ods excel options(sheet_name="mrgcls" sheet_interval="none");
proc report data=sashelp.class missing;
COLUMN ("This is a merged cell"  NAME AGE ("Lab values" HEIGHT WEIGHT));
define name   / group ;
break before name / skip;
compute before name ;
 txt='MERGED';
 if name='Carol' then len=10;
 else len=0;
 line txt $varying. len;
endcomp;
run;quit;
ods excel close;

* merged cells are tricky especially if they occur randomly;
* however if you know where they are you can extract them.

/*HAVE
            This is a merged cell
                            Lab values
  NAME            AGE     HEIGHT     WEIGHT
  Alfred           14         69      112.5
  Alice            13       56.5         84
  Barbara          13       65.3         98
                  MERGED
  Carol            14       62.8      102.5
  Henry            14       63.5      102.5
  James            12       57.3         83
  Jane             12       59.8       84.5
  Janet            15       62.5      112.5
*/


WANT

HEADER
GET Merged Headers
Up to 40 obs WORK.HEADER total obs=2

Obs             F1              F2        F3        F4

 1     This is a merged cell
 2
                                    Lab values
...

DATA
Get Merged data
Up to 40 obs WORK.DATA total obs=20

Obs    NAME       AGE    HEIGHT    WEIGHT

  1    Alfred      14     69.0      112.5
  2    Alice       13     56.5       84.0
  3    Barbara     13     65.3       98.0
  4    MERGED       .       .          .
  5    Carol       14     62.8      102.5
  6    Henry       14     63.5      102.5
  7    James       12     57.3       83.0
  8    Jane        12     59.8       84.5
  9    Janet       15     62.5      112.5
...

SOLUTION

libname xls 'd:/xls/mrgcls.xlsx' header=no scantext=no;

data header;
  set xls.'mrgcls$A1:E2'n;
;run;quit;

libname close;

libname xls 'd:/xls/mrgcls.xlsx';

data data;
  set xls.'mrgcls$A3:E9999'n; /* over specify */
;run;quit;

libname xls clear;

Norman21 · Posted 07-23-2022 04:10 AM

Thanks for this, but it seems to be a very specific solution. What I need is a general solution that will work on a variety of Excel files.

Perhaps it cannot be done in SAS.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Patrick · Posted 07-23-2022 05:56 AM

SAS tables like any "normal" database table are rectangular AND each column has a data type (=it's columnar and not cell based).

There is almost always a way to read an Excel and if logically possible transform it into a rectangular structure but given "the mess" one can make in an Excel there won't be a generic solution for any possible structure.

IF you can define a pattern to code for then there might be ways. And IF you can define the pattern then please also attach some sample Excels with these patterns (many people won't open Excel attachments - but others do).

Norman21 · Posted 07-24-2022 04:12 AM

Good points.

If I save the mrgcls Excel file (created by the code above) in CSV format, then open it with Notepad++, the top of the file looks like this:

So could a new variable be created that flags if a row might contain merged cells, based on one or more missing data (the commas)? I could then verfity whether those rows actually contain merged cells, rather than having to look through the entire Excel file.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Tom · Posted 07-24-2022 11:15 AM

@Norman21 wrote:

Good points.

If I save the mrgcls Excel file (created by the code above) in CSV format, then open it with Notepad++, the top of the file looks like this:

So could a new variable be created that flags if a row might contain merged cells, based on one or more missing data (the commas)? I could then verfity whether those rows actually contain merged cells, rather than having to look through the entire Excel file.

Does that example show vertically merged or horizontally merged cells?

If counting the number of missing values on a line helps then you could use the CMISS() function. The CMISS() function will count the number of missing values in its arguments and it works with both numeric an character arguments. So you could do something like:

data want ;
  set have ;
  number_missing = cmiss(of _all_) - 1;
run;

The minus one is to account for the fact the NUMBER_MISSING variable will be included in the _ALL_ variable list and will be missing when the CMISS() function executes.

andreas_lds · Posted 07-25-2022 08:18 AM

What do you expect as result exactly?

Norman21 · Posted 07-25-2022 12:29 PM

If you refer to the first post of this thread, I think it would be useful to have a new file output with the first variable as Continent (the merged cells). Something like this:

Continent	Country	1981	1982	1983	1984
Australia	Australia	3.6	-	-
Australia	New Zealand	-	-	-
Europe	Austria	1.9	-	-
Europe	Belgium	3.4	3.7	3.7
Europe	Czech Republic	-	-	-
Europe	Denmark	2.7	2.9	3.1
Europe	Finland	-	-	3.9
Europe	France	3.9	4.1	4.2
...etc...

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Tom · Posted 07-25-2022 12:39 PM

For that particular example it is a trivial exercise. Something that is done all the time when reading in text files with similar type reports.

data want;
   length continent $15 ;
   set have;
   if n(of _numeric_)=0 then continent=country;
   else output;
   retain continent;
run;

jon11779 · Posted 07-25-2022 12:43 PM

But for some countries like New Zealand, the numeric vars are all missing as well.

Tom · Posted 07-25-2022 12:54 PM

@jon11779 wrote:

But for some countries like New Zealand, the numeric vars are all missing as well.

Then use a different test. For example you could just test if COUNTRY is any of the valid CONTINENT values.

Reeza · Posted 07-25-2022 12:58 PM

There are 6 continents. You could hard code it or use a reference data set from sashelp/sasmaps to map countries to continent in this particular case.

jon11779 · Posted 07-25-2022 01:13 PM

But for Australia, continent happens to be the same as country. Of course you can deal with it specifically, but my point is that it was not a universal solution.

Norman21 · Posted 07-25-2022 01:17 PM

Yes, this is a specific, trivial example. I'm looking for a generic solution.

Perhaps a "two pass" process, where the first pass lists the "likely" merged cells. These can then be edited as required, to feed into the second pass that produces a clean result.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

SAS Programming

Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Re: Import data from Excel file that contains rows with merged cells

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...