SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
Norman21
Lapis Lazuli | Level 10

Does anyone know of a quick way to import data from an Excel file that contains rows of merged cells? Something like this:

 

undefined

 

Ideally, I would like the data in the merged cells to become a variable in the SAS dataset. And if possible, I would like to avoid SQL or macros.

 

Thanks in advance!

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

14 REPLIES 14
Reeza
Super User

There isn't very much control when importing from Excel.

 

I'd do a default import and then fix it in SAS.

If this is a one time job, I'd fix it in Excel. 

rogerjdeangelis
Barite | Level 11
HAVE AN EXCEL SHEET and I need to input merged headers and merged data rows

* I think you need full SAS for this because
  only 'proc export' works in the reduced SAS enviroment.

This solution reads merged headers and merged row cells.
Merged headers within the data portion  that do not start in column 1
are an issue. You need to know the starting cell of
the interior merged header and read the ranges above and below
the interior merged header. You may be able to use
DBSASTYPE to change a interior column type to a long character and
avoid multiple range reads.

HAVE

* create a sheet with two merged headers and one merged row og data;
%utlfkil(d:/xls/mrgcls.xlsx); * delete if exists;
ods excel file="d:/xls/mrgcls.xlsx" ;
ods excel options(sheet_name="mrgcls" sheet_interval="none");
proc report data=sashelp.class missing;
COLUMN ("This is a merged cell"  NAME AGE ("Lab values" HEIGHT WEIGHT));
define name   / group ;
break before name / skip;
compute before name ;
 txt='MERGED';
 if name='Carol' then len=10;
 else len=0;
 line txt $varying. len;
endcomp;
run;quit;
ods excel close;

* merged cells are tricky especially if they occur randomly;
* however if you know where they are you can extract them.

/*HAVE
            This is a merged cell
                            Lab values
  NAME            AGE     HEIGHT     WEIGHT
  Alfred           14         69      112.5
  Alice            13       56.5         84
  Barbara          13       65.3         98
                  MERGED
  Carol            14       62.8      102.5
  Henry            14       63.5      102.5
  James            12       57.3         83
  Jane             12       59.8       84.5
  Janet            15       62.5      112.5
*/


WANT

HEADER
GET Merged Headers
Up to 40 obs WORK.HEADER total obs=2

Obs             F1              F2        F3        F4

 1     This is a merged cell
 2
                                    Lab values
...

DATA
Get Merged data
Up to 40 obs WORK.DATA total obs=20

Obs    NAME       AGE    HEIGHT    WEIGHT

  1    Alfred      14     69.0      112.5
  2    Alice       13     56.5       84.0
  3    Barbara     13     65.3       98.0
  4    MERGED       .       .          .
  5    Carol       14     62.8      102.5
  6    Henry       14     63.5      102.5
  7    James       12     57.3       83.0
  8    Jane        12     59.8       84.5
  9    Janet       15     62.5      112.5
...

SOLUTION

libname xls 'd:/xls/mrgcls.xlsx' header=no scantext=no;

data header;
  set xls.'mrgcls$A1:E2'n;
;run;quit;

libname close;

libname xls 'd:/xls/mrgcls.xlsx';

data data;
  set xls.'mrgcls$A3:E9999'n; /* over specify */
;run;quit;

libname xls clear;



 

Norman21
Lapis Lazuli | Level 10

Thanks for this, but it seems to be a very specific solution. What I need is a general solution that will work on a variety of Excel files.

 

Perhaps it cannot be done in SAS.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Patrick
Opal | Level 21

SAS tables like any "normal" database table are rectangular AND each column has a data type (=it's columnar and not cell based).

There is almost always a way to read an Excel and if logically possible transform it into a rectangular structure but given "the mess" one can make in an Excel there won't be a generic solution for any possible structure. 

IF you can define a pattern to code for then there might be ways. And IF you can define the pattern then please also attach some sample Excels with these patterns (many people won't open Excel attachments - but others do).

 

Norman21
Lapis Lazuli | Level 10

Good points.

 

If I save the mrgcls Excel file (created by the code above) in CSV format, then open it with Notepad++, the top of the file looks like this:

 

Norman21_0-1658650078667.png

So could a new variable be created that flags if a row might contain merged cells, based on one or more missing data (the commas)? I could then verfity whether those rows actually contain merged cells, rather than having to look through the entire Excel file.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Tom
Super User Tom
Super User

@Norman21 wrote:

Good points.

 

If I save the mrgcls Excel file (created by the code above) in CSV format, then open it with Notepad++, the top of the file looks like this:

 

Norman21_0-1658650078667.png

So could a new variable be created that flags if a row might contain merged cells, based on one or more missing data (the commas)? I could then verfity whether those rows actually contain merged cells, rather than having to look through the entire Excel file.


Does that example show vertically merged or horizontally merged cells?

 

If counting the number of missing values on a line helps then you could use the CMISS() function. The CMISS() function will count the number of missing values in its arguments and it works with both numeric an character arguments.  So you could do something like:

data want ;
  set have ;
  number_missing = cmiss(of _all_) - 1;
run;

The minus one is to account for the fact the NUMBER_MISSING variable will be included in the _ALL_ variable list and will be missing when the CMISS() function executes.

 

andreas_lds
Jade | Level 19

What do you expect as result exactly?

Norman21
Lapis Lazuli | Level 10

If you refer to the first post of this thread, I think it would be useful to have a new file output with the first variable as Continent (the merged cells). Something like this:

 

Continent Country 1981 1982 1983 1984
Australia Australia 3.6 - -  
Australia New Zealand - - -  
Europe Austria 1.9 - -  
Europe Belgium 3.4 3.7 3.7  
Europe Czech Republic - - -  
Europe Denmark 2.7 2.9 3.1  
Europe Finland - - 3.9  
Europe France 3.9 4.1 4.2  
...etc...          

 

 

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Tom
Super User Tom
Super User

For that particular example it is a trivial exercise.  Something that is done all the time when reading in text files with similar type reports.

 

data want;
   length continent $15 ;
   set have;
   if n(of _numeric_)=0 then continent=country;
   else output;
   retain continent;
run;
jon11779
Calcite | Level 5

But for some countries like New Zealand, the numeric vars are all missing as well.

Tom
Super User Tom
Super User

@jon11779 wrote:

But for some countries like New Zealand, the numeric vars are all missing as well.


Then use a different test.  For example you could just test if COUNTRY is any of the valid CONTINENT values.

Reeza
Super User
There are 6 continents. You could hard code it or use a reference data set from sashelp/sasmaps to map countries to continent in this particular case.
jon11779
Calcite | Level 5

But for Australia, continent happens to be the same as country. Of course you can deal with it specifically, but my point is that it was not a universal solution.

Norman21
Lapis Lazuli | Level 10

Yes, this is a specific, trivial example. I'm looking for a generic solution.

 

Perhaps a "two pass" process, where the first pass lists the "likely" merged cells. These can then be edited as required, to feed into the second pass that produces a clean result.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 8900 views
  • 2 likes
  • 7 in conversation