BookmarkSubscribeRSS Feed
Norman21
Lapis Lazuli | Level 10

Does anyone know of a quick way to import data from an Excel file that contains rows of merged cells? Something like this:

 

ExcelExample.PNG

 

Ideally, I would like the data in the merged cells to become a variable in the SAS dataset. And if possible, I would like to avoid SQL or macros.

 

Thanks in advance!

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

14 REPLIES 14
Reeza
Super User

There isn't very much control when importing from Excel.

 

I'd do a default import and then fix it in SAS.

If this is a one time job, I'd fix it in Excel. 

rogerjdeangelis
Barite | Level 11
HAVE AN EXCEL SHEET and I need to input merged headers and merged data rows

* I think you need full SAS for this because
  only 'proc export' works in the reduced SAS enviroment.

This solution reads merged headers and merged row cells.
Merged headers within the data portion  that do not start in column 1
are an issue. You need to know the starting cell of
the interior merged header and read the ranges above and below
the interior merged header. You may be able to use
DBSASTYPE to change a interior column type to a long character and
avoid multiple range reads.

HAVE

* create a sheet with two merged headers and one merged row og data;
%utlfkil(d:/xls/mrgcls.xlsx); * delete if exists;
ods excel file="d:/xls/mrgcls.xlsx" ;
ods excel options(sheet_name="mrgcls" sheet_interval="none");
proc report data=sashelp.class missing;
COLUMN ("This is a merged cell"  NAME AGE ("Lab values" HEIGHT WEIGHT));
define name   / group ;
break before name / skip;
compute before name ;
 txt='MERGED';
 if name='Carol' then len=10;
 else len=0;
 line txt $varying. len;
endcomp;
run;quit;
ods excel close;

* merged cells are tricky especially if they occur randomly;
* however if you know where they are you can extract them.

/*
HAVE This is a merged cell Lab values NAME AGE HEIGHT WEIGHT Alfred 14 69 112.5 Alice 13 56.5 84 Barbara 13 65.3 98 MERGED Carol 14 62.8 102.5 Henry 14 63.5 102.5 James 12 57.3 83 Jane 12 59.8 84.5 Janet 15 62.5 112.5 */ WANT HEADER
GET Merged Headers Up to 40 obs WORK.HEADER total obs=2 Obs F1 F2 F3 F4 1 This is a merged cell 2 Lab values ... DATA Get Merged data Up to 40 obs WORK.DATA total obs=20 Obs NAME AGE HEIGHT WEIGHT 1 Alfred 14 69.0 112.5 2 Alice 13 56.5 84.0 3 Barbara 13 65.3 98.0 4 MERGED . . . 5 Carol 14 62.8 102.5 6 Henry 14 63.5 102.5 7 James 12 57.3 83.0 8 Jane 12 59.8 84.5 9 Janet 15 62.5 112.5 ... SOLUTION libname xls 'd:/xls/mrgcls.xlsx' header=no scantext=no; data header; set xls.'mrgcls$A1:E2'n; ;run;quit; libname close; libname xls 'd:/xls/mrgcls.xlsx'; data data; set xls.'mrgcls$A3:E9999'n; /* over specify */ ;run;quit; libname xls clear;

 

Norman21
Lapis Lazuli | Level 10

Thanks for this, but it seems to be a very specific solution. What I need is a general solution that will work on a variety of Excel files.

 

Perhaps it cannot be done in SAS.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Patrick
Opal | Level 21

SAS tables like any "normal" database table are rectangular AND each column has a data type (=it's columnar and not cell based).

There is almost always a way to read an Excel and if logically possible transform it into a rectangular structure but given "the mess" one can make in an Excel there won't be a generic solution for any possible structure. 

IF you can define a pattern to code for then there might be ways. And IF you can define the pattern then please also attach some sample Excels with these patterns (many people won't open Excel attachments - but others do).

 

Norman21
Lapis Lazuli | Level 10

Good points.

 

If I save the mrgcls Excel file (created by the code above) in CSV format, then open it with Notepad++, the top of the file looks like this:

 

Norman21_0-1658650078667.png

So could a new variable be created that flags if a row might contain merged cells, based on one or more missing data (the commas)? I could then verfity whether those rows actually contain merged cells, rather than having to look through the entire Excel file.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Tom
Super User Tom
Super User

@Norman21 wrote:

Good points.

 

If I save the mrgcls Excel file (created by the code above) in CSV format, then open it with Notepad++, the top of the file looks like this:

 

Norman21_0-1658650078667.png

So could a new variable be created that flags if a row might contain merged cells, based on one or more missing data (the commas)? I could then verfity whether those rows actually contain merged cells, rather than having to look through the entire Excel file.


Does that example show vertically merged or horizontally merged cells?

 

If counting the number of missing values on a line helps then you could use the CMISS() function. The CMISS() function will count the number of missing values in its arguments and it works with both numeric an character arguments.  So you could do something like:

data want ;
  set have ;
  number_missing = cmiss(of _all_) - 1;
run;

The minus one is to account for the fact the NUMBER_MISSING variable will be included in the _ALL_ variable list and will be missing when the CMISS() function executes.

 

andreas_lds
Jade | Level 19

What do you expect as result exactly?

Norman21
Lapis Lazuli | Level 10

If you refer to the first post of this thread, I think it would be useful to have a new file output with the first variable as Continent (the merged cells). Something like this:

 

Continent Country 1981 1982 1983 1984
Australia Australia 3.6 - -  
Australia New Zealand - - -  
Europe Austria 1.9 - -  
Europe Belgium 3.4 3.7 3.7  
Europe Czech Republic - - -  
Europe Denmark 2.7 2.9 3.1  
Europe Finland - - 3.9  
Europe France 3.9 4.1 4.2  
...etc...          

 

 

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

Tom
Super User Tom
Super User

For that particular example it is a trivial exercise.  Something that is done all the time when reading in text files with similar type reports.

 

data want;
   length continent $15 ;
   set have;
   if n(of _numeric_)=0 then continent=country;
   else output;
   retain continent;
run;
jon11779
Calcite | Level 5

But for some countries like New Zealand, the numeric vars are all missing as well.

Tom
Super User Tom
Super User

@jon11779 wrote:

But for some countries like New Zealand, the numeric vars are all missing as well.


Then use a different test.  For example you could just test if COUNTRY is any of the valid CONTINENT values.

Reeza
Super User
There are 6 continents. You could hard code it or use a reference data set from sashelp/sasmaps to map countries to continent in this particular case.
jon11779
Calcite | Level 5

But for Australia, continent happens to be the same as country. Of course you can deal with it specifically, but my point is that it was not a universal solution.

Norman21
Lapis Lazuli | Level 10

Yes, this is a specific, trivial example. I'm looking for a generic solution.

 

Perhaps a "two pass" process, where the first pass lists the "likely" merged cells. These can then be edited as required, to feed into the second pass that produces a clean result.

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 5519 views
  • 2 likes
  • 7 in conversation