DATA Step, Macro, Functions and more

Excel 2007 Import - 'Moving' Header and Data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Excel 2007 Import - 'Moving' Header and Data

I have an Excel sheet that is essentially a report from an outside system.  The header and the data starting points are inconsistent from month to month.  I was hoping to see if there was a way to 'automate' this PROC IMPORT without having to open the file and see where the data begins each month.

 

The data looks like this:

 

FILTERCHOICE(S)    
       
       
SiteXXXX(Site:1)    
 XXXX(Site:1)    
 XXXX(Site:1)    
Visit DateFrom 01/01/2016 To 06/30/2016  
IT MD_NO664    
  2330    
  2631    
  3611    
  3778    
  4871    
  6661    
  10766    
  12238    
  12440    
  13404    
  17124    
  19968    
  20966    
  21245    
  31640    
  56340    
  70465    
  72580    
  73643    
  74561    
  75833    
  76428    
  76894    
  86711    
  90069    
  90360    
  95850    
  98760    
       
FREQUENCY REPORT    
       
       
SiteIT MD_NOVisit DateQuestionVery Good %Very Good nTotal n
'Hospital''00001''2/1/16 2/29/16'Overall rating of care10011
'Total''00001''2/1/16 2/29/16'Overall rating of care10011
'Hospital''00001''4/1/16 4/30/16'Overall rating of care10011

 

I don't need any of the data prior to the row that contains SITE|IT MD_NO|Visit Date...

 

Because the word 'SITE' is also found at the top of the report where I don't need the data, I didn't think it would be usable.

 

Any assistance would be greatly appreciated.


Accepted Solutions
Solution
‎08-17-2016 12:15 PM
Contributor
Posts: 33

Re: Excel 2007 Import - 'Moving' Header and Data

[ Edited ]

This is a very good example why excel should not be used as data source. Due to the things above the relevant data, sas will most likely fail to guess the correct data type and you will end up with more than one proc/datastep to fix issues.

 

You can find the data:

  1. use proc import with getnames = no
  2. use a data step to find the row containing "Site", "IT MD_NO" (add more checks if necessary).
  3. find the last line of data
  4. with the information from 2 and 3 you know the range to import
  5. use proc import with getnames=yes and range (you need to know the name of the sheet).

Hardly tested code:

proc import
      datafile="yourfile.xlsx"
      dbms=xlsx
      out=work.crap
      replace
   ;
   getnames=no;
run;

data _null_;
   set work.crap;
   length rangeStart rangeEnd 8 range $ 20;
   retain range:;
   
   if A = "Site" and B = "IT MD_NO" /* ...*/ then do;
      rangeStart = _n_;
   end;

   if not missing(rangeStart) and cmiss(A, B /* ... */) = 2 then do;
      rangeEnd = _n_ - 1;
      range = cats("Sheet1$A", rangeStart, ":G", rangeEnd);
      call symputx("range", range);
      put range=;
      stop;
   end;
run;


proc import 
      datafile="yourfile.xlsx"
      dbms=xlsx
      out=work.cool
      replace
   ;
   getnames = yes;
   range= "&range";
run;

 

Edit: One thing to add: you need option validvarname set to any.

View solution in original post


All Replies
Super User
Super User
Posts: 7,392

Re: Excel 2007 Import - 'Moving' Header and Data

Not really, you have hit onoe of the main issues with using Excel.  Excel is unstructured and hence a very bad choice for data transfer. I would go back to the source and ask them to provide a useable data in a proper data transfer file - CSV, XML etc. 

Otherwise, you have three options:

1) Manually copy out the data you want, and save it as a decent file format.

2) Use Excel VBA to derive your data and save it to a decent file format

3) Import as is and then post process what is imported in a datastep

Solution
‎08-17-2016 12:15 PM
Contributor
Posts: 33

Re: Excel 2007 Import - 'Moving' Header and Data

[ Edited ]

This is a very good example why excel should not be used as data source. Due to the things above the relevant data, sas will most likely fail to guess the correct data type and you will end up with more than one proc/datastep to fix issues.

 

You can find the data:

  1. use proc import with getnames = no
  2. use a data step to find the row containing "Site", "IT MD_NO" (add more checks if necessary).
  3. find the last line of data
  4. with the information from 2 and 3 you know the range to import
  5. use proc import with getnames=yes and range (you need to know the name of the sheet).

Hardly tested code:

proc import
      datafile="yourfile.xlsx"
      dbms=xlsx
      out=work.crap
      replace
   ;
   getnames=no;
run;

data _null_;
   set work.crap;
   length rangeStart rangeEnd 8 range $ 20;
   retain range:;
   
   if A = "Site" and B = "IT MD_NO" /* ...*/ then do;
      rangeStart = _n_;
   end;

   if not missing(rangeStart) and cmiss(A, B /* ... */) = 2 then do;
      rangeEnd = _n_ - 1;
      range = cats("Sheet1$A", rangeStart, ":G", rangeEnd);
      call symputx("range", range);
      put range=;
      stop;
   end;
run;


proc import 
      datafile="yourfile.xlsx"
      dbms=xlsx
      out=work.cool
      replace
   ;
   getnames = yes;
   range= "&range";
run;

 

Edit: One thing to add: you need option validvarname set to any.

Occasional Contributor
Posts: 8

Re: Excel 2007 Import - 'Moving' Header and Data

Thank you both for the responses.  I completely agree that Excel is a crappy data soure and is my LEAST favorite vehicle for receiving data from clients.  However, sometimes the client just can't get the data any other way and we have to do some sort of manipulation.  Most of the tips and tricks I've learned using SAS are from loading and manipulating data like this.

 

Thank you again for your quick and easily consumable responses!

 

Brian

Valued Guide
Posts: 505

Re: Excel 2007 Import - 'Moving' Header and Data

Hi Team

 

  It is fruitless to complain about excel, SAS is a knat on the dehind of excel. The number of excel users dwarfs SAS.

 

  Here is a solution that scans the entire excel sheet (all character columns) for a string.

 

 This was posted a long time ago in SAS-L. Does not exactly answer the question, but may be usefull. I suspect you can replace the libname with 'proc import'. There are aslo mant R and Python solutuions.

 

It does require your excel configuarion to use row column addressing, which I find is easier to program.

 

Finding the excel cell reference that contains a string (ie R10C1)

works best in the old dm text editor?
This editor has the nice property of eliminating (UE, EE, EG and SAS Studio options).

http://goo.gl/kJVveA

Will post R solution later (RDCOM?)

HAVE Excel sheet

Up to 40 obs from xls.class$ total obs=19

Obs    NAME       SEX    AGE    HEIGHT    WEIGHT

  1    Alfred      M      14     69.0      112.5
  2    Alice       F      13     56.5       84.0
  3    Barbara     F      13     65.3       98.0
  4    Carol       F      14     62.8      102.5
  5    Henry       M      14     63.5      102.5
  6    James       M      12     57.3       83.0
  7    Jane        F      12     59.8       84.5
  8    Janet       F      15     62.5      112.5
  9    Jeffrey     M      13     62.5       84.0
 10    John        M      12     59.0       99.5

WANT the cell reference for John from excel
without loading the entire workbook into SAS

Variable  Value    Excel Cell
Name       John    R10C1

SOLUTION

*create an excel sheet;

libname xls "d:/xls/class.xls";
data xls.class;
  set sashelp.class;
;run;quit;
libname xls clear;

* Need position (column number) in PDV for
all char variables;mm

%put &=pdvchr;

/*
pdvchr= 1,2
*/

* it is a shame that we cannot use some form
of column number ro reference variables.
Note: R can do this and it is a major strength;

data celref;
  set xls.class;
  array chrvars[*] _character_;
  array chrpos[2] _temporary_ (&pdvchr);
  do varnums=1 to dim(chrvars);
     if chrvars[varnums]='John' then do;
        xlscel=cats('R',put(_n_,8.),'C',put(chrpos[varnums],8.));
        nam=vname(chrvars[varnums]);
        put nam chrvars[varnums] xlscel;
        stop;
     end;
  end;
run;quit;

Name John R10C1



Super User
Posts: 10,466

Re: Excel 2007 Import - 'Moving' Header and Data

I'm not going to attempt code but if the file data of interest is in the same format then I would be tempted to save to CSV. Then I can write code that will find the line where IT MD_NO is the second "word" or even the "visit date".

An input @ and Scan(_Infile_) likely to be helpful within a Do Until loop.

I will also say that I wouldn't be terrible fond of "visit date" (name singular) containing what appears to be a range but that's easy to pull apart.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 342 views
  • 4 likes
  • 5 in conversation