BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Didi_b
Obsidian | Level 7

 Hello, I have complex xls daily data tables from several years for example from 2000 to 2019 in a file. Unfortunately, the data tables have text above and below the data. Also, the name row, start row and end row are not same. However the variables name are the same as well as the structure of the table. I try a macro program by specing the name and start rows, but since all the tables don't have the same name et start row, and regarding the text above, I am looking for a solution to import all the tables at once with the text removed (at least the one above). I'm asking because it is a lot of tables to work on it one by one.

 

Below a copy of different tables (I'll post as a response) and a copy of my macro for tables with the same start and name row, I tested with the help from the SAS communities. Need your help, please.

options mprint;
%macro import(T01);
FILENAME REFFILE "Z:\tune\BAD\FAC\T01\&T01..xls";

proc import datafile=reffile dbms=xls out=%sysfunc(compress(&T01. , "- ")) replace;
sheet="XTH";
namerow=7;
startrow=10;
*Endrow=934 (the end rows are not the same for each table;
getnames=yes;
run;

%mend import;

%import (CalT_glob_2018-01-15);
%import (CalT_glob_2019-03-20);
/*etc.*/
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Get it to work for ONE table before trying to make a macro to generalize it.

 

Here is a working example.

I created an XLS file with junk at top and bottom and a 3 column table in the middle.

So this code reads the first column.  Checks it to find the start and stop column for the table. Then uses that to build the RANGE to give to PROC IMPORT.

%let path=c:\downloads;
%let fname=spanrows.xls;

proc import datafile="&path/&fname" dbms=xls out=raw replace;
  getnames=no;
  startcol='A';
  endcol='A';
run;

data _null_;
  set raw;
  if upcase(strip(A))='NAME' then call symputx('startrow',_n_);
  if left(A)=:'**' then call symputx('endrow',_n_-1);
run;

proc import datafile="&path/&fname" dbms=xls out=want replace;
  getnames=yes;
  range="$A&startrow:C&endrow";
run;

Results

Obs    NAME                     DAY           VALUE

 1     ABC                        1              34
 2                                2              56
 3                                3              78
 4                                4             101
 5     ABC                        1              34
 6                                2              56
 7                                3              78
 8                                4             101

Finding a method to carry the NAME values from the merged cells forward I will leave as an exercise.

View solution in original post

10 REPLIES 10
Didi_b
Obsidian | Level 7

Here an example of the different tables (It is not the names neither the real data because I’m not allowed to)

 

SASKiwi
PROC Star

What would you want a SAS dataset of one of your tables to look like? For example your tables have subtotal and total rows. Typically a SAS dataset is just detail rows from which a tabular report can be created with subtotals and total rows calculated on the fly.

Didi_b
Obsidian | Level 7

I would like to have a normal table at the end. I prepar the code to delete the sub total and total, and also code to file the void on variables name and date. I just ned to find a way to import all my tables at once because it is a lot (daily table on more than 7 years)

data want; 
set Work.have;

if Day="" then Day='12/01/2022';
if Day="Sous Total" then delete;
else Day='12/01/2022';

NDay=input(Day,MMDDYY10.);
format NDay MMDDYY10.;

Nhour=input(translate(hour,':','H'),time15.);
format NHour tod5.;

retain File;
if not missing(Name) then File=Name;
else Name=File;
drop File Day hour;
run;
Tom
Super User Tom
Super User

Try something like this to determine the start and end of the table part of the sheet.

Read in the file without trying to generate variable names.

Then check the first column to see where the row header is and where the footnote is.

%macro import(T01);
FILENAME REFFILE "Z:\tune\BAD\FAC\T01\&T01..xls";
proc import datafile=reffile out=raw replace dbms=XLS;
  sheet="XTH";
  startcol=1;
  endcol=1;
  getnames=no;
run;
data _null_;
  set raw;
  if 'NAME'=upcase(strip(A)) then call symputx('startrow',_n_);
  if A ne ' ' and left(A)=:'**' then call symputx('endrow',cats('endrow=',_n_));
run;
proc import datafile=reffile dbms=xls out=%sysfunc(compress(&T01. , "- ")) replace;
sheet="XTH";
  startrow=&startrow;
  &endrow;
  getnames=yes;
run;

%mend import;

Here are the statements that work with XLS engine.

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/acpcref/n0msy4hy1so0ren1acm90iijxn8j.htm

DBMS= Identifier

Option

Valid Value

Default Value

PROC

IMPORT

PROC

EXPORT

XLS

ENDCOL

Last column for data

Last column that contains data

Yes

No

ENDNAMEROW

Last row for variable names

Same as NAMEROW

Yes

No

ENDROW

Last row for data

Last row that contains data

Yes

No

GETNAMES

Yes | No

Yes

Yes

No

NAMEROW

First row for variable names

First row that contains variable names

Yes

No

NEWFILE

Yes | No

No

No

Yes

PUTNAMES

Yes | No

Yes

No

Yes

RANGE

name | sheet$ul:lr

First row

Yes

No

SHEET

Sheet name

First sheet

Yes

Yes

STARTCOL

First column for data

Last column that contains data

Yes

No

STARTROW

First row for data

First row that contains data

Yes

No

Didi_b
Obsidian | Level 7

Thanks Tom. I still get the table with the text below and above 

Log result:


1    %macro import(T01);
2    FILENAME REFFILE "filepath\&T01..xls";
3    proc import datafile=reffile out=raw replace dbms=XLS;
4      sheet="XTH";
5      getnames=no;
6    run;
7    data _null_;
8      set raw;
9      if 'NAME'=upcase(strip(A)) then call symputx('startrow',_n_);
10     if A ne ' ' and left(A)=:'**' then call symputx('endrow',cats('endrow=',_n_));
11   run;
12   proc import datafile=reffile dbms=xls out=%sysfunc(compress(&T01. , "- ")) replace;
13   sheet="XTH";
14     namerow=&startrow;
15     &endrow;
16     getnames=yes;
17   run;
18
19   %mend import;
20
21   %import (file1name);

NOTE: The import data set has 34 observations and 12 variables.
NOTE: WORK.RAW data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.09 seconds
      cpu time            0.01 seconds



NOTE: There were 34 observations read from the data set WORK.RAW.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds



NOTE:    Variable Name Change.  Ré -> R_
NOTE:    Variable Name Change.  Ti Ré -> Ti_R_
NOTE:    Variable Name Change.  Ti max -> Ti_max
NOTE: The import data set has 33 observations and 12 variables.
NOTE: WORK.table1name data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.20 seconds
      cpu time            0.00 seconds


22   %import (file2name);

NOTE: The import data set has 31 observations and 9 variables.
NOTE: WORK.RAW data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds



NOTE: There were 31 observations read from the data set WORK.RAW.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds



NOTE:    Variable Name Change.  
Ré -> _R_
NOTE:    Variable Name Change.  Ti Ré -> Ti_R_
NOTE:    Variable Name Change.  Ti max -> Ti_max
NOTE: The import data set has 30 observations and 9 variables.
NOTE: WORK.table2name data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.21 seconds
      cpu time            0.03 seconds


23   %import (file3name)

NOTE: The import data set has 47 observations and 14 variables.
NOTE: WORK.RAW data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds



NOTE: There were 47 observations read from the data set WORK.RAW.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


WARNING: Apparent symbolic reference ENDROW not resolved.
NOTE: Line generated by the invoked macro "IMPORT".
2     ' ' and left(A)=:'**' then call symputx('endrow',cats('endrow=',_n_)); run; proc import
2  ! datafile=reffile dbms=xls out=%sysfunc(compress(&T01. , "- ")) replace; sheet="XTH";
2  ! namerow=&startrow;   &endrow;   getnames=yes; run;
                          -
                          180
ERROR 180-322: Statement is not valid or it is used out of proper order.

NOTE: The previous statement has been deleted.
NOTE: The previous statement has been deleted.


NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

24   /*etc.*/
25   run;

 

Tom
Super User Tom
Super User

Get it to work for ONE table before trying to make a macro to generalize it.

 

Here is a working example.

I created an XLS file with junk at top and bottom and a 3 column table in the middle.

So this code reads the first column.  Checks it to find the start and stop column for the table. Then uses that to build the RANGE to give to PROC IMPORT.

%let path=c:\downloads;
%let fname=spanrows.xls;

proc import datafile="&path/&fname" dbms=xls out=raw replace;
  getnames=no;
  startcol='A';
  endcol='A';
run;

data _null_;
  set raw;
  if upcase(strip(A))='NAME' then call symputx('startrow',_n_);
  if left(A)=:'**' then call symputx('endrow',_n_-1);
run;

proc import datafile="&path/&fname" dbms=xls out=want replace;
  getnames=yes;
  range="$A&startrow:C&endrow";
run;

Results

Obs    NAME                     DAY           VALUE

 1     ABC                        1              34
 2                                2              56
 3                                3              78
 4                                4             101
 5     ABC                        1              34
 6                                2              56
 7                                3              78
 8                                4             101

Finding a method to carry the NAME values from the merged cells forward I will leave as an exercise.

Didi_b
Obsidian | Level 7

I'll work on it, and see what I can do with your proposition. Thanks 

Didi_b
Obsidian | Level 7

This program works with one table. I'm trying the macro one and still on it.

I'll let you know if I find my way. Thanks @Tom 

ballardw
Super User

If the files are truly complex then SAVE the data as a text file.

Write a data step to read the data.

 

Import, especially with many files, can result in all sorts of garbage variables because of multiple line headers and row or column spanning cells. Different files that should have the same structure will end up with different lengths of variables and can well have variables change types from numeric to character. The data step, while it takes some time to write will be consistent and has many more tools to deal with problem cases.

 

With a data step you set the constant properties for the variables, such as type and length. Then to read different files you change the input file name and maybe the output data set name.

Didi_b
Obsidian | Level 7

Thanks. How do I do that? I have multiples files with multiples tables with the same structure.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1228 views
  • 3 likes
  • 4 in conversation