Hello, I have complex xls daily data tables from several years for example from 2000 to 2019 in a file. Unfortunately, the data tables have text above and below the data. Also, the name row, start row and end row are not same. However the variables name are the same as well as the structure of the table. I try a macro program by specing the name and start rows, but since all the tables don't have the same name et start row, and regarding the text above, I am looking for a solution to import all the tables at once with the text removed (at least the one above). I'm asking because it is a lot of tables to work on it one by one.
Below a copy of different tables (I'll post as a response) and a copy of my macro for tables with the same start and name row, I tested with the help from the SAS communities. Need your help, please.
options mprint;
%macro import(T01);
FILENAME REFFILE "Z:\tune\BAD\FAC\T01\&T01..xls";
proc import datafile=reffile dbms=xls out=%sysfunc(compress(&T01. , "- ")) replace;
sheet="XTH";
namerow=7;
startrow=10;
*Endrow=934 (the end rows are not the same for each table;
getnames=yes;
run;
%mend import;
%import (CalT_glob_2018-01-15);
%import (CalT_glob_2019-03-20);
/*etc.*/
run;
Get it to work for ONE table before trying to make a macro to generalize it.
Here is a working example.
I created an XLS file with junk at top and bottom and a 3 column table in the middle.
So this code reads the first column. Checks it to find the start and stop column for the table. Then uses that to build the RANGE to give to PROC IMPORT.
%let path=c:\downloads;
%let fname=spanrows.xls;
proc import datafile="&path/&fname" dbms=xls out=raw replace;
getnames=no;
startcol='A';
endcol='A';
run;
data _null_;
set raw;
if upcase(strip(A))='NAME' then call symputx('startrow',_n_);
if left(A)=:'**' then call symputx('endrow',_n_-1);
run;
proc import datafile="&path/&fname" dbms=xls out=want replace;
getnames=yes;
range="$A&startrow:C&endrow";
run;
Results
Obs NAME DAY VALUE 1 ABC 1 34 2 2 56 3 3 78 4 4 101 5 ABC 1 34 6 2 56 7 3 78 8 4 101
Finding a method to carry the NAME values from the merged cells forward I will leave as an exercise.
Here an example of the different tables (It is not the names neither the real data because I’m not allowed to)
What would you want a SAS dataset of one of your tables to look like? For example your tables have subtotal and total rows. Typically a SAS dataset is just detail rows from which a tabular report can be created with subtotals and total rows calculated on the fly.
I would like to have a normal table at the end. I prepar the code to delete the sub total and total, and also code to file the void on variables name and date. I just ned to find a way to import all my tables at once because it is a lot (daily table on more than 7 years)
data want;
set Work.have;
if Day="" then Day='12/01/2022';
if Day="Sous Total" then delete;
else Day='12/01/2022';
NDay=input(Day,MMDDYY10.);
format NDay MMDDYY10.;
Nhour=input(translate(hour,':','H'),time15.);
format NHour tod5.;
retain File;
if not missing(Name) then File=Name;
else Name=File;
drop File Day hour;
run;
Try something like this to determine the start and end of the table part of the sheet.
Read in the file without trying to generate variable names.
Then check the first column to see where the row header is and where the footnote is.
%macro import(T01);
FILENAME REFFILE "Z:\tune\BAD\FAC\T01\&T01..xls";
proc import datafile=reffile out=raw replace dbms=XLS;
sheet="XTH";
startcol=1;
endcol=1;
getnames=no;
run;
data _null_;
set raw;
if 'NAME'=upcase(strip(A)) then call symputx('startrow',_n_);
if A ne ' ' and left(A)=:'**' then call symputx('endrow',cats('endrow=',_n_));
run;
proc import datafile=reffile dbms=xls out=%sysfunc(compress(&T01. , "- ")) replace;
sheet="XTH";
startrow=&startrow;
&endrow;
getnames=yes;
run;
%mend import;
Here are the statements that work with XLS engine.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/acpcref/n0msy4hy1so0ren1acm90iijxn8j.htm
DBMS= Identifier |
Option |
Valid Value |
Default Value |
PROC IMPORT |
PROC EXPORT |
---|---|---|---|---|---|
XLS |
ENDCOL |
Last column for data |
Last column that contains data |
Yes |
No |
ENDNAMEROW |
Last row for variable names |
Same as NAMEROW |
Yes |
No |
|
ENDROW |
Last row for data |
Last row that contains data |
Yes |
No |
|
GETNAMES |
Yes | No |
Yes |
Yes |
No |
|
NAMEROW |
First row for variable names |
First row that contains variable names |
Yes |
No |
|
NEWFILE |
Yes | No |
No |
No |
Yes |
|
PUTNAMES |
Yes | No |
Yes |
No |
Yes |
|
RANGE |
name | sheet$ul:lr |
First row |
Yes |
No |
|
SHEET |
Sheet name |
First sheet |
Yes |
Yes |
|
STARTCOL |
First column for data |
Last column that contains data |
Yes |
No |
|
STARTROW |
First row for data |
First row that contains data |
Yes |
No |
Thanks Tom. I still get the table with the text below and above
Log result: 1 %macro import(T01); 2 FILENAME REFFILE "filepath\&T01..xls"; 3 proc import datafile=reffile out=raw replace dbms=XLS; 4 sheet="XTH"; 5 getnames=no; 6 run; 7 data _null_; 8 set raw; 9 if 'NAME'=upcase(strip(A)) then call symputx('startrow',_n_); 10 if A ne ' ' and left(A)=:'**' then call symputx('endrow',cats('endrow=',_n_)); 11 run; 12 proc import datafile=reffile dbms=xls out=%sysfunc(compress(&T01. , "- ")) replace; 13 sheet="XTH"; 14 namerow=&startrow; 15 &endrow; 16 getnames=yes; 17 run; 18 19 %mend import; 20 21 %import (file1name); NOTE: The import data set has 34 observations and 12 variables. NOTE: WORK.RAW data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 0.09 seconds cpu time 0.01 seconds NOTE: There were 34 observations read from the data set WORK.RAW. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds NOTE: Variable Name Change. Ré -> R_ NOTE: Variable Name Change. Ti Ré -> Ti_R_ NOTE: Variable Name Change. Ti max -> Ti_max NOTE: The import data set has 33 observations and 12 variables. NOTE: WORK.table1name data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 0.20 seconds cpu time 0.00 seconds 22 %import (file2name); NOTE: The import data set has 31 observations and 9 variables. NOTE: WORK.RAW data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 0.01 seconds cpu time 0.00 seconds NOTE: There were 31 observations read from the data set WORK.RAW. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds NOTE: Variable Name Change. Ré -> _R_ NOTE: Variable Name Change. Ti Ré -> Ti_R_ NOTE: Variable Name Change. Ti max -> Ti_max NOTE: The import data set has 30 observations and 9 variables. NOTE: WORK.table2name data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 0.21 seconds cpu time 0.03 seconds 23 %import (file3name) NOTE: The import data set has 47 observations and 14 variables. NOTE: WORK.RAW data set was successfully created. NOTE: PROCEDURE IMPORT used (Total process time): real time 0.01 seconds cpu time 0.00 seconds NOTE: There were 47 observations read from the data set WORK.RAW. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds WARNING: Apparent symbolic reference ENDROW not resolved. NOTE: Line generated by the invoked macro "IMPORT". 2 ' ' and left(A)=:'**' then call symputx('endrow',cats('endrow=',_n_)); run; proc import 2 ! datafile=reffile dbms=xls out=%sysfunc(compress(&T01. , "- ")) replace; sheet="XTH"; 2 ! namerow=&startrow; &endrow; getnames=yes; run; - 180 ERROR 180-322: Statement is not valid or it is used out of proper order. NOTE: The previous statement has been deleted. NOTE: The previous statement has been deleted. NOTE: The SAS System stopped processing this step because of errors. NOTE: PROCEDURE IMPORT used (Total process time): real time 0.02 seconds cpu time 0.01 seconds 24 /*etc.*/ 25 run;
Get it to work for ONE table before trying to make a macro to generalize it.
Here is a working example.
I created an XLS file with junk at top and bottom and a 3 column table in the middle.
So this code reads the first column. Checks it to find the start and stop column for the table. Then uses that to build the RANGE to give to PROC IMPORT.
%let path=c:\downloads;
%let fname=spanrows.xls;
proc import datafile="&path/&fname" dbms=xls out=raw replace;
getnames=no;
startcol='A';
endcol='A';
run;
data _null_;
set raw;
if upcase(strip(A))='NAME' then call symputx('startrow',_n_);
if left(A)=:'**' then call symputx('endrow',_n_-1);
run;
proc import datafile="&path/&fname" dbms=xls out=want replace;
getnames=yes;
range="$A&startrow:C&endrow";
run;
Results
Obs NAME DAY VALUE 1 ABC 1 34 2 2 56 3 3 78 4 4 101 5 ABC 1 34 6 2 56 7 3 78 8 4 101
Finding a method to carry the NAME values from the merged cells forward I will leave as an exercise.
I'll work on it, and see what I can do with your proposition. Thanks
This program works with one table. I'm trying the macro one and still on it.
I'll let you know if I find my way. Thanks @Tom
If the files are truly complex then SAVE the data as a text file.
Write a data step to read the data.
Import, especially with many files, can result in all sorts of garbage variables because of multiple line headers and row or column spanning cells. Different files that should have the same structure will end up with different lengths of variables and can well have variables change types from numeric to character. The data step, while it takes some time to write will be consistent and has many more tools to deal with problem cases.
With a data step you set the constant properties for the variables, such as type and length. Then to read different files you change the input file name and maybe the output data set name.
Thanks. How do I do that? I have multiples files with multiples tables with the same structure.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.