Hi all,
I have multiple Excel files saved in a folder, each containing only one sheet with randomly named sheet tabs. I need to find the first row that contains the column header "Job No." and import all the data starting from that column header for each file. There are 11 columns and random rows in each files.
Below is my code for processing a single Excel file to identify the row containing "Job No." However, I'm encountering an error on the line do i = 1 to dim(cols);
that says, "ERROR: A loop variable cannot be an array name or a character variable; It must be a scalar numeric."
Can someone help me correct the code and make it applicable for multiple files? Thanks!
%let folder = /mnt; %let filename = file.xlsx; proc import datafile="&folder./&filename." out=raw_data dbms=xlsx replace; getnames=no; run; data _null_; set raw_data; array cols _all_; if _n_ = 1 then do; found = 0; do i = 1 to dim(cols); if upcase(vname(cols[i])) = "JOB NO." then do; call symputx('header_row', _n_); found = 1; /* Set found flag */ leave; end; end; if found = 0 then do; put "ERROR: 'Job No.' not found in the first row."; call symputx('header_row', 0); end; end; run; data final_data; set raw_data(firstobs=&header_row); run; data final_data; set final_data; if _n_ = 1 then do; rename col1 = Job_No col2 = Requesting_Client col3 = Contact_Full_Name col4 = Start_time col5 = End_Time col6 = Language /* there are more columns */ ; end; run; proc print data=final_data; run;
In a case where there is no clear header row in the spreadsheet, PROC IMPORT will use the Excel column identifiers as variable names. This means you already have a character variable i in the dataset, which you can't use in the iterative DO. Use a variable which won't be automatically created, like _i.
Your task will be easier if you save the spreadsheet to a text file, where you can read (in a first step) row-by-row until you encounter the "header", from which you take the column names and row number where the data starts; you can then use these findings to dynamically create the code that actually creates the wanted dataset.
I would actually expect a majority of the Excel files to throw an error at the ARRAY statement as an array must have all members of the same type. So if the spreadsheet has at least one numeric and one character variable, regardless of number of actual variables then you will get an array error of:
ERROR: All variables in array list must be the same type, i.e., all numeric or character.
At which point the data step would stop.
Your code appears to be assuming that PROC IMPORT will name the variable COL1, COL2 , ...
Instead it will name the variable A,B,...
So if you have enough columns in the XLSX sheet there will already be a variable named I.
Are you saying you only want to scan the values of the first ROW in the speadsheet to determine the number of COLUMNS in the spreadsheet to skip?
To get the names and values of the first observation in a dataset you can use PROC TRANSPOSE with the OBS=1 dataset option. Make sure to use VAR _ALL_ so that transpose takes all of the variables and not just the numeric ones.
Let's do an example. First let's make an XLSX file. Then read it in with GETNAMES=NO. Then transpose to get the list of variable names and first row values.
filename xlsx temp;
proc export data=sashelp.class file=xlsx dbms=xlsx replace;
run;
proc import file=xlsx dbms=xlsx out=want replace;
getnames=no;
run;
proc transpose data=want(obs=1) out=names;
var _all_;
run;
proc print;
run;
Results:
Obs _NAME_ _LABEL_ COL1 1 A A Name 2 B B Sex 3 C C Age 4 D D Height 5 E E Weight
So you might do something like this:
* Generate fileref that points to the XLSX file ;
filename xlsx "&folder/&file";
* Generate dataset with list of new names to use ;
data newnames;
input newname $32. ;
cards;
Job_No
Requesting_Client
Contact_Full_Name
Start_time
End_Time
Language
;
* read in the file without names ;
proc import file=xlsx dbms=xlsx out=want replace;
getnames=no;
run;
* Get the header row ;
proc transpose data=want(obs=1) out=names;
var _all_;
run;
* Find the first column location and assign newnames ;
* Set RANGE value for PROC IMPORT ;
%let range=;
data renames ;
colnum + 1;
set names ;
if lowcase(col1)='job no.' then do;
call symputx('range',cats('$',_name_,'2:'));
firstvar = colnum;
end;
retain firstvar ;
if firstvar then set newnames;
run;
* Generate old=new rename pairs ;
proc sql noprint;
select catx('=',_name_,newname) into :renames separated by ' '
from renames
where firstvar and newname ne ' '
;
quit;
* Read the file starting from second row and found column without name ;
* Rename the variables ;
proc import file=xlsx dbms=xlsx out=want(rename=(&renames)) replace;
range="&range";
getnames=no;
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.