How to find the fist row where the column header contains specific key...

peishanxin · Posted 10-25-2024 04:05 PM

Hi all,

I have multiple Excel files saved in a folder, each containing only one sheet with randomly named sheet tabs. I need to find the first row that contains the column header "Job No." and import all the data starting from that column header for each file. There are 11 columns and random rows in each files.

Below is my code for processing a single Excel file to identify the row containing "Job No." However, I'm encountering an error on the line do i = 1 to dim(cols); that says, "ERROR: A loop variable cannot be an array name or a character variable; It must be a scalar numeric."

Can someone help me correct the code and make it applicable for multiple files? Thanks!

%let folder = /mnt;
%let filename = file.xlsx; 

proc import datafile="&folder./&filename."
            out=raw_data
            dbms=xlsx
            replace;
    getnames=no; 
run;

data _null_;
    set raw_data;
    array cols _all_; 
    if _n_ = 1 then do; 
        found = 0; 
        do i = 1 to dim(cols);
            if upcase(vname(cols[i])) = "JOB NO." then do; 
                call symputx('header_row', _n_); 
                found = 1; /* Set found flag */
                leave; 
            end;
        end;
        if found = 0 then do;
            put "ERROR: 'Job No.' not found in the first row.";
            call symputx('header_row', 0); 
        end;
    end;
run;

data final_data;
    set raw_data(firstobs=&header_row);
run;

data final_data;
    set final_data;
    if _n_ = 1 then do;
        rename 
            col1 = Job_No
			col2 = Requesting_Client
			col3 = Contact_Full_Name
			col4 = Start_time
			col5 = End_Time
			col6 = Language
            /* there are more columns */
            ;
    end;
run;

proc print data=final_data;
run;

Kurt_Bremser · Posted 10-25-2024 05:35 PM

In a case where there is no clear header row in the spreadsheet, PROC IMPORT will use the Excel column identifiers as variable names. This means you already have a character variable i in the dataset, which you can't use in the iterative DO. Use a variable which won't be automatically created, like _i.

Your task will be easier if you save the spreadsheet to a text file, where you can read (in a first step) row-by-row until you encounter the "header", from which you take the column names and row number where the data starts; you can then use these findings to dynamically create the code that actually creates the wanted dataset.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 10-25-2024 05:45 PM

I would actually expect a majority of the Excel files to throw an error at the ARRAY statement as an array must have all members of the same type. So if the spreadsheet has at least one numeric and one character variable, regardless of number of actual variables then you will get an array error of:

ERROR: All variables in array list must be the same type, i.e., all numeric or character.

At which point the data step would stop.

Tom · Posted 10-25-2024 06:21 PM

Your code appears to be assuming that PROC IMPORT will name the variable COL1, COL2 , ...

Instead it will name the variable A,B,...

So if you have enough columns in the XLSX sheet there will already be a variable named I.

Are you saying you only want to scan the values of the first ROW in the speadsheet to determine the number of COLUMNS in the spreadsheet to skip?

To get the names and values of the first observation in a dataset you can use PROC TRANSPOSE with the OBS=1 dataset option. Make sure to use VAR _ALL_ so that transpose takes all of the variables and not just the numeric ones.

Let's do an example. First let's make an XLSX file. Then read it in with GETNAMES=NO. Then transpose to get the list of variable names and first row values.

filename xlsx temp;
proc export data=sashelp.class file=xlsx dbms=xlsx replace;
run;
proc import file=xlsx dbms=xlsx out=want replace;
  getnames=no;
run;
proc transpose data=want(obs=1) out=names;
 var _all_;
run;
proc print;
run;

Results:

Obs    _NAME_    _LABEL_    COL1

 1       A          A       Name
 2       B          B       Sex
 3       C          C       Age
 4       D          D       Height
 5       E          E       Weight

So you might do something like this:

* Generate fileref that points to the XLSX file ;
filename xlsx "&folder/&file";

* Generate dataset with list of new names to use ;
data newnames;
  input newname $32. ;
cards;
Job_No
Requesting_Client
Contact_Full_Name
Start_time
End_Time
Language
;

* read in the file without names ;
proc import file=xlsx dbms=xlsx out=want replace;
  getnames=no;
run;
* Get the header row ;
proc transpose data=want(obs=1) out=names;
  var _all_;
run;

* Find the first column location and assign newnames ;
* Set RANGE value for PROC IMPORT ;
%let range=;
data renames ;
  colnum + 1;
  set names ;
  if lowcase(col1)='job no.' then do;
     call symputx('range',cats('$',_name_,'2:'));
     firstvar = colnum;
  end;
  retain firstvar ;
  if firstvar then set newnames;
run;
* Generate old=new rename pairs ;
proc sql noprint;
  select catx('=',_name_,newname) into :renames separated by ' '
  from renames 
  where firstvar and newname ne ' '
  ;
quit;
* Read the file starting from second row and found column without name ;
* Rename the variables ;
proc import file=xlsx dbms=xlsx out=want(rename=(&renames)) replace;
  range="&range";
  getnames=no;
run;

How to find the fist row where the column header contains specific keywords

Re: How to find the fist row where the column header contains specific keywords

Re: How to find the fist row where the column header contains specific keywords

Re: How to find the fist row where the column header contains specific keywords

How to find the fist row where the column header contains specific keywords

Re: How to find the fist row where the column header contains specific keywords

Re: How to find the fist row where the column header contains specific keywords

Re: How to find the fist row where the column header contains specific keywords

Registration is open

SAS Training: Just a Click Away