Determine first row in which dataset starts dynamically

Reply
Super Contributor
Posts: 418

Determine first row in which dataset starts dynamically

Hello everyone. I have been given the task of looking at hundreds of excel files with multiple tabs, and finding the heaers of each tab per file.  The problem is the header rows exist on different Rows within the excel file per tab, and are not consistent in any way across time.

I have included an excel file for an Example of what  I am looking to try to solve.  I have been given one saving grace which is I know which field should START every dataset. Meaning I know that hte first field should always be "loanNumber", or "Loan_number" or etc...

I have some code that data _Null_ gave me to read through a csv file to map each of the column names into a SAS dataset, shown below.

data output;

length name $8000;

infile "c:\mydesktop\myfile"

     lrecl=32000 dsd firstobs=1 obs=1 length=L column=C;

do until(C>L);

  input name @;

  output;

  end;

run;

I was thinking I could wrap this into a Macro row by row on the CSV file, and then when the first record is equal to the record I'm looking too stop the loop.

First question:

How do I stop a %do  loop in SAS when a condition is met? (Aka what is the syntax for an exit loop clause. I want something like following.

%do i=1 %to &csvrowcount;

data output;

length name $8000;

infile "c:\mydesktop\myfile"

     lrecl=32000 dsd firstobs=1 obs=1 length=L column=C;

do until(C>L);

  input name @;

  output;

  end;

run;

data testoutput;

set output;

if _n_=1 and name="LOAN_NUMBER" Then LEAVE MACRO LOOP;

end;

%end;

Second Question:

Does anyone know a way to do this without a macro loop, but perhaps with a do loop within the dataset call defined above?

Explanation of Attached Excel file:

By definition I only want the fields that are colored Yellow. I would grab these columns by looking at the FIRST value against a mapping table of "allowable" table start names (Loan_number, client_loan_number, loan_number_client, etc..).

Note that on the final tab we don't find this field so we simply don't want this tab to pull any information at all.

Please let me know if anyone has ANY questions on what I need / what I am looking for, or if anyone has a bright way to get to this in an easier manner!

Super Contributor
Posts: 418

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

This just became much more complicated. It appears that many of the tabs DONT have a mapping table for the valid start record.

meaning many of these tabs have never been mapped at all, but I still need to dynamically grab the headers, KNOWING NOTHING about the tabs before the process runs.....

Any ideas from anyone? I'd take any advice at this point as I am stuck.

Super User
Posts: 19,862

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

Hire two summer students? Two people do it independently and compare answers. Likelihood of mistakes is small, but still possible.

I'd consider ignoring all of that, and using the back end data. So save your .xlxs file as .zip, unzip it and look at the components in the files. There *may* be a way there, but I'm not sure how it will store the text especially when they're on the same row, sheet but belong to different records like one of your examples.

Super User
Posts: 19,862

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

Can you use DDE?

Super Contributor
Posts: 418

Re: Determine first row in which dataset starts dynamically

Hiya Reeza, I can use any method for this.

The trick is, how do we know where column headers start knowing NOTHING about the tab names, file names, or the starting location of the column headers, GIVEN that the headers are not always the first row of data found on the file (often they are not), and they do not start in any consistent location.

I'm not sure What you mean by the zip trick, but i'm going to look into it now! maybe there's something i'm not realizing there

Super User
Posts: 19,862

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

So you're working on Windows?

You don't need to know tab names, hopefully you know the file name Smiley Happy

I'd use DDE to find the starting location. It's essentially brute force but computer brute force not Brandon reading many excel files.

I do have to ask, is it worth really programming or easier to do manually though. It will be a lot of work to program.

Super Contributor
Posts: 418

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

It's over 600 files so it seems like programming might be faster. However your point is well taken.

In related to the DDE, i'm not sure I understand how that would work, as wouldn't it grab the first piece of data in general, regardless of if it was a headers row( see tab #3 in my example)

Super User
Posts: 19,862

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

No because you'd loop it through until it found the headers and then start reading in the data. I'll see if I can mock something up but I doubt it will be today.

I'm assuming that for your last tab in the sample worksheet you won't be reading anything from that tab?

Super Contributor
Posts: 418

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

There are X tabs, all of which have data on them (even the last one sadly).

In addition it is entirely possible for a tab to have ZERO data on it..... I know this is incredibly challenging, so thank you very much for taking time out of your day to help me with this! Frankly this task is just above me I think.

So another unique issue just came up. IT appears that one tab can have multiple datasets on it (multiple header rows) like two tables stacked on top of each-other.

In these circumstances I am fine with grabbing either table (or both), whatever is easier. I am also find with failing these tabs as "can't read" and having someone manaully check them as they are apparently extremely rare.

I just wanted to bring that up in-case that affects your coding (i'm sure it will, but if that's too much of a challenge don't worry about it, ANY help you give me is better than where I am now).

Super User
Posts: 19,862

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

Can you mock up a workbook with some of these situations and data? Gives me a starting point and saves time Smiley Happy

Super Contributor
Posts: 418

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

I sure can! i'll have it to you in like 10-15 minutes.

Here it is!

Respected Advisor
Posts: 3,799

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

Seems like there is very little (from you examples) that a program could use to find the data about the data you need.

Perhaps you could create named ranges "by hand' for each sheet and create a "map sheet" as the first sheet in each workbook.  Then you can read the map sheet with PROC IMPORT and code gen the rest of the PROC IMPORTS to read the named ranges from the various tabs.

Just be sure to make the map sheet easy to read with PROC IMPORT.

Super Contributor
Posts: 418

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

Did that help at all

Super User
Posts: 19,862

Re: Determine first row in which dataset starts dynamically

Posted in reply to Anotherdream

Apparently the updates section doesn't show the attachment!

I'll give this a try today and/or tomorrow and get back to you.

Super User
Posts: 19,862

Re: Determine first row in which dataset starts dynamically

Send me an email and I'll send you what I have. I can't figure out how to add connections so you can add me and then send a direct message if you can't find my email.

Ask a Question
Discussion stats
  • 14 replies
  • 971 views
  • 0 likes
  • 3 in conversation