HTML Input

New Contributor
Posts: 4

Hi all,

I'm trying to convert some HTML containing company listings into a data set that holds each company's name and address. The general format for each company is something like this:

<p class="search-1">1.</p>
<p class="search-2"><strong>Company Name</strong> <br />
1 Test St <br />
Suite 1<br />
Test City, CA, 91763<br />
(555) 555-5555</p>

The input statement I'm using is:

input @'<p class="search-1">' ID 2.
      @'<p class="search-2"><strong>' Name $40.
      Address1 $40.
      Address2 $40.
      City $40.
      Phone $40.
      ;

This comes close, but it leaves extra HTML in some fields and gives erratic results, especially in the name and address fields. I've tried a few different things but can't quite get it to work. Any input is greatly appreciated.

Thanks in advance,

Rob


Accepted Solutions
Solution
‎04-10-2012 03:45 AM
Valued Guide
Posts: 2,177

Re: HTML Input

We need to be careful with informats on INPUT statements. An informat with an explicit width reads exactly that many characters, even when you might expect it to 'finish earlier'.

For example,

address1 $40.

will consume 40 characters.

Placing a colon (:) before the informat brings the behaviour closer to what you want: SAS reads from the input buffer up to the next delimiter, or until the informat width has been used, whichever comes first.

You might want to allow '<' to be your delimiter as well as blank.
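
To make the difference concrete, here is a minimal sketch that reads the same field twice from one line of the sample markup, once with formatted input and once with the colon modifier. The data set name demo, the variable names Name_fixed and Name_colon, and the in-stream DATALINES record are illustrative assumptions, not part of the original program.

```sas
/* Sketch only: "demo", "Name_fixed" and "Name_colon" are hypothetical names. */
data demo;
   infile datalines dlm='<';              /* treat '<' as the delimiter          */
   input @'<strong>' Name_fixed $40.      /* formatted input: takes 40 columns   */
         @1                               /* move back to the start of the line  */
         @'<strong>' Name_colon : $40.;   /* colon modifier: stops at next '<'   */
   put Name_fixed= / Name_colon=;         /* write both values to the log        */
datalines;
<p class="search-2"><strong>Company Name</strong> <br />
;
run;
```

Comparing the two values in the log shows how much of the trailing markup the fixed-width read picks up, while the colon-modified read ends at the delimiter.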


All Replies
Super User
Posts: 10,028

Re: HTML Input

Can you attach an HTML file?

New Contributor
Posts: 4

Re: HTML Input

Thanks Peter! Adding a colon and changing the delimiter worked just right.

In case anyone is curious, this is the input statement I ended up using:

infile test firstobs=77 dlm='<';

input @'<p class="search-1">' ID 2.
      @'<p class="search-2"><strong>' Name : $40. /
      Address1 : $40. /
      Address2 : $40. /
      City : $40. /
      Phone : $40.
      ;
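
For anyone who wants to try this without the original file, the following is a rough self-contained sketch that runs the same INPUT statement against the sample listing from the question. The data set name companies and the in-stream DATALINES are assumptions for illustration; firstobs=77 is omitted because there is no page header to skip here, and the sketch assumes every listing has exactly this line layout.

```sas
/* Sketch only: "companies" is a hypothetical data set name, and the      */
/* DATALINES block stands in for the real HTML file read via INFILE.      */
data companies;
   infile datalines dlm='<';                  /* '<' ends each field       */
   input @'<p class="search-1">' ID 2.        /* listing number            */
         @'<p class="search-2"><strong>' Name : $40. /
         Address1 : $40. /                    /* each "/" moves the        */
         Address2 : $40. /                    /* pointer to the next line  */
         City     : $40. /
         Phone    : $40.;
datalines;
<p class="search-1">1.</p>
<p class="search-2"><strong>Company Name</strong> <br />
1 Test St <br />
Suite 1<br />
Test City, CA, 91763<br />
(555) 555-5555</p>
;
run;

proc print data=companies;
run;
```

A listing without a suite line would shift the later fields up by one, so real data may need an extra check before relying on this layout.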

Thanks again!

Rob
