

Class Acme.HtmlScanner

java.lang.Object
   |
   +----java.io.InputStream
           |
           +----java.io.FilterInputStream
                   |
                   +----Acme.HtmlScanner

public class HtmlScanner
extends FilterInputStream
A fast HTML scanning class.

This is a FilterInputStream that lets you read an HTML file, and at the same time scans it for URLs. You get the full text of the file through the normal read() calls, and you also get special callbacks with the URL strings.

The scanning is done by a hand-built finite-state machine.
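
The idea can be sketched with a small stand-in class. This is not the Acme source: UrlSniffingStream, its Consumer<String> callback, and the decision to recognize only double-quoted href attributes are all assumptions made for illustration. It shows a FilterInputStream that hands every byte to the caller unchanged while a tiny state machine reports URLs on the side.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in, NOT Acme.HtmlScanner: passes bytes through unchanged
// while a tiny state machine reports every double-quoted href value it sees.
class UrlSniffingStream extends FilterInputStream {
    private static final String KEY = "href=\"";
    private final Consumer<String> callback;
    private final StringBuilder url = new StringBuilder();
    private int matched = 0;            // characters of KEY matched so far
    private boolean gettingUrl = false; // true while accumulating a URL

    UrlSniffingStream(InputStream in, Consumer<String> callback) {
        super(in);
        this.callback = callback;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        for (int i = 0; i < n; i++) {   // loop is skipped when n is -1 (EOF)
            step((char) (b[off + i] & 0xff));
        }
        return n;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) <= 0 ? -1 : one[0] & 0xff;
    }

    // One state-machine transition per input character.
    private void step(char c) {
        if (gettingUrl) {
            if (c == '"') {             // closing quote ends the URL
                callback.accept(url.toString());
                url.setLength(0);
                gettingUrl = false;
            } else {
                url.append(c);
            }
        } else if (Character.toLowerCase(c) == KEY.charAt(matched)) {
            if (++matched == KEY.length()) {
                matched = 0;
                gettingUrl = true;
            }
        } else {
            // Mismatch: restart, but let this character begin a new match.
            matched = (Character.toLowerCase(c) == KEY.charAt(0)) ? 1 : 0;
        }
    }
}

class UrlSniffDemo {
    // Reads the whole text through the sniffing stream and returns the URLs reported.
    static List<String> scan(String html) {
        List<String> found = new ArrayList<>();
        byte[] bytes = html.getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = new UrlSniffingStream(new ByteArrayInputStream(bytes), found::add)) {
            byte[] buf = new byte[8];   // small buffer so URLs span several reads
            while (in.read(buf) != -1) { /* a real caller would use the data here */ }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [a.html, http://example.com/]
        System.out.println(scan("<a href=\"a.html\">A</a> <A HREF=\"http://example.com/\">B</a>"));
    }
}
```

Because the state lives in fields rather than local variables, a URL split across two read() calls is still reported whole.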



Variable Index

 o gettingUrl
Whether the interpreter is currently accumulating a URL.

Constructor Index

 o HtmlScanner(InputStream, URL, HtmlObserver)
Constructor.
 o HtmlScanner(InputStream, URL, HtmlObserver, Object)
Constructor with clientData.

Method Index

 o addObserver(HtmlObserver)
Add an extra observer to this scanner.
 o addObserver(HtmlObserver, Object)
Add an extra observer to this scanner.
 o close()
Override close() with one that makes sure the entire file gets read, so that all its URLs get extracted, even if the caller isn't interested in the data.
 o finalize()
Add a finalize method to try to make sure that our jiggered close() gets called.
 o markSupported()
Disallow mark()/reset().
 o read()
Override to make sure this goes through the read(byte[], int, int) method.
 o read(byte[])
Override to make sure this goes through the read(byte[], int, int) method.
 o read(byte[], int, int)
Special version of read() that runs all data through the HTML scanner.
 o skip(long)
Override to make sure this goes through the read(byte[], int, int) method.
 o substitute(int, String)
Can be used to change the scan buffer in the middle of a scan.

Variables

 o gettingUrl
 protected boolean gettingUrl
Whether the interpreter is currently accumulating a URL.

Constructors

 o HtmlScanner
 public HtmlScanner(InputStream s,
                    URL thisUrl,
                    HtmlObserver observer)
Constructor. If the client is not interested in getting called back with URLs, observer can be null (but then there's not much point in using this class).

 o HtmlScanner
 public HtmlScanner(InputStream s,
                    URL thisUrl,
                    HtmlObserver observer,
                    Object clientData)
Constructor with clientData. If the client is not interested in getting called back with URLs, observer can be null (but then there's not much point in using this class).

Methods

 o addObserver
 public void addObserver(HtmlObserver observer)
Add an extra observer to this scanner. Multiple observers get called in the order they were added.

 o addObserver
 public void addObserver(HtmlObserver observer,
                         Object clientData)
Add an extra observer to this scanner. Multiple observers get called in the order they were added.
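
The ordering guarantee can be illustrated with a minimal sketch. The names UrlCallback, ObserverList, and fire are hypothetical, and the sketch assumes that each observer's clientData is handed back to it on every callback; the real HtmlObserver interface is documented separately and may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback shape, used here only for illustration.
interface UrlCallback {
    void gotUrl(String url, Object clientData);
}

// Keeps observers and their clientData in parallel lists, in insertion order.
class ObserverList {
    private final List<UrlCallback> observers = new ArrayList<>();
    private final List<Object> clientDatas = new ArrayList<>();

    void addObserver(UrlCallback observer) {
        addObserver(observer, null);
    }

    void addObserver(UrlCallback observer, Object clientData) {
        observers.add(observer);
        clientDatas.add(clientData);
    }

    // Calls every observer in the order it was added.
    void fire(String url) {
        for (int i = 0; i < observers.size(); i++) {
            observers.get(i).gotUrl(url, clientDatas.get(i));
        }
    }
}

class ObserverDemo {
    static List<String> run() {
        List<String> log = new ArrayList<>();
        ObserverList list = new ObserverList();
        list.addObserver((url, data) -> log.add("first saw " + url));
        list.addObserver((url, data) -> log.add("second saw " + url + " with " + data), "tag");
        list.fire("a.html");
        return log;
    }

    public static void main(String[] args) {
        // prints [first saw a.html, second saw a.html with tag]
        System.out.println(run());
    }
}
```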

 o read
 public int read(byte b[],
                 int off,
                 int len) throws IOException
Special version of read() that runs all data through the HTML scanner.

Overrides:
read in class FilterInputStream
 o close
 public void close() throws IOException
Override close() with one that makes sure the entire file gets read, so that all its URLs get extracted, even if the caller isn't interested in the data.

Overrides:
close in class FilterInputStream
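
A draining close() of this kind might look like the following sketch. DrainOnCloseStream and its seen counter are hypothetical; the counter stands in for whatever scanning the real read() performs, so the example only demonstrates that bytes the caller never asked for still pass through read() before the underlying stream is closed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in: counts bytes instead of scanning them for URLs.
class DrainOnCloseStream extends FilterInputStream {
    long seen = 0;                  // bytes that have passed through read()
    private boolean closed = false;

    DrainOnCloseStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) {
            seen += n;              // a scanner would inspect b[off..off+n) here
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;                 // make repeated close() calls harmless
        }
        closed = true;
        byte[] scratch = new byte[4096];
        // Read to end of stream so the remaining data is still processed.
        while (read(scratch, 0, scratch.length) != -1) { /* discard */ }
        super.close();
    }
}

class DrainDemo {
    // Reads only readFirst bytes, closes, and reports how many bytes were processed.
    static long bytesSeenAfterEarlyClose(int total, int readFirst) {
        DrainOnCloseStream s = new DrainOnCloseStream(new ByteArrayInputStream(new byte[total]));
        try {
            s.read(new byte[readFirst], 0, readFirst);
            s.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s.seen;
    }

    public static void main(String[] args) {
        System.out.println(bytesSeenAfterEarlyClose(10, 3)); // prints 10
    }
}
```

One consequence of this design is that close() can block for as long as the rest of the input takes to arrive, which matters for slow network streams.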
 o finalize
 protected void finalize() throws Throwable
Add a finalize method to try to make sure that our jiggered close() gets called.

Throws: Throwable
if there's a problem
Overrides:
finalize in class Object
 o read
 public int read() throws IOException
Override to make sure this goes through the above read(byte[], int, int) method.

Overrides:
read in class FilterInputStream
 o read
 public int read(byte b[]) throws IOException
Override to make sure this goes through the above read(byte[], int, int) method.

Overrides:
read in class FilterInputStream
 o skip
 public long skip(long n) throws IOException
Override to make sure this goes through the above read(byte[], int, int) method.

Overrides:
skip in class FilterInputStream
 o markSupported
 public boolean markSupported()
Disallow mark()/reset().

Overrides:
markSupported in class FilterInputStream
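
The funnelling pattern behind read(), read(byte[]), skip(long), and markSupported() can be sketched as follows. FunnelStream and its seen counter are hypothetical stand-ins, not the Acme implementation; the point is that every path by which bytes can leave the stream ends up in the three-argument read(), so nothing bypasses the scanner.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in: "seen" counts bytes that reached the central read().
class FunnelStream extends FilterInputStream {
    long seen = 0;

    FunnelStream(InputStream in) {
        super(in);
    }

    // The one place where data is actually pulled from the wrapped stream.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) {
            seen += n;
        }
        return n;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] b) throws IOException {
        return read(b, 0, b.length);
    }

    // Skip by reading and discarding, so skipped bytes are still processed.
    @Override
    public long skip(long n) throws IOException {
        byte[] scratch = new byte[(int) Math.min(4096, Math.max(n, 1))];
        long skipped = 0;
        while (skipped < n) {
            int r = read(scratch, 0, (int) Math.min(scratch.length, n - skipped));
            if (r == -1) {
                break;              // end of stream before n bytes were skipped
            }
            skipped += r;
        }
        return skipped;
    }

    // Rewinding would replay bytes through the scanner, so refuse mark()/reset().
    @Override
    public boolean markSupported() {
        return false;
    }
}

class FunnelDemo {
    // Skips the given number of bytes and reports how many reached the central read().
    static long seenAfterSkip(int total, long toSkip) {
        FunnelStream s = new FunnelStream(new ByteArrayInputStream(new byte[total]));
        try {
            s.skip(toSkip);
            s.read();               // one more byte through the single-byte path
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s.seen;
    }

    public static void main(String[] args) {
        System.out.println(seenAfterSkip(10, 4)); // prints 5
    }
}
```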
 o substitute
 protected void substitute(int oldLen,
                           String newStr)
Can be used to change the scan buffer in the middle of a scan. Black Magic! Dangerous! Be careful! For use only by HtmlEditScanner - any other use voids warranty.


