Class HTMLScanner.ContentScanner

java.lang.Object
org.cyberneko.html.HTMLScanner.ContentScanner
All Implemented Interfaces:
HTMLScanner.Scanner
Enclosing class:
HTMLScanner

public class HTMLScanner.ContentScanner extends Object implements HTMLScanner.Scanner
The primary HTML document scanner.
Author:
Andy Clark
  • Constructor Summary

    Constructors
    Constructor
    Description
    ContentScanner()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    protected void
    addLocationItem(org.apache.xerces.xni.XMLAttributes attributes, int index)
    Adds location augmentations to the specified attribute.
    protected String
    nextContent(int len)
    Reads the next characters without altering the buffer content up to the current offset.
    boolean
    scan(boolean complete)
    Scans the document content.
    protected boolean
    scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty)
    Scans a real attribute.
    protected boolean
    scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty, char endc)
    Scans an attribute, pseudo or real.
    protected void
    scanCDATA()
    Scans a CDATA section.
    protected void
    scanCharacters()
    Scans characters.
    protected void
    scanComment()
    Scans a comment.
    protected void
    scanEndElement()
    Scans an end element.
    protected boolean
    scanMarkupContent(org.apache.xerces.util.XMLStringBuffer buffer, char cend)
    Scans markup content.
    protected void
    scanPI()
    Scans a processing instruction.
    protected boolean
    scanPseudoAttribute(org.apache.xerces.util.XMLAttributesImpl attributes)
    Scans a pseudo attribute.
    protected String
    scanStartElement(boolean[] empty)
    Scans a start element.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • ContentScanner

      public ContentScanner()
  • Method Details

    • scan

      public boolean scan(boolean complete) throws IOException
      Scans the document content.
      Specified by:
      scan in interface HTMLScanner.Scanner
      Parameters:
      complete - True if the scanner should not return until scanning is complete.
      Returns:
      True if additional scanning is required.
      Throws:
      IOException - Thrown if an I/O error occurs.
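      The return contract of scan can be illustrated with a small driver loop. This is only a sketch: StepScanner and CountingScanner are hypothetical stand-ins written for this example, not part of the library, and the real scanner decides for itself how much work one call performs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class ScanLoopSketch {

    /** Hypothetical stand-in that mirrors the scan(boolean) contract. */
    interface StepScanner {
        boolean scan(boolean complete) throws IOException;
    }

    /** Toy scanner: consumes one chunk per call, or everything if complete is true. */
    static class CountingScanner implements StepScanner {
        private int remaining;

        CountingScanner(int chunks) {
            this.remaining = chunks;
        }

        @Override
        public boolean scan(boolean complete) {
            if (complete) {
                remaining = 0;
            } else if (remaining > 0) {
                remaining--;
            }
            // True means additional scanning is required.
            return remaining > 0;
        }
    }

    /** Calls scan(false) until it reports that no further scanning is required. */
    static int drive(StepScanner scanner) {
        int calls = 1;
        try {
            while (scanner.scan(false)) {
                calls++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(drive(new CountingScanner(3))); // prints 3
    }
}
```

      Passing true for complete asks the scanner to finish in a single call, so the toy scanner above would then return false the first time.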
    • nextContent

      protected String nextContent(int len) throws IOException
      Reads the next characters without altering the buffer content up to the current offset.
      Parameters:
      len - the number of characters to read
      Returns:
      the read string (length may be smaller if EOF is encountered)
      Throws:
      IOException
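      The look-ahead behavior described for nextContent can be approximated on a plain java.io.Reader with mark and reset. This is a sketch of the general technique only, not the library's implementation (which works on the scanner's own buffer); the class and method names are made up for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PeekSketch {

    /**
     * Returns up to len upcoming characters without consuming them.
     * The result is shorter than len if the end of input is reached first.
     */
    static String peek(Reader in, int len) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("reader must support mark/reset");
        }
        try {
            in.mark(len);                  // remember the current position
            char[] buf = new char[len];
            int total = 0;
            while (total < len) {
                int n = in.read(buf, total, len - total);
                if (n < 0) {
                    break;                 // end of input
                }
                total += n;
            }
            in.reset();                    // rewind so nothing is consumed
            return new String(buf, 0, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader in = new StringReader("<b>hi</b>");
        System.out.println(peek(in, 3));   // prints <b>
        System.out.println(peek(in, 3));   // prints <b> again, nothing was consumed
    }
}
```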
    • scanCharacters

      protected void scanCharacters() throws IOException
      Scans characters.
      Throws:
      IOException
    • scanCDATA

      protected void scanCDATA() throws IOException
      Scans a CDATA section.
      Throws:
      IOException
    • scanComment

      protected void scanComment() throws IOException
      Scans a comment.
      Throws:
      IOException
    • scanMarkupContent

      protected boolean scanMarkupContent(org.apache.xerces.util.XMLStringBuffer buffer, char cend) throws IOException
      Scans markup content.
      Throws:
      IOException
    • scanPI

      protected void scanPI() throws IOException
      Scans a processing instruction.
      Throws:
      IOException
    • scanStartElement

      protected String scanStartElement(boolean[] empty) throws IOException
      Scans a start element.
      Parameters:
      empty - Array used as a second return value, indicating whether the start element tag is empty (e.g. ends with "/>").
      Throws:
      IOException
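      The boolean[] parameter is the usual Java idiom for returning a second value. The sketch below shows the calling convention with a deliberately simplified, hypothetical helper (readStartTag); it assumes the flag is written to index 0 and it is not the scanner's actual parsing logic.

```java
public class OutParamSketch {

    /**
     * Hypothetical helper: returns the element name of a start tag such as
     * "<br/>" and reports through empty[0] whether the tag ends with "/>".
     */
    static String readStartTag(String tag, boolean[] empty) {
        empty[0] = tag.endsWith("/>");     // second "return value"
        int end = 1;                        // skip the leading '<'
        while (end < tag.length() && Character.isLetterOrDigit(tag.charAt(end))) {
            end++;
        }
        return tag.substring(1, end);
    }

    public static void main(String[] args) {
        boolean[] empty = { false };        // caller allocates the one-slot array
        String name = readStartTag("<br/>", empty);
        System.out.println(name + " " + empty[0]);   // prints br true
    }
}
```

      The caller allocates the array once and can reuse it across calls, which avoids creating a result object for every tag.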
    • scanAttribute

      protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty) throws IOException
      Scans a real attribute.
      Parameters:
      attributes - The list of attributes.
      empty - Array used as a second return value, indicating whether the start element tag is empty (e.g. ends with "/>").
      Throws:
      IOException
    • scanPseudoAttribute

      protected boolean scanPseudoAttribute(org.apache.xerces.util.XMLAttributesImpl attributes) throws IOException
      Scans a pseudo attribute.
      Parameters:
      attributes - The list of attributes.
      Throws:
      IOException
    • scanAttribute

      protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty, char endc) throws IOException
      Scans an attribute, pseudo or real.
      Parameters:
      attributes - The list of attributes.
      empty - Array used as a second return value, indicating whether the start element tag is empty (e.g. ends with "/>").
      endc - The end character that appears before the closing angle bracket ('>').
      Throws:
      IOException
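      To make the role of endc concrete: in an empty element tag the character before '>' is '/', while in an XML declaration or processing instruction it is '?'. The helper below is a hypothetical illustration of that check, not code from the library, and it assumes the caller already knows the offset to test.

```java
public class EndCharSketch {

    /** True if input has endc at offset, immediately followed by '>'. */
    static boolean closesWith(String input, int offset, char endc) {
        return offset >= 0
                && offset + 1 < input.length()
                && input.charAt(offset) == endc
                && input.charAt(offset + 1) == '>';
    }

    public static void main(String[] args) {
        String tag = "<br/>";
        String decl = "<?xml version='1.0'?>";
        // Real attributes: look for "/>".
        System.out.println(closesWith(tag, tag.length() - 2, '/'));    // prints true
        // Pseudo attributes: look for "?>".
        System.out.println(closesWith(decl, decl.length() - 2, '?'));  // prints true
    }
}
```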
    • addLocationItem

      protected void addLocationItem(org.apache.xerces.xni.XMLAttributes attributes, int index)
      Adds location augmentations to the specified attribute.
    • scanEndElement

      protected void scanEndElement() throws IOException
      Scans an end element.
      Throws:
      IOException