Class OfficeReader

java.lang.Object
writer2latex.office.OfficeReader

public class OfficeReader extends Object

This class reads and collects global information about an OOo document, including styles, forms, indexes and references.

  • Constructor Details

    • OfficeReader

      public OfficeReader(OfficeDocument oooDoc, boolean bAllParagraphsAreSoft)
      Constructs a reader and reads the given document.
  • Method Details

    • isTextElement

      public static boolean isTextElement(Node node)
      Checks whether a node is an element in the text namespace
      Parameters:
      node - the node to check
      Returns:
      true if this is a text element
    • isTableElement

      public static boolean isTableElement(Node node)
      Checks whether a node is an element in the table namespace
      Parameters:
      node - the node to check
      Returns:
      true if this is a table element
    • isDrawElement

      public static boolean isDrawElement(Node node)
      Checks whether a node is an element in the draw namespace
      Parameters:
      node - the node to check
      Returns:
      true if this is a draw element
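The three namespace checks above follow one pattern, which can be sketched with the standard DOM API. This is a minimal illustration and not the Writer2LaTeX implementation: the class `NamespaceCheckSketch`, its helper names, and the decision to compare namespace URIs (rather than, for example, qualified-name prefixes) are assumptions made for this example. The URIs are the OpenDocument text, table and drawing namespaces.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class; not part of writer2latex.
final class NamespaceCheckSketch {
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0";

    // True only for element nodes whose namespace URI matches the given one.
    static boolean isElementIn(Node node, String namespaceUri) {
        return node != null
                && node.getNodeType() == Node.ELEMENT_NODE
                && namespaceUri.equals(node.getNamespaceURI());
    }

    // Creates an empty, namespace-aware DOM document to build test nodes in.
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Node paragraph = doc.createElementNS(TEXT_NS, "text:p");
        Node table = doc.createElementNS(TABLE_NS, "table:table");
        System.out.println(isElementIn(paragraph, TEXT_NS));               // true
        System.out.println(isElementIn(table, TEXT_NS));                   // false
        System.out.println(isElementIn(doc.createTextNode("x"), TEXT_NS)); // false
    }
}
```

With the real class, the equivalent calls would be the static methods `OfficeReader.isTextElement(node)`, `OfficeReader.isTableElement(node)` and `OfficeReader.isDrawElement(node)`.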
    • isNoteElement

      public static boolean isNoteElement(Node node)
      Checks whether a node is an element representing a note (footnote/endnote)
      Parameters:
      node - the node to check
      Returns:
      true if this is a note element
    • isSingleParagraph

      public static boolean isSingleParagraph(Node node)
      Checks whether this node contains at most one element and, if it contains one, that the element is a paragraph.
      Parameters:
      node - the node to check
      Returns:
      true if the node contains a single paragraph or nothing
    • isWhitespaceContent

      public static boolean isWhitespaceContent(Node node)

      Checks whether the only text content of this node is whitespace

      Parameters:
      node - the node to check (should be a paragraph node or a child of a paragraph node)
      Returns:
      true if the node contains whitespace only
    • isWhitespace

      public static boolean isWhitespace(String s)

      Checks whether this text is whitespace

      Parameters:
      s - the String to check
      Returns:
      true if the String contains whitespace only
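A whitespace-only test of this kind can be sketched as a single loop. The example assumes that "whitespace" means whatever `Character.isWhitespace` accepts; the source does not say which definition `isWhitespace` uses, and the class `WhitespaceSketch` is a hypothetical stand-in.

```java
// Hypothetical stand-in; the definition of whitespace is an assumption.
final class WhitespaceSketch {
    // Returns true if every character is whitespace according to
    // Character.isWhitespace; an empty string is vacuously whitespace-only.
    static boolean isWhitespaceOnly(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWhitespaceOnly(" \t\n")); // true
        System.out.println(isWhitespaceOnly(" a "));   // false
        System.out.println(isWhitespaceOnly(""));      // true
    }
}
```

Under this assumption a non-breaking space (U+00A0) does not count as whitespace, because `Character.isWhitespace` excludes it.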
    • getCharacterCount

      public static int getCharacterCount(Node node)
      Counts the characters in the text nodes of this element, excluding footnotes etc.
      Parameters:
      node - the node to count in
      Returns:
      the number of characters
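Counting characters while skipping notes amounts to a recursive walk over the DOM tree. The sketch below shows one way to do it, assuming that the skipped elements are exactly `text:note`, `text:footnote` and `text:endnote`; the real method may exclude more. `CharacterCountSketch` and its methods are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical stand-in; the set of excluded elements is an assumption.
final class CharacterCountSketch {
    static boolean isNote(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        String name = node.getNodeName();
        return name.equals("text:note")
                || name.equals("text:footnote")
                || name.equals("text:endnote");
    }

    // Sums the lengths of all text nodes below 'node', skipping note subtrees.
    static int countCharacters(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue().length();
        }
        if (isNote(node)) {
            return 0;
        }
        int count = 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countCharacters(child);
        }
        return count;
    }

    // Builds <text:p>Hello<text:note>ignored</text:note> world</text:p>.
    static Element sampleParagraph() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element paragraph = doc.createElement("text:p");
            paragraph.appendChild(doc.createTextNode("Hello"));
            Element note = doc.createElement("text:note");
            note.appendChild(doc.createTextNode("ignored"));
            paragraph.appendChild(note);
            paragraph.appendChild(doc.createTextNode(" world"));
            return paragraph;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countCharacters(sampleParagraph())); // 11
    }
}
```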
    • getTextContent

      public String getTextContent(Node node)
    • getNextChar

      public static char getNextChar(Node node)
      Returns the next character in logical order
    • isPackageFormat

      public boolean isPackageFormat()
      Checks whether or not this document is in package format
      Returns:
      true if it's in package format
    • isInPackage

      public boolean isInPackage(String sUrl)
      Checks whether this URL is internal to the package
      Parameters:
      sUrl - the URL to check
      Returns:
      true if the URL is internal to the package
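The source does not state how a URL is classified as internal. As an illustration only, the heuristic below treats relative references without a scheme (such as `Pictures/image.png` or `./Object 1`) as internal and everything else as external. The class `PackageUrlSketch`, the method `looksInternal` and the rules themselves are assumptions, not the behaviour of `isInPackage`.

```java
// Hypothetical heuristic; the real isInPackage may apply different rules.
final class PackageUrlSketch {
    static boolean looksInternal(String url) {
        if (url.isEmpty()) {
            return false;
        }
        if (url.startsWith("./")) {
            return true;               // explicitly relative to the package root
        }
        if (url.contains(":")) {
            return false;              // has a scheme, e.g. http: or file:
        }
        if (url.startsWith("/") || url.startsWith("../")) {
            return false;              // points outside the package
        }
        return true;                   // plain relative path
    }

    public static void main(String[] args) {
        System.out.println(looksInternal("Pictures/image.png"));      // true
        System.out.println(looksInternal("./Object 1"));              // true
        System.out.println(looksInternal("http://example.org/a.png")); // false
        System.out.println(looksInternal("../image.png"));            // false
    }
}
```

Fragment-only references such as `#anchor` get no special treatment in this sketch.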
    • getFontDeclarations

      public OfficeStyleFamily getFontDeclarations()

      Get the collection of all font declarations.

      Returns:
      the OfficeStyleFamily of font declarations
    • getFontDeclaration

      public FontDeclaration getFontDeclaration(String sName)

      Get a specific font declaration

      Parameters:
      sName - the name of the font declaration
      Returns:
      a FontDeclaration representing the font
    • getTextStyles

      public OfficeStyleFamily getTextStyles()
    • getTextStyle

      public StyleWithProperties getTextStyle(String sName)
    • getParStyles

      public OfficeStyleFamily getParStyles()
    • getParStyle

      public StyleWithProperties getParStyle(String sName)
    • getDefaultParStyle

      public StyleWithProperties getDefaultParStyle()
    • getSectionStyles

      public OfficeStyleFamily getSectionStyles()
    • getSectionStyle

      public StyleWithProperties getSectionStyle(String sName)
    • getTableStyles

      public OfficeStyleFamily getTableStyles()
    • getTableStyle

      public StyleWithProperties getTableStyle(String sName)
    • getColumnStyles

      public OfficeStyleFamily getColumnStyles()
    • getColumnStyle

      public StyleWithProperties getColumnStyle(String sName)
    • getRowStyles

      public OfficeStyleFamily getRowStyles()
    • getRowStyle

      public StyleWithProperties getRowStyle(String sName)
    • getCellStyles

      public OfficeStyleFamily getCellStyles()
    • getCellStyle

      public StyleWithProperties getCellStyle(String sName)
    • getDefaultCellStyle

      public StyleWithProperties getDefaultCellStyle()
    • getFrameStyles

      public OfficeStyleFamily getFrameStyles()
    • getFrameStyle

      public StyleWithProperties getFrameStyle(String sName)
    • getDefaultFrameStyle

      public StyleWithProperties getDefaultFrameStyle()
    • getPresentationStyles

      public OfficeStyleFamily getPresentationStyles()
    • getPresentationStyle

      public StyleWithProperties getPresentationStyle(String sName)
    • getDefaultPresentationStyle

      public StyleWithProperties getDefaultPresentationStyle()
    • getDrawingPageStyles

      public OfficeStyleFamily getDrawingPageStyles()
    • getDrawingPageStyle

      public StyleWithProperties getDrawingPageStyle(String sName)
    • getDefaultDrawingPageStyle

      public StyleWithProperties getDefaultDrawingPageStyle()
    • getListStyles

      public OfficeStyleFamily getListStyles()
    • getListStyle

      public ListStyle getListStyle(String sName)
    • getPageLayouts

      public OfficeStyleFamily getPageLayouts()
    • getPageLayout

      public PageLayout getPageLayout(String sName)
    • getMasterPages

      public OfficeStyleFamily getMasterPages()
    • getMasterPage

      public MasterPage getMasterPage(String sName)
    • getOutlineStyle

      public ListStyle getOutlineStyle()
    • getFootnotesConfiguration

      public PropertySet getFootnotesConfiguration()
    • getEndnotesConfiguration

      public PropertySet getEndnotesConfiguration()
    • getHeadingStyle

      public StyleWithProperties getHeadingStyle(int nLevel)

      Returns the paragraph style associated with headings of a specific level. Returns null if no such style is known.

      In principle, different styles can be used for each heading; in practice, the same (soft) style is used for all headings of a specific level.

      Parameters:
      nLevel - the level of the heading
      Returns:
      a StyleWithProperties object representing the style
    • getFirstMasterPage

      public MasterPage getFirstMasterPage()

      Returns the first master page used in the document. If no master page is used explicitly, the first master page found in the styles is returned. Returns null if no master pages exist.

      Returns:
      a MasterPage object representing the master page
    • getMajorityLanguage

      public String getMajorityLanguage()
      Returns the ISO language used in most paragraph styles (in a well-structured document this will be the default language). TODO: Base on content rather than style.
      Returns:
      the ISO language
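Picking the language used by most paragraph styles is a simple majority count. The sketch below assumes the language codes of the paragraph styles have already been collected into a list. `MajorityLanguageSketch` is a hypothetical name, and the tie-breaking rule (first language seen wins) and the null result for an empty list are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the counting step only.
final class MajorityLanguageSketch {
    // Returns the most frequent non-null language code, or null if there is none.
    // On a tie, the language that appeared first wins (an assumption).
    static String majority(List<String> languages) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String language : languages) {
            if (language != null) {
                counts.merge(language, 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One entry per paragraph style; null marks a style without a language.
        List<String> styleLanguages = Arrays.asList("en", "da", null, "en");
        System.out.println(majority(styleLanguages)); // en
    }
}
```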
    • getTocReader

      public TocReader getTocReader(Element onode)

      Returns a reader for a specific table of contents

      Parameters:
      onode - the text:table-of-content node
      Returns:
      the reader, or null
    • isIndexSourceStyle

      public boolean isIndexSourceStyle(String sStyleName)

      Is this style used in some table of contents as an index source style?

      Parameters:
      sStyleName - the name of the style
      Returns:
      true if this is an index source style
    • isFigureSequenceName

      public boolean isFigureSequenceName(String sName)

      Does this sequence name belong to a list of figures?

      Parameters:
      sName - the name of the sequence
      Returns:
      true if it belongs to an index
    • isTableSequenceName

      public boolean isTableSequenceName(String sName)

      Does this sequence name belong to a list of tables?

      Parameters:
      sName - the name of the sequence
      Returns:
      true if it belongs to an index
    • addTableSequenceName

      public void addTableSequenceName(String sName)

      Add a sequence name for table captions.

      OpenDocument has a very weak notion of table captions: A caption is a paragraph containing a text:sequence element. Moreover, the only source to identify which sequence number to use is the list(s) of tables. If there's no list of tables, captions cannot be identified. Thus this method lets the user add a sequence name to identify the table captions.

      Parameters:
      sName - the name to add
    • addFigureSequenceName

      public void addFigureSequenceName(String sName)

      Add a sequence name for figure captions.

      OpenDocument has a very weak notion of figure captions: A caption is a paragraph containing a text:sequence element. Moreover, the only source to identify which sequence number to use is the list(s) of figures. If there's no list of figures, captions cannot be identified. Thus this method lets the user add a sequence name to identify the figure captions.

      Parameters:
      sName - the name to add
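The interplay between the two add methods and the two query methods (`isFigureSequenceName`, `isTableSequenceName`) can be pictured as two sets of names. The class below is a hypothetical stand-in and not the `OfficeReader` implementation; it only shows why registering a name by hand makes captions identifiable when the document has no list of figures or tables. The names "Illustration" and "Table" are example values.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in that mirrors the four method names described above.
final class CaptionSequenceSketch {
    private final Set<String> figureSequenceNames = new HashSet<>();
    private final Set<String> tableSequenceNames = new HashSet<>();

    void addFigureSequenceName(String name) {
        figureSequenceNames.add(name);
    }

    void addTableSequenceName(String name) {
        tableSequenceNames.add(name);
    }

    boolean isFigureSequenceName(String name) {
        return figureSequenceNames.contains(name);
    }

    boolean isTableSequenceName(String name) {
        return tableSequenceNames.contains(name);
    }

    public static void main(String[] args) {
        CaptionSequenceSketch reader = new CaptionSequenceSketch();
        // Without a list of figures, nothing identifies "Illustration" as a caption sequence.
        System.out.println(reader.isFigureSequenceName("Illustration")); // false
        reader.addFigureSequenceName("Illustration");
        reader.addTableSequenceName("Table");
        System.out.println(reader.isFigureSequenceName("Illustration")); // true
        System.out.println(reader.isTableSequenceName("Illustration"));  // false
    }
}
```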
    • getSequenceName

      public String getSequenceName(Element par)

      Get the sequence name associated with a paragraph

      Parameters:
      par - the paragraph to look up
      Returns:
      the sequence name or null
    • getSequenceFromRef

      public String getSequenceFromRef(String sRefName)

      Get the sequence name associated with a reference name

      Parameters:
      sRefName - the reference name to use
      Returns:
      the sequence name or null
    • hasFootnoteRefTo

      public boolean hasFootnoteRefTo(String sId)

      Is there a reference to this footnote id?

      Parameters:
      sId - the id of the footnote
      Returns:
      true if there is a reference
    • hasEndnoteRefTo

      public boolean hasEndnoteRefTo(String sId)

      Is there a reference to this endnote?

      Parameters:
      sId - the id of the endnote
      Returns:
      true if there is a reference
    • referenceMarkInHeading

      public boolean referenceMarkInHeading(String sName)
      Is this reference mark contained in a heading?
      Parameters:
      sName - the name of the reference mark
      Returns:
      true if so
    • hasReferenceRefTo

      public boolean hasReferenceRefTo(String sName)
      Is there a reference to this reference mark?
      Parameters:
      sName - the name of the reference mark
      Returns:
      true if there is a reference
    • bookmarkInHeading

      public boolean bookmarkInHeading(String sName)
      Is this bookmark contained in a heading?
      Parameters:
      sName - the name of the bookmark
      Returns:
      true if so
    • hasBookmarkRefTo

      public boolean hasBookmarkRefTo(String sName)

      Is there a reference to this bookmark?

      Parameters:
      sName - the name of the bookmark
      Returns:
      true if there is a reference
    • hasSequenceRefTo

      public boolean hasSequenceRefTo(String sId)

      Is there a reference to this sequence field?

      Parameters:
      sId - the id of the sequence field
      Returns:
      true if there is a reference
    • hasLinkTo

      public boolean hasLinkTo(String sName)

      Is there a link to this sequence anchor name?

      Parameters:
      sName - the name of the anchor
      Returns:
      true if there is a link
    • isOpenDocument

      public boolean isOpenDocument()

      Is this an OASIS OpenDocument or an OOo 1.0 document?

      Returns:
      true if it's an OASIS OpenDocument
    • isText

      public boolean isText()

      Is this a text document?

      Returns:
      true if it's a text document
    • isSpreadsheet

      public boolean isSpreadsheet()

      Is this a spreadsheet document?

      Returns:
      true if it's a spreadsheet document
    • isPresentation

      public boolean isPresentation()

      Is this a presentation document?

      Returns:
      true if it's a presentation document
    • getContent

      public Element getContent()

      Get the content element

      In the old file format this means the office:body element.

      In the OpenDocument format this means an office:text, office:spreadsheet or office:presentation element.

      Returns:
      the content Element
    • getForms

      public FormsReader getForms()

      Get the forms belonging to this document.

      Returns:
      a FormsReader representing the forms
    • getTableReader

      public TableReader getTableReader(Element node)

      Read a table from a table:table node

      Parameters:
      node - the table:table Element node
      Returns:
      a TableReader object representing the table