alto package
Module contents
Top-level package for Alto.
- class alto.Alternative(content: str)
Bases: object
- content: str
- classmethod from_xml(element: xml.etree.ElementTree.Element) → alto.Alternative
- class alto.Alto(description: alto.Description, layout: alto.Layout)
Bases: object
Alto dataclass for manipulating Tesseract output files.
- Parameters
description (Description) – The “Description” tag of ALTO XML documents, containing metadata
layout (Layout) – The “Layout” tag of ALTO XML documents, containing parsed elements
- description: alto.Description
- extract_composed_blocks() → List[alto.ComposedBlock]
- extract_grouped_words(group_by: Union[Literal['TextLine'], Literal['TextBlock'], Literal['ComposedBlock']]) → List[List[str]]
Extracts all parsed words grouped at the required level.
- Args:
group_by (Union[Literal['TextLine'], Literal['TextBlock'], Literal['ComposedBlock']]): group level
- Returns:
List[List[str]]: One list of words for each entity at the target level
- extract_text_blocks() → List[alto.TextBlock]
- extract_text_lines() → List[alto.TextLine]
- extract_words() → List[str]
Extracts all parsed words regardless of their positions.
- Returns:
List[str]: List of words extracted from file
- layout: alto.Layout
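The grouping described for extract_grouped_words can be illustrated with a minimal sketch. It does not use the alto package: LineSketch, BlockSketch and group_words are hypothetical stand-ins invented for this example, and the 'ComposedBlock' level is left out for brevity. It only shows what "one list of words per entity" looks like at two of the documented levels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineSketch:
    """Hypothetical stand-in for a text line, reduced to its words."""
    words: List[str]


@dataclass
class BlockSketch:
    """Hypothetical stand-in for a text block, reduced to its lines."""
    lines: List[LineSketch]


def group_words(blocks: List[BlockSketch], group_by: str) -> List[List[str]]:
    """Return one list of words per entity at the requested level."""
    if group_by == "TextLine":
        # One inner list per line, in reading order.
        return [line.words for block in blocks for line in block.lines]
    if group_by == "TextBlock":
        # One inner list per block, concatenating the words of its lines.
        return [[w for line in block.lines for w in line.words] for block in blocks]
    raise ValueError(f"unsupported group level: {group_by}")


blocks = [
    BlockSketch(lines=[LineSketch(["Dear", "reader,"]), LineSketch(["welcome."])]),
    BlockSketch(lines=[LineSketch(["Page", "2"])]),
]
print(group_words(blocks, "TextLine"))   # [['Dear', 'reader,'], ['welcome.'], ['Page', '2']]
print(group_words(blocks, "TextBlock"))  # [['Dear', 'reader,', 'welcome.'], ['Page', '2']]
```

Grouping by a coarser level yields fewer, longer inner lists; the flat result of extract_words corresponds to concatenating all of them.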
- class alto.ComposedBlock(id: str, height: float, width: float, hpos: float, vpos: float, text_blocks: List[alto.TextBlock])
Bases: object
- extract_words() → List[str]
Extracts all parsed words regardless of their positions.
- Returns:
List[str]: List of words extracted from file
- classmethod from_xml(element: xml.etree.ElementTree.Element) → alto.ComposedBlock
- height: float
- hpos: float
- id: str
- text_blocks: List[alto.TextBlock]
- vpos: float
- width: float
- class alto.Description(file_name: Optional[str])
Bases: object
- file_name: Optional[str]
- classmethod from_xml(element: xml.etree.ElementTree.Element) → alto.Description
- class alto.Layout(pages: List[alto.Page])
Bases: object
- classmethod from_xml(element: xml.etree.ElementTree.Element) → alto.Layout
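The from_xml classmethods take an xml.etree.ElementTree.Element, so the caller has to locate the relevant element first. The sketch below uses only the standard library to find the Description and Layout elements in a small, hand-written ALTO fragment. The fragment, the file name scan.png and the ALTO_NS mapping are assumptions made for illustration; whether this package expects namespaced elements or strips the namespace itself is not stated here.

```python
import xml.etree.ElementTree as ET

# ALTO v3 namespace; other ALTO versions use a different URI.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}

# Hand-written fragment for illustration, not real OCR output.
document = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Description>
    <sourceImageInformation><fileName>scan.png</fileName></sourceImageInformation>
  </Description>
  <Layout>
    <Page ID="page_0" WIDTH="800" HEIGHT="600" PHYSICAL_IMG_NR="0"/>
  </Layout>
</alto>"""

root = ET.fromstring(document)

# Elements of the kind that a from_xml classmethod would receive.
layout = root.find("alto:Layout", ALTO_NS)
pages = layout.findall("alto:Page", ALTO_NS)
file_name = root.findtext(
    "alto:Description/alto:sourceImageInformation/alto:fileName",
    namespaces=ALTO_NS,
)

print(len(pages), pages[0].attrib["ID"], file_name)  # 1 page_0 scan.png
```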
- class alto.Page(id: str, height: float, width: float, physical_img_nr: int, printed_img_nr: Optional[int], print_spaces: List[alto.PrintSpace])
Bases: object
- extract_blocks() → List[alto.ComposedBlock]
- extract_lines() → List[alto.TextLine]
- extract_strings() → List[alto.String]
- extract_text_blocks() → List[alto.TextBlock]
- extract_words() → List[str]
Extracts all parsed words regardless of their positions.
- Returns:
List[str]: List of words extracted from file
- height: float
- id: str
- physical_img_nr: int
- print_spaces: List[alto.PrintSpace]
- printed_img_nr: Optional[int]
- width: float
- class alto.PrintSpace(height: float, width: float, hpos: float, vpos: float, pc: Optional[float], composed_blocks: List[alto.ComposedBlock])
Bases: object
- composed_blocks: List[alto.ComposedBlock]
- extract_words() → List[str]
Extracts all parsed words regardless of their positions.
- Returns:
List[str]: List of words extracted from file
- classmethod from_xml(element: xml.etree.ElementTree.Element) → alto.PrintSpace
- height: float
- hpos: float
- pc: Optional[float]
- vpos: float
- width: float
- class alto.SP(width: float, hpos: float, vpos: float)
Bases: object
- hpos: float
- vpos: float
- width: float
- class alto.String(id: str, height: float, width: float, hpos: float, vpos: float, content: str, confidence: float, alternatives: List[alto.Alternative])
Bases: object
- alternatives: List[alto.Alternative]
- confidence: float
- content: str
- classmethod from_xml(element: xml.etree.ElementTree.Element) → alto.String
- height: float
- hpos: float
- id: str
- vpos: float
- width: float
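The from_xml pattern shared by these classes can be sketched for a single word element. This is not the package's implementation: StringSketch and AlternativeSketch are hypothetical stand-ins that mirror the documented fields, and the mapping from ALTO attributes (ID, HEIGHT, WIDTH, HPOS, VPOS, CONTENT, WC) to those fields is an assumption based on the ALTO schema. The fragment is written without a namespace to keep the example short.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class AlternativeSketch:
    """Hypothetical stand-in mirroring the documented Alternative field."""
    content: str

    @classmethod
    def from_xml(cls, element: ET.Element) -> "AlternativeSketch":
        # Assumption: the alternative reading is the element's text content.
        return cls(content=element.text or "")


@dataclass
class StringSketch:
    """Hypothetical stand-in mirroring the documented String fields."""
    id: str
    height: float
    width: float
    hpos: float
    vpos: float
    content: str
    confidence: float
    alternatives: List[AlternativeSketch]

    @classmethod
    def from_xml(cls, element: ET.Element) -> "StringSketch":
        # Assumption: WC (word confidence) feeds the confidence field.
        return cls(
            id=element.attrib["ID"],
            height=float(element.attrib["HEIGHT"]),
            width=float(element.attrib["WIDTH"]),
            hpos=float(element.attrib["HPOS"]),
            vpos=float(element.attrib["VPOS"]),
            content=element.attrib["CONTENT"],
            confidence=float(element.attrib["WC"]),
            alternatives=[
                AlternativeSketch.from_xml(child)
                for child in element.findall("ALTERNATIVE")
            ],
        )


# Hand-written fragment for illustration, not real OCR output.
fragment = (
    '<String ID="string_0" HPOS="10" VPOS="20" WIDTH="55" HEIGHT="12" '
    'WC="0.96" CONTENT="Hello"><ALTERNATIVE>Hallo</ALTERNATIVE></String>'
)
word = StringSketch.from_xml(ET.fromstring(fragment))
print(word.content, word.confidence, [a.content for a in word.alternatives])
# Hello 0.96 ['Hallo']
```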
- class alto.TextBlock(id: Optional[str], height: float, width: float, hpos: float, vpos: float, text_lines: List[alto.TextLine])
Bases: object
- extract_words() → List[str]
Extracts all parsed words regardless of their positions.
- Returns:
List[str]: List of words extracted from file
- classmethod from_xml(element: xml.etree.ElementTree.Element) → alto.TextBlock
- height: float
- hpos: float
- id: Optional[str]
- text_lines: List[alto.TextLine]
- vpos: float
- width: float
- class alto.TextLine(id: str, height: float, width: float, hpos: float, vpos: float, strings: List[Union[alto.String, alto.SP]])
Bases: object
- extract_words() → List[str]
Extracts all parsed words regardless of their positions.
- Returns:
List[str]: List of words extracted from file
- classmethod from_xml(element: xml.etree.ElementTree.Element) → alto.TextLine
- height: float
- hpos: float
- id: str
- strings: List[Union[alto.String, alto.SP]]
- vpos: float
- width: float
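Because strings mixes word items and space items, collecting the words of a line means telling the two apart. The sketch below shows one way to do that with isinstance. WordSketch, SPSketch and words_of are hypothetical stand-ins, and it is an assumption that the package's extract_words skips spaces in this way.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SPSketch:
    """Hypothetical stand-in mirroring the documented SP fields."""
    width: float
    hpos: float
    vpos: float


@dataclass
class WordSketch:
    """Hypothetical stand-in for a word item, reduced to its text."""
    content: str


def words_of(strings: List[Union[WordSketch, SPSketch]]) -> List[str]:
    # Keep only word items; space items carry geometry but no text.
    return [s.content for s in strings if isinstance(s, WordSketch)]


line = [
    WordSketch("Hello"),
    SPSketch(width=4.0, hpos=65.0, vpos=20.0),
    WordSketch("world"),
]
print(words_of(line))  # ['Hello', 'world']
```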