API Reference#

This section provides complete API documentation for all classes and methods in epub-utils.

Document Class#

class Document(path)#

Main class for working with EPUB files.

Parameters:

path (str) – Path to the EPUB file

Example:

from epub_utils import Document

doc = Document("book.epub")
print(doc.package.metadata.title)
container#

Access to the container information.

Type:

Container

Returns:

Container object with container.xml information

Example:

container = doc.container
print(f"Package path: {container.rootfile_path}")
package#

Access to the package (OPF) information.

Type:

Package

Returns:

Package object with OPF file information

Example:

package = doc.package
print(f"Title: {package.metadata.title}")
toc#

Access to the table of contents.

Type:

TableOfContents

Returns:

Table of contents object

Example:

toc = doc.toc
toc_xml = toc.to_xml()
ncx#

Access to the NCX (Navigation Control file for XML) table of contents.

Type:

TableOfContents or None

Returns:

NCX table of contents object for EPUB 2, or for EPUB 3 when an NCX file is present; None otherwise

Example:

ncx = doc.ncx
if ncx:
    ncx_xml = ncx.to_xml()

Note: For EPUB 2, this returns the same as toc. For EPUB 3, this specifically accesses the NCX file if present, which provides backward compatibility.

nav#

Access to the Navigation Document (EPUB 3 only).

Type:

TableOfContents or None

Returns:

Navigation Document table of contents object for EPUB 3, None for EPUB 2 or if not present

Example:

nav = doc.nav
if nav:
    nav_xml = nav.to_xml()

Note: This property specifically accesses EPUB 3 Navigation Documents. Returns None for EPUB 2 documents.
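
Because nav and ncx can each be None, callers may want a single fallback order. The following is a minimal sketch that relies only on the nav, ncx, and toc properties documented above. The helper name pick_navigation and the stand-in objects are hypothetical and are not part of epub-utils.

```python
from types import SimpleNamespace


def pick_navigation(doc):
    """Return (kind, toc_object), preferring the EPUB 3 Navigation Document."""
    if doc.nav is not None:
        return "nav", doc.nav
    if doc.ncx is not None:
        return "ncx", doc.ncx
    # Neither specific property is available: fall back to the generic toc
    return "toc", doc.toc


# Stand-in for an EPUB 2 document: no nav, NCX present (illustrative only)
fake_epub2 = SimpleNamespace(nav=None, ncx="<ncx/>", toc="<ncx/>")
kind, toc = pick_navigation(fake_epub2)
print(kind)
```

With a real book, the argument would be a Document instance rather than the stand-in.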

get_files_info()#

Get detailed information about all files in the EPUB.

Returns:

List of dictionaries containing file information

Return type:

List[Dict[str, Union[str, int]]]

Each dictionary contains:

- path (str): File path within the EPUB
- size (int): Uncompressed size in bytes
- compressed_size (int): Compressed size in bytes
- modified (str): Last modified date in ISO format

Example:

files = doc.get_files_info()
for file_info in files:
    print(f"{file_info['path']}: {file_info['size']} bytes")
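
The size and compressed_size keys make it straightforward to compute archive-level totals. The sketch below assumes only the dictionary shape documented above. The helper summarize_files and the sample data are made up for illustration; with a real book the input would be doc.get_files_info().

```python
from typing import Dict, List, Union


def summarize_files(files: List[Dict[str, Union[str, int]]]) -> Dict[str, Union[int, float]]:
    """Total the documented size fields and compute an overall compression ratio."""
    total = sum(int(f["size"]) for f in files)
    compressed = sum(int(f["compressed_size"]) for f in files)
    # Guard against an empty archive to avoid dividing by zero
    ratio = compressed / total if total else 0.0
    return {"count": len(files), "size": total, "compressed_size": compressed, "ratio": ratio}


# Made-up sample in the documented shape
sample = [
    {"path": "mimetype", "size": 20, "compressed_size": 20, "modified": "2024-01-01T00:00:00"},
    {"path": "OEBPS/ch1.xhtml", "size": 980, "compressed_size": 230, "modified": "2024-01-01T00:00:00"},
]
summary = summarize_files(sample)
print(summary)
```
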
list_files()#

Get basic information about all files in the EPUB.

Returns:

List of dictionaries with basic file information

Return type:

List[Dict[str, str]]

Example:

files = doc.list_files()
print(f"EPUB contains {len(files)} files")

Container Class#

class Container#

Represents the META-INF/container.xml file information.

rootfile_path#

Path to the main package file within the EPUB.

Type:

str

rootfile_media_type#

Media type of the main package file.

Type:

str
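
These two properties correspond to the full-path and media-type attributes of the rootfile element in META-INF/container.xml. The sketch below reads them with the standard library from a tiny in-memory archive. It illustrates the file format only and is not how epub-utils is implemented; read_rootfile is a hypothetical helper.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def read_rootfile(epub_file):
    """Return (full-path, media-type) of the first rootfile in container.xml."""
    with zipfile.ZipFile(epub_file) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path"), rootfile.get("media-type")


# Build a tiny in-memory archive so the sketch runs without a real book
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("META-INF/container.xml", CONTAINER_XML)
buffer.seek(0)

path, media_type = read_rootfile(buffer)
print(path, media_type)
```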

to_xml(highlight_syntax=True)#

Get formatted XML representation.

Parameters:

highlight_syntax (bool) – Whether to apply syntax highlighting

Returns:

Formatted XML string

Return type:

str

to_str()#

Get raw XML content.

Returns:

Raw XML string

Return type:

str

Package Class#

class Package#

Represents the main OPF package file.

metadata#

Package metadata information.

Type:

Metadata

manifest#

Package manifest information.

Type:

Manifest

spine#

Package spine information.

Type:

Spine

to_xml(highlight_syntax=True)#

Get formatted XML representation of the complete package.

Parameters:

highlight_syntax (bool) – Whether to apply syntax highlighting

Returns:

Formatted XML string

Return type:

str

to_str()#

Get raw XML content of the complete package.

Returns:

Raw XML string

Return type:

str

Metadata Class#

class Metadata#

Represents Dublin Core and EPUB-specific metadata.

title#

Book title from dc:title element.

Type:

str

creator#

Book author/creator from dc:creator element.

Type:

str

language#

Language code from dc:language element.

Type:

str

identifier#

Unique identifier from dc:identifier element.

Type:

str

publisher#

Publisher from dc:publisher element.

Type:

str

date#

Publication date from dc:date element.

Type:

str

subject#

Subject/keywords from dc:subject element.

Type:

str

description#

Description from dc:description element.

Type:

str

contributor#

Contributor from dc:contributor element.

Type:

str

type#

Resource type from dc:type element.

Type:

str

format#

Format from dc:format element.

Type:

str

source#

Source from dc:source element.

Type:

str

relation#

Relation from dc:relation element.

Type:

str

coverage#

Coverage from dc:coverage element.

Type:

str

rights#

Rights information from dc:rights element.

Type:

str

__getattr__(name)#

Dynamic attribute access for any metadata field.

Parameters:

name (str) – Metadata field name

Returns:

Metadata value or empty string

Return type:

str

Example:

# Access any metadata field; per the return value above, a missing field yields ''
isbn = metadata.isbn or 'Not available'
series = getattr(metadata, 'series', '') or 'Not available'
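
Since the documented return value is "Metadata value or empty string", hasattr() would be expected to report True for any field name, and a getattr() default would not be used. The toy class below illustrates that general Python pattern. It is an assumption-laden stand-in, not the Metadata implementation, and the name DynamicFields is hypothetical.

```python
class DynamicFields:
    """Toy stand-in: unknown attributes resolve to an empty string."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails
        if name.startswith("_"):
            raise AttributeError(name)
        return self._fields.get(name, "")


meta = DynamicFields(title="The Great Gatsby")
print(hasattr(meta, "isbn"))         # True, because __getattr__ never raises for "isbn"
print(repr(getattr(meta, "isbn", "n/a")))  # '' and not the default
print(meta.isbn or "Not available")  # falls back on the empty string
```
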
to_xml(highlight_syntax=True)#

Get formatted XML representation of metadata.

Parameters:

highlight_syntax (bool) – Whether to apply syntax highlighting

Returns:

Formatted XML string

Return type:

str

to_kv()#

Get metadata as key-value pairs.

Returns:

Key-value formatted string

Return type:

str

Example:

kv_data = metadata.to_kv()
print(kv_data)
# Output:
# title: The Great Gatsby
# creator: F. Scott Fitzgerald
# language: en
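
If the key-value text needs to be consumed programmatically, it can be split back into a dictionary. This sketch assumes the one "key: value" pair per line layout shown in the output above; parse_kv is a hypothetical helper, and how repeated keys or multi-line values are rendered is not covered here.

```python
def parse_kv(text: str) -> dict:
    """Split 'key: value' lines into a dict, keeping any later ': ' in the value."""
    result = {}
    for line in text.splitlines():
        if ": " not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(": ", 1)
        result[key.strip()] = value.strip()
    return result


# Sample text copied from the output shown above
sample = "title: The Great Gatsby\ncreator: F. Scott Fitzgerald\nlanguage: en"
fields = parse_kv(sample)
print(fields["creator"])
```
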
to_str()#

Get raw XML content of metadata.

Returns:

Raw XML string

Return type:

str

Manifest Class#

class Manifest#

Represents the package manifest section.

items#

Dictionary of manifest items.

Type:

Dict[str, Dict[str, str]]

Each item contains:

- href: File path
- media-type: MIME type
- Other attributes as needed

Example:

for item_id, item in manifest.items.items():
    print(f"ID: {item_id}")
    print(f"  File: {item['href']}")
    print(f"  Type: {item['media-type']}")
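
A common follow-up is to group manifest entries by MIME type, for example to list all XHTML documents or all images. The sketch below relies only on the documented href and media-type keys. The helper group_by_media_type and the sample manifest are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_media_type(items: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each media type to the hrefs of the manifest items that declare it."""
    groups = defaultdict(list)
    for item in items.values():
        groups[item["media-type"]].append(item["href"])
    return dict(groups)


# Made-up manifest in the documented shape; real data would be manifest.items
sample_items = {
    "ch1": {"href": "ch1.xhtml", "media-type": "application/xhtml+xml"},
    "ch2": {"href": "ch2.xhtml", "media-type": "application/xhtml+xml"},
    "cover": {"href": "cover.jpg", "media-type": "image/jpeg"},
}
groups = group_by_media_type(sample_items)
print(groups["application/xhtml+xml"])
```
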
to_xml(highlight_syntax=True)#

Get formatted XML representation.

Parameters:

highlight_syntax (bool) – Whether to apply syntax highlighting

Returns:

Formatted XML string

Return type:

str

to_str()#

Get raw XML content.

Returns:

Raw XML string

Return type:

str

Spine Class#

class Spine#

Represents the package spine section.

items#

List of spine items in reading order.

Type:

List[Dict[str, str]]

Example:

for item in spine.items:
    print(f"Reading order item: {item}")
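
Spine entries usually need to be resolved against the manifest to obtain file paths in reading order. The sketch below assumes that each spine dictionary carries an idref key mirroring the OPF itemref attribute; this reference does not list the keys, so treat that name as an assumption and check it against real output. The helper reading_order and the sample data are hypothetical.

```python
from typing import Dict, List


def reading_order(spine_items: List[Dict[str, str]],
                  manifest_items: Dict[str, Dict[str, str]]) -> List[str]:
    """Resolve spine entries to hrefs, skipping ids missing from the manifest."""
    hrefs = []
    for entry in spine_items:
        idref = entry.get("idref")  # ASSUMPTION: key name mirrors the OPF attribute
        if idref in manifest_items:
            hrefs.append(manifest_items[idref]["href"])
    return hrefs


# Made-up data; real inputs would be spine.items and manifest.items
sample_spine = [{"idref": "ch1"}, {"idref": "ch2"}, {"idref": "missing"}]
sample_manifest = {
    "ch1": {"href": "ch1.xhtml", "media-type": "application/xhtml+xml"},
    "ch2": {"href": "ch2.xhtml", "media-type": "application/xhtml+xml"},
}
order = reading_order(sample_spine, sample_manifest)
print(order)
```
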
to_xml(highlight_syntax=True)#

Get formatted XML representation.

Parameters:

highlight_syntax (bool) – Whether to apply syntax highlighting

Returns:

Formatted XML string

Return type:

str

to_str()#

Get raw XML content.

Returns:

Raw XML string

Return type:

str

TableOfContents Class#

class TableOfContents#

Represents the table of contents (NCX or Navigation Document).

to_xml(highlight_syntax=True)#

Get formatted XML representation.

Parameters:

highlight_syntax (bool) – Whether to apply syntax highlighting

Returns:

Formatted XML string

Return type:

str

to_str()#

Get raw XML content.

Returns:

Raw XML string

Return type:

str
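
Because to_str() returns raw XML, its result can be handed to any XML parser. The sketch below extracts navPoint labels and targets from a small hand-written NCX document using the standard library. With a real EPUB the input would come from doc.ncx.to_str(); the sample document and the helper ncx_entries are illustrative only.

```python
import xml.etree.ElementTree as ET

NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NCX_NS = {"ncx": NCX_URI}

SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="n1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="ch1.xhtml"/>
    </navPoint>
    <navPoint id="n2" playOrder="2">
      <navLabel><text>Chapter 2</text></navLabel>
      <content src="ch2.xhtml"/>
    </navPoint>
  </navMap>
</ncx>"""


def ncx_entries(raw_xml: str):
    """Return (label, src) pairs for every navPoint, in document order."""
    root = ET.fromstring(raw_xml)
    entries = []
    for point in root.iter(f"{{{NCX_URI}}}navPoint"):
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
        content = point.find("ncx:content", NCX_NS)
        src = content.get("src", "") if content is not None else ""
        entries.append((label.strip(), src))
    return entries


entries = ncx_entries(SAMPLE_NCX)
for label, src in entries:
    print(f"{label} -> {src}")
```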

Content Classes#

class Content#

Base class for EPUB content documents.

to_xml(highlight_syntax=True)#

Get formatted content.

Parameters:

highlight_syntax (bool) – Whether to apply syntax highlighting

Returns:

Formatted content string

Return type:

str

to_str()#

Get raw content.

Returns:

Raw content string

Return type:

str

class XHTMLContent#

Specialized class for XHTML content documents.

Inherits from Content and adds XHTML-specific methods.

to_plain()#

Get plain text content with HTML tags stripped.

Returns:

Plain text string

Return type:

str

Example:

from epub_utils.content import XHTMLContent

# This would typically be accessed through Document
# content = XHTMLContent(raw_html)
# plain_text = content.to_plain()
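
To make "HTML tags stripped" concrete, here is one way tag stripping can be done with the standard library's html.parser. This is a sketch of the general technique, not the code behind to_plain(), whose whitespace handling may differ. The names _TextCollector and strip_tags are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that sits outside script/style elements
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def strip_tags(markup: str) -> str:
    """Return the visible text of an (X)HTML string, one space between fragments."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return " ".join(collector.parts)


text = strip_tags("<html><body><h1>Chapter 1</h1><p>It was a <em>dark</em> night.</p></body></html>")
print(text)
```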

Exception Classes#

exception ParseError#

Raised when EPUB content cannot be parsed.

Base class: Exception

Example:

from epub_utils import Document
from epub_utils.exceptions import ParseError

try:
    doc = Document("corrupted.epub")
    title = doc.package.metadata.title
except ParseError as e:
    print(f"Failed to parse EPUB: {e}")
except FileNotFoundError:
    print("EPUB file not found")

Usage Examples#

Basic Usage#

from epub_utils import Document

# Load document
doc = Document("book.epub")

# Access metadata
metadata = doc.package.metadata
print(f"Title: {metadata.title}")
print(f"Author: {metadata.creator}")

# Check file structure
files = doc.get_files_info()
print(f"Contains {len(files)} files")

# Get formatted output
toc_xml = doc.toc.to_xml()
metadata_kv = metadata.to_kv()

Error Handling#

from epub_utils import Document
from epub_utils.exceptions import ParseError

def safe_load_epub(path):
    try:
        doc = Document(path)
        return {
            'status': 'success',
            'document': doc,
            'title': doc.package.metadata.title or 'Unknown'
        }
    except ParseError as e:
        return {
            'status': 'parse_error',
            'error': str(e)
        }
    except FileNotFoundError:
        return {
            'status': 'file_not_found',
            'error': 'EPUB file not found'
        }
    except Exception as e:
        return {
            'status': 'unknown_error',
            'error': str(e)
        }

Batch Processing#

from pathlib import Path
from epub_utils import Document

def process_epub_directory(directory):
    epub_files = Path(directory).glob("*.epub")
    results = []

    for epub_path in epub_files:
        try:
            doc = Document(str(epub_path))
            metadata = doc.package.metadata

            result = {
                'file': epub_path.name,
                'title': getattr(metadata, 'title', ''),
                'author': getattr(metadata, 'creator', ''),
                'language': getattr(metadata, 'language', ''),
                'file_size': epub_path.stat().st_size,
                'epub_files': len(doc.get_files_info())
            }
            results.append(result)

        except Exception as e:
            results.append({
                'file': epub_path.name,
                'error': str(e)
            })

    return results
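
The result list mixes success rows and error rows, so a tabular export has to tolerate missing keys. The sketch below writes such a list to CSV text with the standard csv module. The helper results_to_csv and the sample rows are made up; real input would be the return value of process_epub_directory().

```python
import csv
import io

# Column order follows the keys used in process_epub_directory() above
FIELDS = ["file", "title", "author", "language", "file_size", "epub_files", "error"]


def results_to_csv(results) -> str:
    """Serialize result dicts to CSV text; keys absent from a row are left blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


# Made-up rows: one success and one failure
sample_results = [
    {"file": "a.epub", "title": "A", "author": "X", "language": "en",
     "file_size": 10, "epub_files": 3},
    {"file": "b.epub", "error": "bad zip"},
]
csv_text = results_to_csv(sample_results)
print(csv_text)
```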

Type Hints#

For better IDE support and type checking, here are the main type hints:

from typing import Dict, List, Union
from epub_utils import Document

# Function signatures for reference
def get_files_info(self) -> List[Dict[str, Union[str, int]]]: ...
def list_files(self) -> List[Dict[str, str]]: ...
def to_xml(self, highlight_syntax: bool = True) -> str: ...
def to_str(self) -> str: ...
def to_kv(self) -> str: ...

# Type-safe usage example
doc: Document = Document("book.epub")
files_info: List[Dict[str, Union[str, int]]] = doc.get_files_info()
title: str = doc.package.metadata.title
kv_data: str = doc.package.metadata.to_kv()
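
For stricter checking than Dict[str, Union[str, int]] allows, the documented keys of a get_files_info() entry can be described with a TypedDict in user code. This is a sketch for application code, assuming Python 3.8 or later. The names FileInfo and largest_file are hypothetical and are not exported by epub-utils.

```python
from typing import List, TypedDict


class FileInfo(TypedDict):
    """Keys documented for each get_files_info() entry."""
    path: str
    size: int
    compressed_size: int
    modified: str


def largest_file(files: List[FileInfo]) -> FileInfo:
    """Return the entry with the greatest uncompressed size."""
    return max(files, key=lambda info: info["size"])


# Made-up entries in the documented shape
sample_files: List[FileInfo] = [
    {"path": "mimetype", "size": 20, "compressed_size": 20, "modified": "2024-01-01T00:00:00"},
    {"path": "OEBPS/ch1.xhtml", "size": 980, "compressed_size": 230, "modified": "2024-01-01T00:00:00"},
]
biggest = largest_file(sample_files)
print(biggest["path"])
```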

Module Structure#

The epub-utils package is organized as follows:

epub_utils/
├── __init__.py          # Main exports (Document, Container)
├── doc.py               # Document class
├── container.py         # Container class
├── package/
│   ├── __init__.py      # Package class
│   ├── metadata.py      # Metadata class
│   ├── manifest.py      # Manifest class
│   └── spine.py         # Spine class
├── content/
│   ├── __init__.py      # Content classes
│   ├── base.py          # Base Content class
│   └── xhtml.py         # XHTMLContent class
├── toc.py               # TableOfContents class
├── exceptions.py        # Exception classes
├── highlighters.py      # Syntax highlighting utilities
└── cli.py               # Command-line interface

For detailed implementation examples, see "Use as a Python library" and "Examples and Use Cases".