Use as a command-line tool#
This tutorial will guide you through using epub-utils
from the command line. We’ll cover all
available commands with practical examples and tips for everyday usage.
Getting Started#
The basic syntax for epub-utils is:
epub-utils [OPTIONS] EPUB_FILE COMMAND [COMMAND_OPTIONS]
Let’s start with a simple example:
# Display help
epub-utils --help
# Check version
epub-utils --version
Basic File Inspection#
Container Information#
The container command shows the EPUB’s container.xml file, which points to the main package file:
# Show container with syntax highlighting (default)
epub-utils book.epub container
# Show raw XML without highlighting
epub-utils book.epub container --format raw
# Show container with pretty formatting
epub-utils book.epub container --pretty-print
Example output:
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
</container>
Package Information#
The package command displays the main OPF (Open Packaging Format) file:
# Show package file with highlighting
epub-utils book.epub package
# Show raw package content
epub-utils book.epub package --format raw
# Show package with pretty formatting
epub-utils book.epub package --pretty-print
This reveals the complete EPUB structure including metadata, manifest, and spine.
Working with Metadata#
Extracting Metadata#
The metadata command is perfect for getting book information:
# Pretty-printed metadata with highlighting
epub-utils book.epub metadata
# Key-value format for scripting
epub-utils book.epub metadata --format kv
# Metadata with pretty formatting
epub-utils book.epub metadata --pretty-print
Example key-value output:
title: The Great Gatsby
creator: F. Scott Fitzgerald
language: en
identifier: urn:uuid:12345678-1234-1234-1234-123456789abc
publisher: Scribner
date: 2021-01-01
subject: Fiction, Classic Literature
Scripting with Metadata#
The key-value format is perfect for shell scripting:
# Extract just the title
epub-utils book.epub metadata --format kv | grep "^title:" | cut -d' ' -f2-
# Get author name
author=$(epub-utils book.epub metadata --format kv | grep "^creator:" | cut -d' ' -f2-)
echo "Author: $author"
# Batch process multiple files
for epub in *.epub; do
title=$(epub-utils "$epub" metadata --format kv | grep "^title:" | cut -d' ' -f2-)
echo "$epub: $title"
done
Understanding EPUB Structure#
Table of Contents#
View the navigation structure of your EPUB:
# Show table of contents with highlighting (auto-detect format)
epub-utils book.epub toc
# Raw TOC for processing
epub-utils book.epub toc --format raw
# TOC with pretty formatting
epub-utils book.epub toc --pretty-print
EPUB Version-Specific Access:
For precise control over which navigation format to access:
# Force NCX format (EPUB 2 navigation control file)
epub-utils book.epub toc --ncx
# Force Navigation Document (EPUB 3 navigation file)
epub-utils book.epub toc --nav
Use Cases:
Use
--ncx
when you specifically need the EPUB 2 style navigation or want to access backward-compatible NCX in EPUB 3Use
--nav
when you specifically need the EPUB 3 Navigation Document featuresUse the default (no flags) for general TOC access that works with any EPUB version
Manifest Inspection#
The manifest lists all files contained in the EPUB:
# View manifest with syntax highlighting
epub-utils book.epub manifest
# Raw manifest content
epub-utils book.epub manifest --format raw
# Manifest with pretty formatting
epub-utils book.epub manifest --pretty-print
What you’ll see: Each item in the manifest includes:
- id
: Unique identifier for the item
- href
: File path within the EPUB
- media-type
: MIME type of the file
Spine Information#
The spine defines the reading order of the book:
# View spine with highlighting
epub-utils book.epub spine
# Raw spine for processing
epub-utils book.epub spine --format raw
Content Extraction#
Viewing Document Content#
Extract content from specific documents using their manifest ID:
# Show content with syntax highlighting
epub-utils book.epub content chapter1
# Raw HTML/XHTML content
epub-utils book.epub content chapter1 --format raw
# Plain text (HTML tags stripped)
epub-utils book.epub content chapter1 --format plain
Finding Content IDs: Use the manifest command to see available content IDs:
# First, check the manifest for available IDs
epub-utils book.epub manifest
# Then extract specific content
epub-utils book.epub content intro --format plain
File Listing and Content Access#
Get detailed information about all files in the EPUB, or access specific file content:
# Formatted table of files
epub-utils book.epub files
# Raw file list
epub-utils book.epub files --format raw
# Display content of a specific file by path
epub-utils book.epub files OEBPS/chapter1.xhtml
# Access different file types
epub-utils book.epub files META-INF/container.xml
epub-utils book.epub files OEBPS/styles/main.css
epub-utils book.epub files OEBPS/images/cover.jpg
# Different output formats for XHTML content
epub-utils book.epub files OEBPS/chapter1.xhtml --format raw
epub-utils book.epub files OEBPS/chapter1.xhtml --format xml --pretty-print
epub-utils book.epub files OEBPS/chapter1.xhtml --format plain
Key advantages of the files command:
Access any file in the EPUB archive by its path
No need to know manifest item IDs
Works with all file types (XHTML, CSS, XML, images, etc.)
Complements the
content
command which uses manifest IDs
Content Analysis#
Analyze EPUB content structure:
#!/bin/bash
# analyze-content.sh - Analyze EPUB content structure
epub_file="$1"
echo "=== Content Analysis for $epub_file ==="
# Get all content files from manifest
epub-utils "$epub_file" manifest --format raw | \
grep 'media-type="application/xhtml+xml"' | \
sed 's/.*id="\([^"]*\)".*/\1/' | \
while read -r content_id; do
echo "--- Content ID: $content_id ---"
word_count=$(epub-utils "$epub_file" content "$content_id" --format plain | wc -w)
echo "Word count: $word_count"
echo ""
done
Output Format Options#
epub-utils supports multiple output formats for different use cases:
XML Format (Default)#
epub-utils book.epub metadata
# Produces syntax-highlighted, formatted XML
Raw Format#
epub-utils book.epub metadata --format raw
# Produces unformatted XML, perfect for piping to other tools
Key-Value Format#
epub-utils book.epub metadata --format kv
# Produces key: value pairs, ideal for scripting
Plain Text Format#
epub-utils book.epub content chapter1 --format plain
# Strips HTML tags, produces readable text
Pretty-Print Option#
Use the --pretty-print
(or -pp
) option to format XML output with proper indentation:
# Default output (compact XML)
epub-utils book.epub metadata --format raw
# Pretty-formatted output (with indentation)
epub-utils book.epub metadata --format raw --pretty-print
# Works with syntax highlighting too
epub-utils book.epub package --pretty-print
Next Steps#
Now that you’re familiar with the CLI basics, you might want to:
Explore the Use as a Python library for programmatic access
Check out more Examples and Use Cases for real-world use cases
Learn about EPUB Standards for deeper understanding
Contribute to the project via Contributing