EPUB Standards#
Understanding EPUB Specifications#
EPUB (Electronic Publication) is an open standard for digital books and publications. This guide covers the EPUB specifications and how epub-utils ensures compliance.
EPUB 3.3 Specification#
Current Standard#
EPUB 3.3 is the current specification, published by the W3C. It defines:
Package Document: Contains metadata, manifest, and spine
Container Format: ZIP-based archive structure
Content Documents: XHTML5, SVG, and other media types
Navigation Document: Replaces NCX for table of contents
Key Components#
Container Structure#
book.epub
├── META-INF/
│   ├── container.xml      # Points to package document
│   └── signatures.xml     # Digital signatures (optional)
├── OEBPS/                 # Content folder (common name)
│   ├── package.opf        # Package document
│   ├── nav.xhtml          # Navigation document
│   ├── content/           # Text content
│   ├── images/            # Images
│   ├── styles/            # CSS files
│   └── fonts/             # Font files (optional)
└── mimetype               # Must be first file, uncompressed
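The mimetype rule can be checked with the Python standard library alone. The following is a minimal sketch, not part of epub-utils: the names `check_mimetype_entry` and `build_sample_epub` are hypothetical, and the sample archive is a deliberately incomplete EPUB built in memory for demonstration.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_mimetype_entry(epub_file):
    """Return a list of problems found with the archive's mimetype entry."""
    problems = []
    with zipfile.ZipFile(epub_file) as archive:
        entries = archive.infolist()
        # The mimetype file must be the first entry in the ZIP archive
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
            return problems
        first = entries[0]
        # It must be stored, not deflated
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if archive.read(first) != EPUB_MIMETYPE:
            problems.append("mimetype content is not application/epub+zip")
    return problems


def build_sample_epub(mimetype_first=True):
    """Build a tiny in-memory archive for the demo (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        if not mimetype_first:
            archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("mimetype", EPUB_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        if mimetype_first:
            archive.writestr("META-INF/container.xml", "<container/>")
    buffer.seek(0)
    return buffer


print(check_mimetype_entry(build_sample_epub()))
print(check_mimetype_entry(build_sample_epub(mimetype_first=False)))
```

`check_mimetype_entry` also accepts a file path, since `zipfile.ZipFile` takes either a path or a file-like object.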
Package Document (OPF)#
The package document defines three main sections:
Metadata Section:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Book Title</dc:title>
  <dc:creator>Author Name</dc:creator>
  <dc:identifier id="bookid">urn:uuid:12345</dc:identifier>
  <dc:language>en</dc:language>
  <meta property="dcterms:modified">2024-01-01T00:00:00Z</meta>
</metadata>
Manifest Section:
<manifest>
  <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml"
        properties="nav"/>
  <item id="chapter1" href="content/chapter1.xhtml"
        media-type="application/xhtml+xml"/>
  <item id="chapter2" href="content/chapter2.xhtml"
        media-type="application/xhtml+xml"/>
  <item id="cover-image" href="images/cover.jpg"
        media-type="image/jpeg" properties="cover-image"/>
</manifest>
Spine Section:
<spine>
  <itemref idref="chapter1"/>
  <itemref idref="chapter2"/>
</spine>
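The three sections are linked: every `idref` in the spine must match the `id` of an item declared in the manifest. A minimal sketch of that cross-check, using only the standard library rather than the epub-utils API, might look like this. The function name `unresolved_spine_refs` and the sample package document are illustrative assumptions; the sample deliberately references an undeclared `chapter2`.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Illustrative package document with one dangling spine reference
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Book Title</dc:title>
    <dc:identifier id="bookid">urn:uuid:12345</dc:identifier>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml"
          properties="nav"/>
    <item id="chapter1" href="content/chapter1.xhtml"
          media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
    <itemref idref="chapter2"/>
  </spine>
</package>
"""


def unresolved_spine_refs(opf_xml):
    """Return spine idrefs that do not match any manifest item id."""
    root = ET.fromstring(opf_xml)
    manifest_ids = {
        item.get("id") for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    return [
        ref.get("idref")
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
        if ref.get("idref") not in manifest_ids
    ]


print(unresolved_spine_refs(SAMPLE_OPF))
```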
EPUB Compliance with epub-utils#
Validation Capabilities#
epub-utils helps ensure EPUB compliance by:
Structure Validation: Checks container format
Metadata Validation: Verifies required elements
Manifest Validation: Ensures all files are declared
Spine Validation: Checks reading order
Content Validation: Basic XHTML structure checks
Checking Compliance#
Use epub-utils to validate EPUB structure:
# Check basic structure
epub-utils info book.epub
# Detailed manifest information
epub-utils manifest book.epub --format table
# Extract and examine package document
epub-utils extract book.epub --output-dir temp/
cat temp/OEBPS/package.opf
Python API for Validation#
from epub_utils import Document


def validate_epub_structure(epub_path):
    """Validate basic EPUB structure."""
    try:
        doc = Document(epub_path)

        # Check required components
        checks = {
            'has_container': hasattr(doc, 'container'),
            'has_package': hasattr(doc, 'package'),
            'has_metadata': len(doc.metadata) > 0,
            'has_manifest': len(doc.manifest) > 0,
            'has_spine': len(doc.spine) > 0,
        }

        # Check required metadata
        required_metadata = ['title', 'language', 'identifier']
        metadata_present = {}
        for item in doc.metadata:
            for req in required_metadata:
                if req in item.get('name', '').lower():
                    metadata_present[req] = True

        print("Structure Validation:")
        for check, passed in checks.items():
            status = "✓" if passed else "✗"
            print(f"  {status} {check}")

        print("\nRequired Metadata:")
        for req in required_metadata:
            status = "✓" if metadata_present.get(req) else "✗"
            print(f"  {status} {req}")

        # Pass only if every structural check succeeds and every
        # required metadata element was found
        return all(checks.values()) and all(
            metadata_present.get(req) for req in required_metadata
        )
    except Exception as e:
        print(f"Validation failed: {e}")
        return False
Common Compliance Issues#
Missing Required Elements#
Problem: EPUB missing required metadata
# Check metadata completeness
epub-utils metadata book.epub --format table
Solution: Ensure these elements are present:
dc:title
dc:language
dc:identifier (with unique ID)
meta property="dcterms:modified" (EPUB 3)
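The list above can be turned into a small check against the package document itself. This is a sketch using the standard library's XML parser, not the epub-utils API; `missing_required_metadata` and `INCOMPLETE_OPF` are hypothetical names, and the sample omits `dc:language` and the modification date on purpose.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative package document that lacks dc:language and dcterms:modified
INCOMPLETE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Book Title</dc:title>
    <dc:identifier id="bookid">urn:uuid:12345</dc:identifier>
  </metadata>
</package>
"""


def missing_required_metadata(opf_xml):
    """Return the names of required metadata entries that are absent."""
    root = ET.fromstring(opf_xml)
    metadata = root.find("opf:metadata", NS)
    if metadata is None:
        return ["metadata"]

    missing = []
    for name in ("title", "language", "identifier"):
        element = metadata.find(f"dc:{name}", NS)
        if element is None or not (element.text or "").strip():
            missing.append(f"dc:{name}")

    # The package's unique-identifier must point at a dc:identifier id
    unique_id = root.get("unique-identifier")
    identifier_ids = {e.get("id") for e in metadata.findall("dc:identifier", NS)}
    if unique_id is None or unique_id not in identifier_ids:
        missing.append("unique-identifier")

    # The modification date is only required for EPUB 3
    if root.get("version", "").startswith("3"):
        has_modified = any(
            meta.get("property") == "dcterms:modified"
            for meta in metadata.findall("opf:meta", NS)
        )
        if not has_modified:
            missing.append("dcterms:modified")
    return missing


print(missing_required_metadata(INCOMPLETE_OPF))
```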
Invalid File References#
Problem: Manifest references files that don’t exist
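import zipfile

from epub_utils import Document


def check_file_references(epub_path):
    """Check if all manifest files exist in the archive."""
    doc = Document(epub_path)

    # List every entry stored in the EPUB's ZIP container
    with zipfile.ZipFile(epub_path) as archive:
        archive_names = archive.namelist()

    missing_files = []
    for item in doc.manifest:
        file_path = item.get('href')
        if not file_path:
            continue
        # Simplification: hrefs are relative to the package document,
        # so match them against the end of each archive path
        found = any(
            name == file_path or name.endswith('/' + file_path)
            for name in archive_names
        )
        if not found:
            missing_files.append(file_path)

    if missing_files:
        print("Missing files referenced in manifest:")
        for file in missing_files:
            print(f"  - {file}")
    return missing_files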
Incorrect MIME Types#
Problem: Wrong media-type attributes in manifest
Common correct MIME types:
XHTML: application/xhtml+xml
CSS: text/css
JPEG: image/jpeg
PNG: image/png
NCX: application/x-dtbncx+xml
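One way to catch wrong media types is to compare each manifest item's declared type with the type expected for its file extension. The sketch below assumes manifest items are available as plain dictionaries with `href` and `media-type` keys; that shape, the extension table, and the name `find_media_type_mismatches` are assumptions for illustration, not the epub-utils API.

```python
import posixpath

# Expected media type per file extension (covers only the types listed above)
EXPECTED_MEDIA_TYPES = {
    ".xhtml": "application/xhtml+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".ncx": "application/x-dtbncx+xml",
}


def find_media_type_mismatches(manifest_items):
    """Return (href, declared, expected) tuples for mismatched items."""
    mismatches = []
    for item in manifest_items:
        href = item.get("href", "")
        extension = posixpath.splitext(href)[1].lower()
        expected = EXPECTED_MEDIA_TYPES.get(extension)
        # Unknown extensions are skipped rather than reported
        if expected is not None and item.get("media-type") != expected:
            mismatches.append((href, item.get("media-type"), expected))
    return mismatches


sample_items = [
    {"href": "content/chapter1.xhtml", "media-type": "application/xhtml+xml"},
    {"href": "images/cover.jpg", "media-type": "image/jpg"},
    {"href": "styles/main.css", "media-type": "text/css"},
]
print(find_media_type_mismatches(sample_items))
```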
EPUB 2 vs EPUB 3 Differences#
Format Evolution#
Feature | EPUB 2 | EPUB 3 |
---|---|---|
Navigation | NCX file required | XHTML nav document |
Content Types | XHTML 1.1, limited | XHTML5, SVG, MathML |
Metadata | Dublin Core only | Enhanced metadata |
Accessibility | Limited | Rich accessibility |
Scripting | Not allowed | Limited JavaScript |
Migration Considerations#
When working with older EPUB 2 files:
from epub_utils import Document


def detect_epub_version(epub_path):
    """Guess the EPUB version from manifest contents (simplified heuristic)."""
    doc = Document(epub_path)

    # Simplified example: a reliable check would read the version
    # attribute of the package element instead of the manifest.
    # A manifest item carrying the "nav" property indicates EPUB 3
    for item in doc.manifest:
        if 'nav' in (item.get('properties') or '').split():
            return "EPUB 3"

    # An NCX file without a nav document suggests EPUB 2
    for item in doc.manifest:
        if item.get('media-type') == 'application/x-dtbncx+xml':
            return "EPUB 2"

    return "Unknown"
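A more direct approach reads the `version` attribute of the `package` element, located through `META-INF/container.xml`. The following sketch uses only the standard library and does not rely on the epub-utils API; `read_package_version` and `build_demo_epub` are hypothetical names, and the demo archive is a stripped-down stand-in for a real EPUB.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0"
           xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def read_package_version(epub_file):
    """Return the version attribute of the package element, or None."""
    with zipfile.ZipFile(epub_file) as archive:
        # container.xml names the package document via rootfile/@full-path
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            return None
        package = ET.fromstring(archive.read(rootfile.get("full-path")))
    return package.get("version")


def build_demo_epub(version):
    """Build a minimal in-memory archive for the demo (hypothetical helper)."""
    opf = f'<package xmlns="http://www.idpf.org/2007/opf" version="{version}"/>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        archive.writestr("OEBPS/package.opf", opf)
    buffer.seek(0)
    return buffer


print(read_package_version(build_demo_epub("3.0")))
print(read_package_version(build_demo_epub("2.0")))
```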
Best Practices for Compliance#
Metadata Best Practices#
Always include required elements:
<dc:title>Complete Book Title</dc:title>
<dc:creator>Author Full Name</dc:creator>
<dc:identifier id="bookid">urn:uuid:unique-identifier</dc:identifier>
<dc:language>en-US</dc:language>
Use proper Dublin Core refinements:
<dc:creator id="author">Jane Doe</dc:creator>
<meta refines="#author" property="role" scheme="marc:relators">aut</meta>
Include modification date for EPUB 3:
<meta property="dcterms:modified">2024-05-25T10:30:00Z</meta>
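The `dcterms:modified` value must be a UTC timestamp of the form `CCYY-MM-DDThh:mm:ssZ`, without fractional seconds. A small helper can produce it from any timezone-aware datetime. The name `format_modified` is hypothetical, and the sketch assumes the input carries timezone information.

```python
from datetime import datetime, timedelta, timezone


def format_modified(moment):
    """Format an aware datetime as CCYY-MM-DDThh:mm:ssZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A local time two hours ahead of UTC is converted before formatting
local_time = datetime(2024, 5, 25, 12, 30, 0,
                      tzinfo=timezone(timedelta(hours=2)))
print(format_modified(local_time))

# Current time, suitable for a freshly saved publication
print(format_modified(datetime.now(timezone.utc)))
```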
File Organization#
Use consistent folder structure
Declare all files in manifest
Use proper MIME types
Include fallbacks for specialized content
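To illustrate the "declare all files" rule, the sketch below lists archive entries that no manifest item points to. It assumes the archive's file names, the package document path, and the manifest hrefs have already been collected, for example with `zipfile.ZipFile.namelist()`. The name `find_undeclared_files` is hypothetical, and percent-encoded hrefs are not handled.

```python
import posixpath


def find_undeclared_files(archive_names, opf_path, manifest_hrefs):
    """Return archive entries that are not declared in the manifest."""
    # Manifest hrefs are relative to the package document's directory
    opf_dir = posixpath.dirname(opf_path)
    declared = {
        posixpath.normpath(posixpath.join(opf_dir, href))
        for href in manifest_hrefs
    }
    # The mimetype file, the package document itself, META-INF entries,
    # and directory entries are not listed in the manifest
    ignored = {"mimetype", opf_path}
    return sorted(
        name
        for name in archive_names
        if name not in declared
        and name not in ignored
        and not name.startswith("META-INF/")
        and not name.endswith("/")
    )


names = [
    "mimetype",
    "META-INF/container.xml",
    "OEBPS/package.opf",
    "OEBPS/nav.xhtml",
    "OEBPS/content/chapter1.xhtml",
    "OEBPS/images/unused.png",
]
hrefs = ["nav.xhtml", "content/chapter1.xhtml"]
print(find_undeclared_files(names, "OEBPS/package.opf", hrefs))
```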
Content Guidelines#
Valid XHTML: Ensure all content files are well-formed
Proper encoding: Use UTF-8 encoding
Relative links: Use relative paths for internal references
Alt text: Include alt attributes for images
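Some of these guidelines can be spot-checked automatically. The sketch below parses a content document with the standard library's XML parser, which fails on anything that is not well-formed, then flags images without an `alt` attribute and `src`/`href` values that are not relative. The name `check_content_document` and the reported messages are illustrative assumptions, and the checks are far less thorough than EPUBCheck or Ace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def check_content_document(xhtml_text):
    """Return a list of problems found in one XHTML content document."""
    try:
        root = ET.fromstring(xhtml_text)
    except ET.ParseError as error:
        return [f"not well-formed: {error}"]

    problems = []
    for element in root.iter():
        # Images should carry an alt attribute
        if element.tag == f"{{{XHTML_NS}}}img" and element.get("alt") is None:
            problems.append(f"img without alt: {element.get('src')}")
        # Internal references should be relative paths
        for attribute in ("src", "href"):
            value = element.get(attribute)
            if value and value.startswith(("/", "file:")):
                problems.append(f"non-relative path: {value}")
    return problems


GOOD_PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Ch 1</title></head>'
    '<body><img src="../images/cover.jpg" alt="Cover"/></body></html>'
)
BAD_PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Ch 1</title></head>'
    '<body><img src="/images/cover.jpg"/></body></html>'
)
print(check_content_document(GOOD_PAGE))
print(check_content_document(BAD_PAGE))
```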
Testing and Validation Tools#
External Validators#
EPUBCheck: Official EPUB validator
Ace by DAISY: Accessibility checker
pagina EPUB-Checker: Online validator
Integration with epub-utils#
# Basic structure check
epub-utils info book.epub
# Export for external validation
epub-utils extract book.epub --output-dir validation/
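# Run EPUBCheck on extracted content
# (assumes epubcheck.jar has been downloaded to the current directory)
java -jar epubcheck.jar validation/ -mode exp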
# Check specific components
epub-utils manifest book.epub --format xml > manifest.xml
epub-utils metadata book.epub --format xml > metadata.xml
Future Standards#
EPUB 3.3 and Beyond#
Current developments in EPUB standards:
Enhanced accessibility features
Better multimedia support
Improved metadata vocabularies
Web standards alignment
Staying Current#
Monitor W3C EPUB Working Group
Test with latest validators
Follow accessibility guidelines (WCAG)
Use semantic markup