Metapub Documentation
Metapub is a Python library for accessing biomedical literature and databases. It provides Python objects, fetched via NCBI’s E-utilities, that represent PubMed papers and concepts found within NCBI databases.
🌐 Project Homepage: metapub.org
Created and maintained by Naomi Most
What Metapub Does
- 🔬 Literature Access
Metapub handles the complexity of NCBI’s E-utilities APIs, providing clean Python interfaces for literature searches and article retrieval.
- 📚 Multi-Database Support
PubMed: Biomedical literature citations and abstracts
MedGen: Medical genetics concepts and disease-gene relationships
ClinVar: Clinical significance of genetic variants
CrossRef: DOI resolution and publication metadata
- 🔓 PDF Discovery
The FindIt module locates downloadable PDFs using publisher-specific strategies for major academic publishers.
- ⚙️ Research Tools
Persistent caching with SQLite backends
Comprehensive error handling and diagnostics
Rate limiting that respects NCBI guidelines
Batch processing capabilities for large datasets
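The caching and rate-limiting tools listed above can be illustrated with a small standalone sketch. It is not Metapub's internal code. `RateLimiter`, `CachedFetcher`, and the `is_valid` rule are hypothetical names, and `fetch_fn` stands in for whatever performs the actual HTTP request. The one external fact it relies on is NCBI's published E-utilities limit of 3 requests per second without an API key (10 per second with one).

```python
import sqlite3
import threading
import time
from typing import Callable, Optional


class RateLimiter:
    """Enforces a minimum interval between calls (hypothetical helper)."""

    def __init__(self, max_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None
        self._lock = threading.Lock()

    def wait(self) -> None:
        # Holding the lock while sleeping serializes callers across threads.
        with self._lock:
            now = self._clock()
            if self._last is not None:
                delay = self._last + self.min_interval - now
                if delay > 0:
                    self._sleep(delay)
                    now = self._clock()
            self._last = now


class CachedFetcher:
    """Serves repeat keys from SQLite; calls fetch_fn only on a cache miss."""

    def __init__(self, fetch_fn: Callable[[str], str], limiter: RateLimiter,
                 db_path: str = ":memory:"):
        self._fetch_fn = fetch_fn
        self._limiter = limiter
        self._lock = threading.Lock()
        # check_same_thread=False lets several threads share the connection;
        # self._lock keeps their statements from interleaving.
        self._db = sqlite3.connect(db_path, check_same_thread=False)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, body TEXT)")

    @staticmethod
    def is_valid(body: str) -> bool:
        # Assumed rule: empty bodies and <ERROR> elements are failures.
        return bool(body) and "<ERROR>" not in body

    def get(self, key: str) -> str:
        with self._lock:
            row = self._db.execute(
                "SELECT body FROM cache WHERE key = ?", (key,)).fetchone()
        if row is not None:
            return row[0]
        self._limiter.wait()          # only real network calls are throttled
        body = self._fetch_fn(key)
        if self.is_valid(body):       # never cache an error response
            with self._lock:
                self._db.execute(
                    "INSERT OR REPLACE INTO cache (key, body) VALUES (?, ?)",
                    (key, body))
                self._db.commit()
        return body
```

For example, `CachedFetcher(my_http_get, RateLimiter(3))` (with `my_http_get` as a hypothetical request function) would serve repeat lookups from SQLite and space new requests at least a third of a second apart. Refusing to store responses that fail validation keeps a transient service error from being replayed from the cache on later runs.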
Key Use Cases
- Literature Analysis
Programmatically collect and analyze large sets of biomedical literature with comprehensive metadata extraction.
- Clinical Genetics Workflows
Connect genetic variants from ClinVar with supporting literature for evidence-based analysis.
- Bioinformatics Integration
Integrate literature data into analysis pipelines for automatic annotation of genomic findings.
- Research Applications
Build biomedical research tools with standardized access to multiple literature databases.
What You Can Do in Minutes
Find Full-Text PDFs Instantly
```python
from metapub import FindIt

# Look up a downloadable PDF for a paper by its PMID
src = FindIt('33157158')  # PMID
if src.url:
    print(f"📄 PDF available: {src.url}")
else:
    print(f"❌ Access restricted: {src.reason}")
```
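When checking many papers at once, it can help to collect the two outcomes of `FindIt` separately. This sketch assumes only the `url` and `reason` attributes used above; `partition_by_access` is a hypothetical helper, and its `finder` parameter lets a stub replace `FindIt` for offline testing.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def partition_by_access(pmids: Iterable[str], finder: Optional[Callable] = None
                        ) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split PMIDs into {pmid: pdf_url} and {pmid: reason} dictionaries."""
    if finder is None:
        from metapub import FindIt  # imported lazily so a stub can be passed instead
        finder = FindIt
    available: Dict[str, str] = {}
    restricted: Dict[str, str] = {}
    for pmid in pmids:
        src = finder(pmid)
        if src.url:
            available[pmid] = src.url
        else:
            restricted[pmid] = src.reason
    return available, restricted
```

Calling `partition_by_access(['33157158'])` with no `finder` uses `FindIt` and returns one dictionary of PDF URLs and one of restriction reasons, both keyed by PMID.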
Build Literature Datasets
```python
from metapub import PubMedFetcher

fetch = PubMedFetcher()

# Collect recent CRISPR research
pmids = fetch.pmids_for_query(
    'CRISPR gene editing',
    since='2023/01/01',
    retmax=100
)

# Extract comprehensive metadata
for pmid in pmids:
    article = fetch.article_by_pmid(pmid)
    print(f"{article.journal} ({article.year}): {article.title}")
```
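To turn the loop above into a reusable dataset, the same three attributes (`journal`, `year`, `title`) can be written to CSV with the standard library. `write_article_csv` is a hypothetical helper; `fetch` is any object with an `article_by_pmid` method, such as the `PubMedFetcher` instance above.

```python
import csv
from typing import Iterable, TextIO


def write_article_csv(pmids: Iterable[str], fetch, out: TextIO) -> int:
    """Write a header plus one (pmid, journal, year, title) row per PMID; return row count."""
    writer = csv.writer(out)
    writer.writerow(["pmid", "journal", "year", "title"])
    count = 0
    for pmid in pmids:
        article = fetch.article_by_pmid(pmid)
        writer.writerow([pmid, article.journal, article.year, article.title])
        count += 1
    return count
```

A call such as `write_article_csv(pmids, fetch, fh)`, with `fh` opened via `open('crispr.csv', 'w', newline='')`, writes one row per PMID after the header row.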
Research Genetic Conditions
```python
from metapub import MedGenFetcher, ClinVarFetcher

# Investigate Brugada syndrome
mg = MedGenFetcher()
concepts = mg.concepts_for_term('Brugada syndrome')

# Find clinical variants
cv = ClinVarFetcher()
variant_ids = cv.ids_by_gene('SCN5A')

# Get supporting literature for each variant
for var_id in variant_ids[:5]:
    pmids = cv.pmids_for_id(var_id)
    print(f"Variant {var_id}: {len(pmids)} supporting papers")
```
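A natural next step is to rank those variants by how much literature they have. This sketch reuses only the `ids_by_gene` and `pmids_for_id` calls from the example above; `rank_variants_by_literature` is a hypothetical helper, and `cv` can be a `ClinVarFetcher` or a stub with the same two methods.

```python
from typing import List, Tuple


def rank_variants_by_literature(cv, gene: str, limit: int = 5) -> List[Tuple[str, int]]:
    """Return (variant_id, paper_count) pairs for up to `limit` variants, most-cited first."""
    variant_ids = list(cv.ids_by_gene(gene))[:limit]
    counts = [(var_id, len(cv.pmids_for_id(var_id))) for var_id in variant_ids]
    # sorted() stays stable with reverse=True, so ties keep ClinVar's original order.
    return sorted(counts, key=lambda pair: pair[1], reverse=True)
```

With a live fetcher, `rank_variants_by_literature(ClinVarFetcher(), 'SCN5A')` would return up to five `(variant_id, paper_count)` pairs, which makes the best-documented variants easy to pick out.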
Core Features
- 🏢 Publisher Intelligence
FindIt includes strategies for major publishers with knowledge of their access patterns, embargo policies, and URL structures.
- 🧬 Genomics Focus
Built with genomics research in mind, providing native support for gene-disease relationships, variant annotations, and clinical significance data.
- ⚡ Performance Features
Thread-safe SQLite caching with persistent storage
Automatic rate limiting respecting NCBI guidelines
Response validation preventing error caching
Batch processing optimizations for large datasets
Memory-efficient XML parsing
- 🛡️ Reliability
Error handling that distinguishes service outages from code issues
Automatic retry logic for transient failures
Extensive logging for debugging and monitoring
NCBI API key support for higher rate limits
- 🔄 Standards Support
Follows NCBI E-utilities best practices
Respects publisher robots.txt and access policies
Generates properly formatted citations
Supports standard identifiers (PMID, DOI, PMC ID, HGVS)
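Before passing user input to the right fetcher, it is often useful to recognize which kind of identifier a string is. The regular expressions below are loose, hypothetical shape checks for PMIDs, DOIs, PMC IDs, and HGVS expressions, not Metapub's own parsing; a match says nothing about whether the identifier actually exists.

```python
import re
from typing import Optional

# Loose patterns for illustration only: they check shape, not validity.
_PATTERNS = [
    ("PMC ID", re.compile(r"^PMC\d+$", re.IGNORECASE)),
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("HGVS", re.compile(r"^[A-Z]{2}_\d+(\.\d+)?:[cgmnpr]\.\S+$")),
    ("PMID", re.compile(r"^\d{1,9}$")),
]


def classify_identifier(text: str) -> Optional[str]:
    """Return 'PMID', 'DOI', 'PMC ID', or 'HGVS' for a recognizable string, else None."""
    value = text.strip()
    if value.lower().startswith("doi:"):
        value = value[4:].strip()  # accept the common "doi:" prefix
    for label, pattern in _PATTERNS:
        if pattern.match(value):
            return label
    return None
```

A string such as `doi:10.1038/nature12373` is normalized by dropping the `doi:` prefix before matching, so it classifies the same way as the bare DOI.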
Getting Started
Installation
```bash
pip install metapub
```
Quick Start
```python
from metapub import PubMedFetcher, FindIt

# Search and analyze literature
fetch = PubMedFetcher()
pmids = fetch.pmids_for_query('machine learning genomics', retmax=10)

# Check PDF availability
accessible_papers = []
for pmid in pmids:
    article = fetch.article_by_pmid(pmid)
    src = FindIt(pmid)
    if src.url:
        accessible_papers.append({
            'title': article.title,
            'journal': article.journal,
            'pdf_url': src.url
        })

print(f"Found {len(accessible_papers)} papers with accessible PDFs")
```
Next Steps
Ready to transform your biomedical research workflow? Start with our comprehensive guides:
User Guide:
API Reference:
Community & Support:
Who Uses Metapub
🎯 Research Teams building literature reviews, systematic analyses, and evidence synthesis workflows.
🧬 Genomics Labs connecting genetic variants with supporting literature for clinical decision-making.
⚙️ Developers creating biomedical applications, research tools, and automated analysis pipelines.
📊 Organizations including pharmaceutical companies, academic institutions, and government research agencies.
For peer-reviewed academic citations, see metapub.org/citations.
Documentation Navigation
📚 New to Metapub? Start with the Quick Start Guide for basic usage patterns.
🔧 Building Applications? See the API Overview for architectural guidance.
💡 Looking for Examples? Check the Examples and Tutorials for complete workflows.
🌐 Project Updates: Visit metapub.org for news and community resources.
The documentation provides comprehensive coverage for both simple scripts and production applications.