API Overview

Metapub provides a Python API for accessing biomedical literature and related databases. The library is organized into a few core modules, each with a distinct role.

Core Modules

Data Retrieval Classes

These are the primary classes for fetching data from NCBI databases and from CrossRef:

PubMedFetcher

Primary interface for PubMed/NCBI literature searches. Supports article retrieval by PMID, DOI, PMC ID, and complex query searches.

MedGenFetcher

Access to NCBI’s MedGen database for medical genetics concepts, disease-gene relationships, and clinical phenotypes.

ClinVarFetcher

Interface to the ClinVar database for the clinical significance of genetic variants and for variant-literature associations.

CrossRefFetcher

CrossRef API integration for DOI resolution and publication metadata when PubMed data is incomplete.

Data Model Classes

These classes represent structured data returned by the fetcher classes:

PubMedArticle

Rich representation of a scientific article with automatic parsing of titles, authors, abstracts, MeSH terms, and bibliographic details.

MedGenConcept

Medical genetics concept with CUI identifiers, definitions, synonyms, and related literature.

ClinVarVariant

Clinical variant with HGVS notation, clinical significance, molecular consequences, and supporting evidence.

Full-Text Discovery

FindIt

Locates full-text PDFs using publisher-specific strategies. Supports 68+ major publishers (97.1% coverage), with embargo detection, CrossRef API integration, and legal access verification. Includes a pre-populated journal registry, so it works out of the box.

Utility Functions

Text Mining and Validation

Conversion and Citation

Error Handling

Common Usage Patterns

Basic Article Retrieval

from metapub import PubMedFetcher

# Initialize fetcher (singleton pattern)
fetch = PubMedFetcher()

# Get article by PMID
article = fetch.article_by_pmid('12345678')
print(f"{article.title} - {article.journal} ({article.year})")

Full-Text Discovery

from metapub import FindIt

# Find PDF for an article
src = FindIt('12345678')  # PMID

if src.url:
    print(f"PDF available: {src.url}")
else:
    print(f"No access: {src.reason}")

Medical Genetics Research

from metapub import MedGenFetcher, ClinVarFetcher

# Research genetic condition
mg = MedGenFetcher()
concepts = mg.concepts_for_term('cystic fibrosis')

# Find clinical variants
cv = ClinVarFetcher()
variants = cv.variants_for_gene('CFTR')

Architecture Notes

Singleton Pattern

Most fetcher classes use the Borg pattern, a shared-state variant of the singleton: instances are distinct objects, but all of them share the same state and cache. This avoids duplicate setup work and keeps caching consistent across your application.
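To make the shared-state idea concrete, here is a minimal sketch of the Borg pattern in plain Python. The class names Borg and DemoFetcher are hypothetical and are not part of metapub; the sketch shows the general technique only, not how metapub implements it.

```python
class Borg:
    """Base class whose instances all share one attribute dictionary."""

    _shared_state = {}

    def __init__(self):
        # Every instance points its __dict__ at the same dictionary.
        self.__dict__ = self._shared_state


class DemoFetcher(Borg):
    """Hypothetical fetcher used for illustration; not a metapub class."""

    def __init__(self):
        super().__init__()
        # Create the cache only once; later instances reuse it.
        if "cache" not in self.__dict__:
            self.cache = {}


a = DemoFetcher()
b = DemoFetcher()
a.cache["12345678"] = "cached record"

# a and b are different objects, yet b sees what a stored.
print(a is b, b.cache["12345678"])
```

Unlike a classic singleton, constructing the class twice yields two objects; only their attributes are shared.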

Caching Strategy

  • SQLite-based caching for all API responses

  • Configurable cache directories via environment variables

  • TTL-based cache expiration to ensure data freshness

  • Cache warming capabilities for batch processing
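The caching points above can be illustrated with a small SQLite-backed cache that expires entries after a time-to-live. This is a sketch under assumptions: the TTLCache class, its table layout, and the key format are hypothetical and do not reflect metapub's actual cache schema or configuration.

```python
import sqlite3
import time


class TTLCache:
    """Tiny SQLite-backed cache with per-entry expiry (illustrative only)."""

    def __init__(self, path=":memory:", ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, value TEXT, stored_at REAL)"
        )

    def set(self, key, value, now=None):
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)", (key, value, now)
        )
        self.db.commit()

    def get(self, key, now=None):
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        value, stored_at = row
        if now - stored_at > self.ttl:
            # Entry is stale: drop it so the caller refetches.
            self.db.execute("DELETE FROM cache WHERE key = ?", (key,))
            self.db.commit()
            return None
        return value


cache = TTLCache(ttl_seconds=60)
cache.set("pmid:12345678", "<xml/>", now=1000.0)
fresh = cache.get("pmid:12345678", now=1030.0)   # within the TTL
stale = cache.get("pmid:12345678", now=1100.0)   # past the TTL
```

Passing a file path instead of ":memory:" would persist the cache between runs. Cache warming then amounts to calling set for each expected key before a batch job starts.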

Error Handling

  • Intelligent error diagnosis distinguishes between service outages and code issues

  • Automatic retry logic for transient network failures

  • Comprehensive exception hierarchy for specific error handling

  • Graceful degradation when services are unavailable
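The distinction between transient failures and code issues can be sketched as a retry helper that retries only one class of exception. The exception names ServiceUnavailable and BadRequest and the with_retries helper are hypothetical; metapub's own exception hierarchy and retry policy may differ.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical: the remote service is down or overloaded."""


class BadRequest(Exception):
    """Hypothetical: the request itself is wrong, so retrying cannot help."""


def with_retries(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except ServiceUnavailable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the outage to the caller
            sleep(base_delay * 2 ** attempt)
        # Any other exception (e.g. BadRequest) propagates immediately.


calls = []


def flaky():
    """Fails twice with a transient error, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise ServiceUnavailable("temporary outage")
    return "ok"


delays = []
result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
```

Injecting the sleep function keeps the example fast to run and makes the backoff schedule easy to inspect.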

API Keys and Rate Limiting

  • NCBI API key support via environment variables for higher rate limits

  • Built-in rate limiting respects NCBI guidelines

  • Request batching for efficient bulk operations
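As an illustration of how rate limiting and batching fit together, the sketch below spaces out requests and splits a list of PMIDs into batches. The helpers min_interval, RateLimiter, and chunked are hypothetical, and the environment variable name NCBI_API_KEY is an assumption here. The limits used are NCBI's published E-utilities guidelines of 3 requests per second without an API key and 10 with one.

```python
import os
import time


def min_interval(api_key):
    """Seconds to wait between requests: 10/s with a key, 3/s without."""
    return 1.0 / (10 if api_key else 3)


class RateLimiter:
    """Enforces a minimum interval between consecutive calls to wait()."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last = None

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


api_key = os.environ.get("NCBI_API_KEY")  # assumed variable name
limiter = RateLimiter(min_interval(api_key))

pmids = [str(n) for n in range(1, 8)]
batches = list(chunked(pmids, 3))
for batch in batches:
    limiter.wait()  # one request per batch, spaced by the limiter
    # A real client would send the batch to the API here.
```

Batching reduces the number of requests, so the same rate limit covers more records per second.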

See Also