Metapub Documentation ==================== .. image:: https://img.shields.io/pypi/dm/metapub :alt: PyPI - Monthly Downloads :target: https://pypi.org/project/metapub/ **Metapub** is a Python library for accessing biomedical literature and databases. It provides Python objects fetched via eutils that represent PubMed papers and concepts found within NCBI databases. 🌐 **Project Homepage:** `metapub.org `_ **Created and maintained by** `Naomi Most `_ What Metapub Does ---------------- 🔬 **Literature Access** Metapub handles the complexity of NCBI's E-utilities APIs, providing clean Python interfaces for literature searches and article retrieval. 📚 **Multi-Database Support** - **PubMed**: Biomedical literature citations and abstracts - **MedGen**: Medical genetics concepts and disease-gene relationships - **ClinVar**: Clinical significance of genetic variants - **CrossRef**: DOI resolution and publication metadata 🔓 **PDF Discovery** The FindIt module locates downloadable PDFs using publisher-specific strategies for major academic publishers. ⚙️ **Research Tools** - Intelligent caching with SQLite backends - Comprehensive error handling and diagnostics - Rate limiting that respects NCBI guidelines - Batch processing capabilities for large datasets Key Use Cases ------------ **Literature Analysis** Programmatically collect and analyze large sets of biomedical literature with comprehensive metadata extraction. **Clinical Genetics Workflows** Connect genetic variants from ClinVar with supporting literature for evidence-based analysis. **Bioinformatics Integration** Integrate literature data into analysis pipelines for automatic annotation of genomic findings. **Research Applications** Build biomedical research tools with standardized access to multiple literature databases. What You Can Do in Minutes -------------------------- **Find Full-Text PDFs Instantly** .. code-block:: python from metapub import FindIt # Get downloadable PDF for any paper src = FindIt('33157158') # PMID if src.url: print(f"📄 PDF available: {src.url}") else: print(f"❌ Access restricted: {src.reason}") **Build Literature Datasets** .. code-block:: python from metapub import PubMedFetcher fetch = PubMedFetcher() # Collect recent CRISPR research pmids = fetch.pmids_for_query( 'CRISPR gene editing', since='2023/01/01', retmax=100 ) # Extract comprehensive metadata for pmid in pmids: article = fetch.article_by_pmid(pmid) print(f"{article.journal} ({article.year}): {article.title}") **Research Genetic Conditions** .. code-block:: python from metapub import MedGenFetcher, ClinVarFetcher # Investigate Brugada syndrome mg = MedGenFetcher() concepts = mg.concepts_for_term('Brugada syndrome') # Find clinical variants cv = ClinVarFetcher() variant_ids = cv.ids_by_gene('SCN5A') # Get supporting literature for each variant for var_id in variant_ids[:5]: pmids = cv.pmids_for_id(var_id) print(f"Variant {var_id}: {len(pmids)} supporting papers") Core Features ------------ **🏢 Publisher Intelligence** FindIt includes strategies for major publishers with knowledge of their access patterns, embargo policies, and URL structures. **🧬 Genomics Focus** Built with genomics research in mind, providing native support for gene-disease relationships, variant annotations, and clinical significance data. **⚡ Performance Features** - Thread-safe SQLite caching with persistent storage - Automatic rate limiting respecting NCBI guidelines - Response validation preventing error caching - Batch processing optimizations for large datasets - Memory-efficient XML parsing **🛡️ Reliability** - Error handling that distinguishes service outages from code issues - Automatic retry logic for transient failures - Extensive logging for debugging and monitoring - NCBI API key support for higher rate limits **🔄 Standards Support** - Follows NCBI E-utilities best practices - Respects publisher robots.txt and access policies - Generates properly formatted citations - Supports standard identifiers (PMID, DOI, PMC ID, HGVS) Getting Started -------------- **Installation** .. code-block:: bash pip install metapub **Quick Start** .. code-block:: python from metapub import PubMedFetcher, FindIt # Search and analyze literature fetch = PubMedFetcher() pmids = fetch.pmids_for_query('machine learning genomics', retmax=10) # Check PDF availability accessible_papers = [] for pmid in pmids: article = fetch.article_by_pmid(pmid) src = FindIt(pmid) if src.url: accessible_papers.append({ 'title': article.title, 'journal': article.journal, 'pdf_url': src.url }) print(f"Found {len(accessible_papers)} papers with accessible PDFs") **Next Steps** Ready to transform your biomedical research workflow? Start with our comprehensive guides: .. toctree:: :maxdepth: 2 :caption: User Guide: installation quickstart examples citation_formatting pubmedarticle_properties advanced tutorials command_line_tools error_handling .. toctree:: :maxdepth: 2 :caption: API Reference: api api_fetchers api_models api_findit api/modules .. toctree:: :maxdepth: 1 :caption: Community & Support: showcase changelog Who Uses Metapub --------------- **🎯 Research Teams** building literature reviews, systematic analyses, and evidence synthesis workflows. **🧬 Genomics Labs** connecting genetic variants with supporting literature for clinical decision-making. **⚙️ Developers** creating biomedical applications, research tools, and automated analysis pipelines. **📊 Institutions** including pharmaceutical companies, academic institutions, and government research agencies. For peer-reviewed academic citations, see `metapub.org/citations `_. **Documentation Navigation** 📚 **New to Metapub?** Start with :doc:`quickstart` for basic usage patterns. 🔧 **Building Applications?** See :doc:`api` for architectural guidance. 💡 **Looking for Examples?** Check :doc:`examples` and :doc:`tutorials` for complete workflows. 🌐 **Project Updates:** Visit `metapub.org `_ for news and community resources. The documentation provides comprehensive coverage for both simple scripts and production applications.