Data Fetcher Classes
====================

The core of Metapub consists of several fetcher classes that provide access to different biomedical databases. All fetchers use the Borg singleton pattern and include built-in caching.

**🔄 Borg Singleton Pattern**

Metapub fetchers use the Borg pattern: all instances of the same fetcher class share the same state (cache, configuration, and so on). This provides several benefits:

- **Shared cache:** Multiple ``PubMedFetcher()`` instances automatically share cached data
- **Consistent configuration:** API keys and settings apply across all instances
- **Memory efficiency:** No duplicate caches or redundant API calls
- **Convenient reuse:** You can create a fetcher wherever you need one instead of passing a single instance around your application

.. code-block:: python

   from metapub import PubMedFetcher

   # These two fetchers share the same cache and configuration
   fetch1 = PubMedFetcher()
   fetch2 = PubMedFetcher()

   # An article cached by fetch1 is immediately available to fetch2
   article = fetch1.article_by_pmid('12345678')
   same_article = fetch2.article_by_pmid('12345678')  # Served from cache, no API call

PubMedFetcher
-------------

.. currentmodule:: metapub

.. autoclass:: PubMedFetcher
   :members:
   :show-inheritance:

The PubMedFetcher is the primary interface for accessing PubMed literature via NCBI's E-utilities API. It provides methods for:

* **Article retrieval** by PMID, DOI, or PMC ID
* **Literature searches** with complex query support
* **Citation-based lookups** for bibliographic matching
* **Related article discovery** using NCBI's eLink service

**NCBI E-utilities Documentation:** `PubMed E-utilities `_ | `PubMed Search Field Descriptions `_

Key Methods
~~~~~~~~~~~

.. automethod:: PubMedFetcher.__init__
.. automethod:: PubMedFetcher.article_by_pmid
.. automethod:: PubMedFetcher.article_by_doi
.. automethod:: PubMedFetcher.article_by_pmcid
.. automethod:: PubMedFetcher.pmids_for_query
.. automethod:: PubMedFetcher.pmids_for_citation
.. automethod:: PubMedFetcher.related_pmids

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from metapub import PubMedFetcher

   # Initialize fetcher
   fetch = PubMedFetcher()

   # Get a specific article
   article = fetch.article_by_pmid('33157158')
   print(f"Title: {article.title}")
   print(f"Journal: {article.journal}")
   print(f"DOI: {article.doi}")

   # Search for articles
   pmids = fetch.pmids_for_query(
       query='CRISPR gene editing',
       since='2020/01/01',
       retmax=100
   )

   # Citation-based lookup
   citation_pmids = fetch.pmids_for_citation(
       journal='Nature',
       year=2023,
       volume=615,
       first_page=123,
       aulast='Smith'
   )

MedGenFetcher
-------------

.. autoclass:: MedGenFetcher
   :members:
   :show-inheritance:

The MedGenFetcher provides access to NCBI's MedGen database for medical genetics concepts and disease-gene relationships.

**NCBI MedGen Documentation:** `MedGen Database `_ | `MedGen Help `_

Key Methods
~~~~~~~~~~~

.. automethod:: MedGenFetcher.__init__
.. automethod:: MedGenFetcher.uids_by_term
.. automethod:: MedGenFetcher.concept_by_uid
.. automethod:: MedGenFetcher.concept_by_cui
.. automethod:: MedGenFetcher.uid_for_cui
.. automethod:: MedGenFetcher.pubmeds_for_cui

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from metapub import MedGenFetcher

   # Initialize fetcher
   mg = MedGenFetcher()

   # Search for a genetic condition
   uids = mg.uids_by_term('Brugada syndrome')

   # Get detailed concept information for the first 3 results
   for uid in uids[:3]:
       concept = mg.concept_by_uid(uid)
       print(f"Name: {concept.name}")
       print(f"CUI: {concept.cui}")
       print(f"Definition: {concept.definition}")

       # Get related literature
       pmids = mg.pubmeds_for_cui(concept.cui)
       print(f"Related papers: {len(pmids)}")

ClinVarFetcher
--------------

.. autoclass:: ClinVarFetcher
   :members:
   :show-inheritance:

The ClinVarFetcher provides access to NCBI's ClinVar database, which records the clinical significance of genetic variants.
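
Because every ClinVar variant record carries a clinical significance value, a common first pass over a gene is to count its variants by significance. The helper below is a hypothetical sketch, not part of metapub: ``tally_by_significance`` assumes only that each variant object exposes a ``clinical_significance`` attribute, as the variants returned by ``ClinVarFetcher.variant()`` do. It lowercases the values and counts missing or empty ones as ``"unknown"``.

.. code-block:: python

   from collections import Counter
   from typing import Iterable


   def tally_by_significance(variants: Iterable[object]) -> Counter:
       """Count variants per clinical significance value (hypothetical helper, not a metapub API)."""
       counts: Counter = Counter()
       for variant in variants:
           # Assumes a clinical_significance attribute; objects without one count as "unknown"
           value = getattr(variant, "clinical_significance", None)
           # Lowercase for consistent keys; None or empty strings become "unknown"
           label = str(value or "").strip().lower() or "unknown"
           counts[label] += 1
       return counts


   # Example usage (requires network access):
   # from metapub import ClinVarFetcher
   # cv = ClinVarFetcher()
   # ids = cv.ids_by_gene('BRCA1', single_gene=True)[:20]
   # print(tally_by_significance(cv.variant(var_id) for var_id in ids))

Accepting any iterable keeps the helper independent of the network, so it can be tested with stand-in objects.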
**NCBI ClinVar Documentation:** `ClinVar Database `_ | `ClinVar API Guide `_

**Note:** Unlike the clinical significance classes as ClinVar itself presents them, ``clinical_significance`` values are all lowercase. This was a deliberate decision, documented further `here `_.

Key Methods
~~~~~~~~~~~

.. automethod:: ClinVarFetcher.__init__
.. automethod:: ClinVarFetcher.ids_by_gene
.. automethod:: ClinVarFetcher.variant
.. automethod:: ClinVarFetcher.pmids_for_id
.. automethod:: ClinVarFetcher.pmids_for_hgvs

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from metapub import ClinVarFetcher

   # Initialize fetcher
   cv = ClinVarFetcher()

   # Find variants for a gene
   variant_ids = cv.ids_by_gene('BRCA1', single_gene=True)

   # Get detailed information for the first 5 variants
   for var_id in variant_ids[:5]:
       variant = cv.variant(var_id)
       print(f"Accession: {variant.accession}")
       print(f"HGVS: {variant.hgvs_c}")
       print(f"Clinical significance: {variant.clinical_significance}")
       print(f"Molecular consequences: {variant.molecular_consequences}")

       # Get supporting literature
       pmids = cv.pmids_for_id(var_id)
       print(f"Supporting papers: {len(pmids)}")

CrossRefFetcher
---------------

.. autoclass:: CrossRefFetcher
   :members:
   :show-inheritance:

The CrossRefFetcher provides access to CrossRef's API for DOI resolution and for publication metadata when PubMed data is incomplete.

**CrossRef API Documentation:** `CrossRef REST API `_ | `Works API Reference `_

Example Usage
~~~~~~~~~~~~~

.. code-block:: python

   from metapub import CrossRefFetcher, PubMedFetcher

   # Initialize fetchers
   fetch = PubMedFetcher()
   cr = CrossRefFetcher()

   # Get an article that might be missing its DOI in PubMed
   article = fetch.article_by_pmid('12345678')
   if not article.doi:
       # Try CrossRef as a fallback
       work = cr.article_by_pma(article)
       if work and work.score > 80:  # Treat as a high-confidence match
           print(f"Found DOI via CrossRef: {work.doi}")
           print(f"Match score: {work.score}")

Advanced Configuration
----------------------

Custom Cache Directory
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   import os

   # Option 1: set a custom cache directory for all fetchers
   # (set it early, before metapub is imported)
   os.environ['METAPUB_CACHE_DIR'] = '/path/to/large/cache'

   from metapub import PubMedFetcher

   # Option 2: specify the cache directory per fetcher
   fetch = PubMedFetcher(cachedir='/custom/cache/path')

NCBI API Key Setup
~~~~~~~~~~~~~~~~~~

**📈 Why Use an API Key?**

NCBI provides free API keys that raise your rate limit from 3 to 10 requests per second, which matters for production applications and large-scale data collection.

**🔑 Getting Your API Key**

1. **Apply for a key:** `NCBI API Key Registration `_
2. **No approval needed:** keys are issued immediately
3. **Free for academic and commercial use**

**⚙️ Configuration Options**

.. code-block:: python

   import os

   # Method 1: Environment variable (recommended); set it before metapub is imported
   os.environ['NCBI_API_KEY'] = 'your_api_key_here'

   from metapub import PubMedFetcher

   # Method 2: Direct parameter
   fetch = PubMedFetcher(api_key='your_api_key_here')

   # Method 3: Config file
   # Create ~/.metapub/config with:
   # [DEFAULT]
   # ncbi_api_key = your_api_key_here

**🚀 Rate Limit Benefits**

- **Without API key:** 3 requests/second
- **With API key:** 10 requests/second
- **Large datasets:** roughly 3x faster processing of uncached requests
- **Production reliability:** fewer throttling errors

Error Handling Patterns
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   from metapub import PubMedFetcher
   from metapub.exceptions import MetaPubError, InvalidPMID, NCBIServiceError

   fetch = PubMedFetcher()

   try:
       article = fetch.article_by_pmid('12345678')
   except InvalidPMID:
       print("Invalid PMID provided")
   except NCBIServiceError as e:
       print(f"NCBI service issue: {e.user_message}")
       print(f"Suggested actions: {e.suggested_actions}")
   except MetaPubError as e:
       # Broader MetaPub errors; listed last so the specific handlers above run first
       print(f"General MetaPub error: {e}")

Performance Considerations
--------------------------

Batch Processing
~~~~~~~~~~~~~~~~

.. code-block:: python

   from metapub import PubMedFetcher

   fetch = PubMedFetcher()

   # Process a large list one PMID at a time, reporting progress and skipping failures
   pmids = ['12345678', '23456789', '34567890']  # ... many more

   for i, pmid in enumerate(pmids):
       if i % 100 == 0:
           print(f"Progress: {i}/{len(pmids)}")

       try:
           article = fetch.article_by_pmid(pmid)
           # Process article...
       except Exception as e:
           # Log the failure and move on to the next PMID
           print(f"Error with {pmid}: {e}")

Cache Warming
~~~~~~~~~~~~~

.. code-block:: python

   from metapub import PubMedFetcher

   fetch = PubMedFetcher()

   # Pre-warm the cache for known PMIDs
   def warm_cache(pmid_list):
       for pmid in pmid_list:
           try:
               # Fetching an article once stores it in the cache
               fetch.article_by_pmid(pmid)
           except Exception:
               # Skip PMIDs that fail; they simply stay uncached
               continue
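
Uncached lookups count against the NCBI limits listed under NCBI API Key Setup (3 requests per second without a key, 10 with one). If you want an explicit client-side guard in a batch loop, the generic sketch below paces calls to at most ``max_per_second`` and reports failures without stopping the batch. It is not a metapub API: ``fetch_throttled`` is a hypothetical helper, it paces every call (including ones metapub would serve from cache), and whether metapub already paces requests internally is not covered here.

.. code-block:: python

   import time
   from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

   T = TypeVar("T")


   def fetch_throttled(
       pmids: Iterable[str],
       fetch_fn: Callable[[str], T],
       max_per_second: float = 3.0,
   ) -> Iterator[Tuple[str, Optional[T], Optional[Exception]]]:
       """Call fetch_fn(pmid) for each PMID, at most max_per_second times per second.

       Hypothetical helper. Yields (pmid, result, error) tuples so a single
       failure does not stop the batch.
       """
       if max_per_second <= 0:
           raise ValueError("max_per_second must be positive")
       min_interval = 1.0 / max_per_second
       last_call = float("-inf")
       for pmid in pmids:
           # Sleep just long enough to keep consecutive calls min_interval apart
           wait = min_interval - (time.monotonic() - last_call)
           if wait > 0:
               time.sleep(wait)
           last_call = time.monotonic()
           try:
               result = fetch_fn(pmid)
           except Exception as exc:
               # Report the failure and continue with the next PMID
               yield pmid, None, exc
               continue
           yield pmid, result, None

With an API key configured, you might iterate over ``fetch_throttled(pmids, fetch.article_by_pmid, max_per_second=10)`` and handle each ``(pmid, article, error)`` tuple in turn. Taking ``fetch_fn`` as a parameter keeps the pacing logic separate from any particular fetcher.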