Documentation for World Historical Gazetteer Latest Release
  • Introduction
    • Vision
    • Mission
  • Guides & Tutorials
    • 1. Our Indexes
      • 1.1. Wikidata+GeoNames Index (Augmentation)
      • 1.2. WHG Publication Index (Searchable Publication)
      • 1.3. WHG Union Index (Final Integration & Clustering)
    • 2. Workbench
      • 2.1. Individual datasets
      • 2.2. Multiple datasets
      • 2.3. Thematic place collections
        • 2.3.1. Instructional exercise in a class setting, or workshop
        • 2.3.2. Authored publication
    • 3. Publishing Data
      • 3.1. Create and publish a Place Collection
      • 3.2. Create and publish a Dataset Collection
    • 4. Uploading Data
      • 4.1. Choosing an upload data format: LPF or LP-TSV?
      • 4.2. Preparing data for upload
        • 4.2.1. The simple case
        • 4.2.2. The not so simple case: extracting places
    • 5. Reconciliation & Accessioning
      • 5.1. What does closeMatch mean?
    • 6. Reviewing accessioning results
    • 7. Collection Groups
      • 7.1. Create and manage a Collection Group for a class or workshop
  • Technical
    • 1. Repositories
    • 2. APIs
      • 2.1. Entity API
      • 2.2. Reconciliation Service API
        • 2.2.1. Using the WHG Reconciliation API in OpenRefine
      • 2.3. API Tokens
        • 2.3.1. Using an API Token
    • 3. Issues
  • Development Roadmap
    • v3.5: Toponym Phonetics
      • 1. Overview
        • 1.1. Goals
        • 1.2. Why Not Elasticsearch’s Built-in Phonetic Analysis?
        • 1.3. Limitations
        • 1.4. Architecture Summary
        • 1.5. Data Sources
      • 2. Elastic Management Guide
        • 2.1. Table of Contents
        • 2.2. Architecture Overview
        • 2.3. Installation
        • 2.4. Configuration
        • 2.5. Storage Architecture
        • 2.6. Production Instance (VM)
        • 2.7. Staging Instance (Slurm)
        • 2.8. Authority Data Ingestion
        • 2.9. Index Management
        • 2.10. Snapshot Management
        • 2.11. Production Deployment
        • 2.12. Health Monitoring
        • 2.13. Troubleshooting
        • 2.14. Quick Reference
      • 3. Infrastructure Summary
        • 3.1. Overview
        • 3.2. Architecture
        • 3.3. Authority Data Sources
        • 3.4. WHG-Contributed Datasets
        • 3.5. Phonetic Search
        • 3.6. Index Schemas
        • 3.7. Deployment Strategy
        • 3.8. Storage Requirements
        • 3.9. Snapshot Strategy
        • 3.10. Resource Summary
        • 3.11. Operational Commands
        • 3.12. Directory Structure
        • 3.13. References
      • 4. Components
        • 4.1. Infrastructure
        • 4.2. Elasticsearch Indices
        • 4.3. Processing Components
      • 5. Data Flow
        • 5.1. Authority File Ingestion
        • 5.2. WHG-Contributed Dataset Ingestion
        • 5.3. Embedding Generation
        • 5.4. Incremental Updates
      • 6. Elasticsearch Index Design
        • 6.1. Index Schemas
        • 6.2. Ingest Pipelines
        • 6.3. HNSW Configuration
        • 6.4. Analysers
      • 7. Training the Phonetic Similarity Model
        • 7.1. Architecture Overview
        • 7.2. Key Features
        • 7.3. Installation
        • 7.4. The Training Pipeline
        • 7.5. Python Inference API
        • 7.6. Configuration
      • 8. Query Pipeline
        • 8.1. Search Strategy
        • 8.2. Elasticsearch Query Structure
        • 8.3. Error Handling
        • 8.4. Performance Optimisation
      • 9. Advantages of This Architecture
        • 9.1. Two-Instance Isolation
        • 9.2. Unified Infrastructure
        • 9.3. Scalable Storage Tiers
        • 9.4. Zero-Downtime Deployments
        • 9.5. Flexible Embedding Generation
        • 9.6. Toponym Deduplication
        • 9.7. Efficient Phonetic Search
        • 9.8. Graceful Degradation
        • 9.9. Unified Search Across Sources
        • 9.10. Maintainability
        • 9.11. Research Reproducibility
      • 10. Monitoring & Observability
        • 10.1. Key Metrics
        • 10.2. Health Check Endpoints
        • 10.3. Log Locations
        • 10.4. Dashboards
        • 10.5. Alerting
        • 10.6. Runbook: Common Issues
      • 11. Future Extensions
        • 11.1. Short-Term Enhancements
        • 11.2. Medium-Term Development
        • 11.3. Long-Term Vision
      • 12. Deployment Plan
        • 12.1. Phase 1: Infrastructure Setup
        • 12.2. Phase 2: Core Index Population
        • 12.3. Phase 3: Model Training
        • 12.4. Phase 4: Embedding Generation
        • 12.5. Phase 5: Query Integration
        • 12.6. Phase 6: Production Rollout
        • 12.7. Ongoing Operations
      • 13. Risk Assessment
        • 13.1. Technical Risks
        • 13.2. Operational Risks
        • 13.3. Data Quality Risks
        • 13.4. Mitigation Strategies
      • 14. Success Criteria
        • 14.1. Technical Metrics
        • 14.2. User Experience Metrics
        • 14.3. Research Impact
        • 14.4. Acceptance Criteria by Phase
      • 15. Summary & References
        • 15.1. Key Design Decisions
        • 15.2. Technology Stack
        • 15.3. Storage Requirements
        • 15.4. Timeline
        • 15.5. References
    • V4: Graph Datamodel
      • 1. User Guide
        • 1.1. Note to Documentation Team
        • 1.2. Getting Help
      • 2. Open Educational Resources (OER)
        • 2.1. Vision
        • 2.2. Strategic Goals
        • 2.3. Technical Requirements
      • 3. Data Model
        • 3.1. Introduction
        • 3.2. Overview
        • 3.3. Attestations & Relations
        • 3.4. Vocabularies
        • 3.5. Special Thing Patterns
        • 3.6. Contribution Types & Data Formats
        • 3.7. RDF Representation
        • 3.8. Platform Use Cases
        • 3.9. Implementation in ArangoDB
        • 3.10. Summary & Future Directions
      • 4. System Architecture
        • 4.1. Database Technology Assessment
        • 4.2. Kubernetes Configuration
        • 4.3. SSH Key Setup
        • 4.4. Deploying the Management Pod
        • 4.5. Deploying Services
        • 4.6. Service Configuration
  • License
    • Creative Commons Attribution-NonCommercial 4.0 International Public License
      • Section 1 – Definitions.
      • Section 2 – Scope.
      • Section 3 – License Conditions.
      • Section 4 – Sui Generis Database Rights.
      • Section 5 – Disclaimer of Warranties and Limitation of Liability.
      • Section 6 – Term and Termination.
      • Section 7 – Other Terms and Conditions.
      • Section 8 – Interpretation.
Copyright ©2017–2025 World Historical Gazetteer
Last updated on 22 December 2025