1. Overview¶
This document outlines the complete integration plan for adding multilingual phonetic search to the existing WHG stack (Django + PostgreSQL/PostGIS + Elasticsearch). The goal is to support IPA-based matching, phonetic embeddings, and robust cross-lingual similarity without replacing existing infrastructure.
The system is split into two operational domains:
Online stack (DigitalOcean)
Django, PostgreSQL/PostGIS, Elasticsearch indices, query-time IPA conversion, and real-time search.Offline phonetic pipeline (Pitt CRC)
Bulk IPA generation, embedding creation, Siamese model training, and ingestion of enriched data back into Elasticsearch.
Reference: Full technical background in WHG Place Discussion #81