3.10. Summary & Future Directions¶
3.10.1. Core Architecture¶
The WHG v4 data model achieves historical place representation through a graph-based attestation architecture:
Entity nodes (documents in document collections):
Things - unified entities (places, periods, routes, networks)
Names - multilingual labels with phonetic embeddings
Geometries - spatial representations with derived fields
Timespans - temporal bounds with PeriodO integration
Attestations - evidentiary nodes containing metadata about claims
Authorities - sources, datasets, relation types, periods
Relationship edges (documents in edges collection):
Edges - typed connections between all nodes (subject_of, attests_name, attests_geometry, attests_timespan, sourced_by, typed_by, relates_to, meta_attestation, part_of)
Critical distinction: Attestations are NOT edges; they are nodes in a document collection that serve as junction points. All relationships are expressed through edges in a separate edge collection.
This separation of entities from evidence enables:
Multiple conflicting claims to coexist with source attribution
Temporal awareness throughout (not just date fields)
Flexible relationship vocabulary extensible via AUTHORITY documents
Provenance chains from any claim to original sources
3.10.2. Key Design Principles¶
Graph-first architecture: Attestations are nodes in a document collection (not edges, not embedded records), making relationships first-class citizens traversable in both directions. Edges are stored separately in an edge collection, enabling efficient graph traversal while maintaining clean separation between entities and their relationships.
Single table inheritance in AUTHORITY: One collection with authority_type discriminator replaces separate tables for sources, datasets, relation types, and periods—simplifying queries and reducing joins.
Temporal reification: Timespans as separate entities (not date fields) enable PeriodO integration, inheritance, and multiple temporal perspectives on the same Thing.
Vector embeddings as requirement: Name embeddings power cross-linguistic reconciliation and are mandatory (not optional) for toponymic matching.
Derived geometric fields: Pre-computed representative_point, bbox, hull optimize spatial queries and enable geometry inheritance.
Namespaced identifiers: Compact external references (pleiades:579885) preserve authority IDs while Django resolves to full URIs.
Consistent field naming: Timespan boundaries use end_earliest and end_latest (not stop_earliest/stop_latest) for consistency with W3C Time Ontology across all representations.
3.10.3. What This Enables¶
Beyond traditional gazetteers:
Temporal gazetteering - “What was this place called in 800 CE?” not just “Where is X?”
Historiographical depth - Multiple sources with varying certainty, not synthesized “facts”. Each attestation node preserves the evidentiary basis through sourced_by edges to Authority documents, enabling scholars to evaluate claims rather than accept aggregated results.
Network/route modeling - Connections and movements, not just point locations
Cross-cultural representation - Phonetic similarity across scripts via embeddings
Dynamic clustering - Multi-dimensional similarity (name + space + time + type) for reconciliation
Contribution flexibility - LPF, CSV, GPX, edge lists all map to graph structure
DOI integration - Dataset-level citability with persistent identifiers
3.10.4. Implementation Choice: ArangoDB¶
The graph model maps naturally to ArangoDB’s multi-model architecture:
Property graph natively represents Attestations as nodes in a document collection with edges in a separate edge collection, perfectly matching the conceptual model
GeoJSON support handles complex historical geometries (6 types)
Vector indexes (FAISS-backed) enable phonetic similarity search
Unified AQL integrates graph traversal + spatial + temporal + vector queries
Single system eliminates synchronization complexity
Key tradeoffs:
⚠️ Enterprise Edition required for production scale (Community Edition limited to 100 GiB)
⚠️ Vector search maturity requires early validation at 10M+ scale
⚠️ No
GeometryCollectionsupport (workaround: multiple geometry attestations, which aligns with attestation model)✅ Operational simplicity for small team
✅ Real-time consistency (no eventual consistency)
3.10.5. Future Extensions¶
3.10.5.1. Enhanced Relationship Types¶
Uncertainty: possibly_same_as, probably_member_of, likely_connected_to
Causation: caused_by, influenced for historical causation patterns
Meta-attestations: Attestations about other attestations for scholarly discourse modeling. Implemented through edges with edge_type: "meta_attestation" connecting attestation nodes, with properties.meta_type specifying the relationship (contradicts, supports, supersedes).
3.10.5.2. Advanced Analytics¶
Network analysis: Centrality measures, community detection, flow analysis pre-computed as Thing attributes
Temporal evolution: Track network emergence/dissolution, territorial changes, migration patterns
Machine learning: Automated reconciliation suggestions, name variant generation, temporal inference, quality assessment
3.10.5.3. Interoperability¶
IIIF integration: Link Things to IIIF manifests for maps/manuscripts
Wikidata sync: Bidirectional linking with import/export of claims
RDF exports: SPARQL endpoint for semantic web integration
Schema.org markup: Enable search engine rich snippets
3.10.6. Research Challenges¶
Algorithmic:
Efficient geometry inheritance at scale (recursive unions with caching)
Temporal reasoning with Allen’s interval algebra for complex queries
Hierarchical clustering for millions of candidates
Modeling:
Contested territories (multiple claims on same space/time)
Fuzzy boundaries (gradual transitions, borderlands)
Uncertainty propagation through inheritance chains
Interface:
Temporal visualization (time-traveling maps, animation)
Network visualization at scale with filtering
Source comparison views for conflicting claims
Graph traversal optimization:
Efficient multi-hop queries through attestation-edge chains
Caching strategies for frequently-accessed provenance paths
Index optimization for complex edge type filtering
3.10.7. Sustainability Model¶
Data quality: Editorial review workflows, community flagging, automated checks, version control
Community: Academic credit via DOI citations, teaching integration, comprehensive documentation
Technical: Open-source codebase, Pitt CRC hosting, regular updates, standards compliance
Governance: Conflict resolution processes, scholarly debate documentation, multiple perspectives preserved
3.10.8. Architectural Clarity¶
The WHG v4 model’s success depends on maintaining clear architectural boundaries:
Attestations (document collection):
Store only metadata: certainty, notes, sequence, connection_metadata, timestamps
Serve as junction points in the graph
Enable bundling multiple claims with shared provenance
Edges (edge collection):
Express all relationships: subject_of, attests_name, attests_geometry, attests_timespan, sourced_by, typed_by, relates_to, meta_attestation
Enable efficient graph traversal
Support complex multi-hop queries
Authorities (document collection):
Provide controlled vocabularies and reference data
Unified collection with
authority_typediscriminatorEliminate redundancy through reference rather than duplication
This separation enables:
Flexible relationship vocabulary without schema changes
Efficient querying of provenance chains
Multiple perspectives on the same entities
Scholarly rigor through explicit source attribution
3.10.9. Conclusion¶
The attestation-based graph architecture positions WHG as:
Not just a gazetteer - a knowledge graph of historical geographic claims
Not just place lookup - a platform for critical historical inquiry
Not just where - but when, according to whom, with what certainty
This distinguishes WHG from:
Commercial mapping platforms (no temporal depth, no source transparency)
Search engines/LLMs (no provenance, flatten uncertainty into false consensus)
Traditional gazetteers (point locations without networks/routes, limited temporality)
The graph model scales from simple place records to complex research questions about historical geography—from student contributions to scholarly debates—while maintaining rigorous provenance and enabling computational analysis impossible with document-oriented or relational approaches.