3.2. Overview

v4.0-beta

This overview introduces how the WHG v4 data model uses Attestations as the cornerstone of its design. The sections that follow detail the core entities (Things, Names, Geometries, Timespans) and how they interact through the attestation pattern to create a rich, provenance-tracked knowledge graph capable of representing the full complexity of historical geographic information.

3.2.1. Core Entities

        erDiagram
%% Single edge collection connecting all entities
    ATTESTATION }o--|| EDGE : "attests_name, attests_geometry, attests_timespan"
    ATTESTATION }o--|| EDGE : "typed_by, sourced_by, relates_to, meta_attestation"
    EDGE }o--|| ATTESTATION : "subject_of, meta_attestation"
    THING }o--|| EDGE : "subject_of"
    THING ||--}o EDGE : "relates_to"
    AUTHORITY }o--|| EDGE : "part_of"
    AUTHORITY ||--}o EDGE : "typed_by, sourced_by, part_of"
    EDGE }o--|| NAME : "attests_name"
    EDGE }o--|| GEOMETRY : "attests_geometry"
    EDGE }o--|| TIMESPAN : "attests_timespan"


%% Single unified edge collection
    EDGE {
        string _key PK
        string _from "any collection/xyz"
        string _to "any collection/abc"
        string edge_type "subject_of, attests_name, attests_geometry, attests_timespan, relates_to, meta_attestation, typed_by, sourced_by, part_of"
        string meta_type "bundles, contradicts, supersedes, challenges, supports (for meta_attestations only)"
        json properties "flexible storage for edge-specific attributes"
        timestamp created
    }


%% Core entity collections (document collections - vertices/nodes)
    THING {
        string _key PK "ArangoDB key"
        string _id "things/xyz"
        text description
        string thing_type "location, historical_entity, collection, period, route, itinerary, network"
        string primary_name "denormalized from highest-certainty name attestation"
        point representative_point "denormalized from geometry for spatial indexing"
        timestamp created
        timestamp modified
    }
    NAME {
        string _key PK
        string _id "names/xyz"
        string name
        string language "ISO 639-3"
        string script "ISO 15924"
        array name_type "toponym, chrononym, ethnonym, odonym, hydronym"
        string ipa "International Phonetic Alphabet"
        string romanized "romanized/transliterated form"
        string transliteration_system "e.g., Pinyin, BGN/PCGN, ISO 259"
        vector embedding "REQUIRED: 256-dimensional vector for phonetic search"
    }
    TIMESPAN {
        string _key PK
        string _id "timespans/xyz"
        bigint start_earliest "Unix timestamp (milliseconds) or geological time"
        bigint start_latest "Unix timestamp (milliseconds)"
        bigint stop_earliest "Unix timestamp (milliseconds)"
        bigint stop_latest "Unix timestamp (milliseconds) or future sentinel"
        string label "human-readable period name"
        string precision "year, decade, century, era, geological_period"
        integer precision_value "numeric precision in years"
        string periodo_id "PeriodO URI for standard period definitions"
    }
    ATTESTATION {
        string _key PK
        string _id "attestations/xyz"
        integer sequence "for ordered sequences in routes/itineraries (nullable)"
        json connection_metadata "for networks: trade goods, flow direction, route_type"
        float certainty "0.0 to 1.0 (nullable if unknown)"
        text certainty_note "explanation of certainty assessment"
        text notes "additional context or commentary"
        timestamp created
        timestamp modified
    }
    GEOMETRY {
        string _key PK
        string _id "geometries/xyz"
        geometry geom "GeoJSON: Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon"
        point representative_point "single point for spatial indexing and distance queries"
        geometry hull "convex hull for quick spatial filters"
        array bbox "[min_lon, min_lat, max_lon, max_lat]"
        array precision "e.g., [exact, approximate, uncertain, historical_approximate]"
        array precision_km "uncertainty radius in km (can be multiple if heterogeneous)"
        string source_crs "EPSG:4326 or historical/custom CRS identifier"
    }
%% Authority collection (single table inheritance)
    AUTHORITY {
        string _key PK
        string _id "authorities/xyz"
        string authority_type "dataset, source, relation_type, period, classification"
        text description "general description applicable to all types"
        timestamp created
        timestamp modified
        %% Dataset fields
        string title "for datasets: dataset name"
        string version "for datasets: version identifier"
        string publisher "for datasets: publishing institution"
        string license "for datasets: CC-BY, CC0, etc"
        string doi "for datasets: persistent identifier doi:10.83427/whg-dataset-123"
        %% Source fields
        string citation "for sources: bibliographic citation"
        array source_type "for sources: manuscript, inscription, archaeological, published, etc"
        string record_id "for sources: identifier in original source/dataset"
        %% Relation type fields
        string label "for relation_types, periods, classifications: machine-readable identifier"
        string inverse "for relation_types: inverse relation label"
        array domain "for relation_types: valid subject entity types"
        array range "for relation_types: valid object entity types"
        %% Period fields (from PeriodO)
        bigint start_earliest "for periods: temporal bounds"
        bigint start_latest "for periods: temporal bounds"
        bigint stop_earliest "for periods: temporal bounds"
        bigint stop_latest "for periods: temporal bounds"
        %% Classification fields
        string classification_system "for classifications: geonames_fclasses, aat_getty, custom"
        string classification_code "for classifications: A.ADM1, P.PPLA, H.STM, S.ARCH, etc"
        string classification_label "for classifications: human-readable name"
        %% Common fields
        string uri "for all types: external URI (PeriodO, source URL, dataset landing page, authority gazetteer)"
    }
    

Fig. 3.2 Entity–relationship diagram for the WHG v4 data model.


3.2.2. Core Design Philosophy

The WHG data model is built around a property graph structure where information is represented as:

  • Things: Primary entities (locations, historical entities, collections, periods, routes, itineraries, networks)

  • Attributes: Descriptions of Things (Names, Geometries, Timespans)

  • Attestations: Source-backed claims connecting Things to attributes or other Things

This approach enables WHG to:

  • Capture multiple scholarly perspectives

  • Model uncertainty explicitly

  • Preserve complete provenance

  • Track temporal change

  • Represent complex networks of relationships

3.2.3. The Thing Entity

3.2.3.1. What is a Thing?

A Thing is the primary entity in WHG: any object of scholarly interest that can be described and related to other entities. The term “Thing” is borrowed from schema.org’s root type and was chosen for its generality and extensibility.

Types of Things in WHG:

  • Locations: Geographic places (cities, regions, landmarks, etc.)

  • Historical entities: Political entities, empires, states

  • Collections: Curated sets of related Things

  • Periods: Temporal spans with cultural/historical significance

  • Routes: Ordered sequences of locations representing journeys

  • Itineraries: Specific instances of travel along routes

  • Networks: Systems of interconnected Things

3.2.3.2. Why “Thing”?

The term may seem informal, but it offers crucial advantages:

Generality: Accommodates any type of entity without forcing artificial classifications. A medieval monastery might be simultaneously a location, a religious institution, and a network node - “Thing” encompasses all these facets.

Extensibility: If WHG later expands to include new place-linked entity types (people, events, documents), “Thing” remains applicable without a change in terminology.

Interoperability: Aligns with schema.org’s vocabulary, facilitating linked data integration and semantic web compatibility.

Philosophical honesty: Acknowledges that historical entities resist rigid categorization. What we call “Byzantium” refers to a complex, evolving reality that was simultaneously a place, an empire, an idea, and a cultural sphere.

3.2.3.3. Thing Structure

A Thing in WHG consists of:

{
  "id": "whg:12345",
  "thing_type": "location",
  "description": "Major Byzantine/Ottoman city on the Bosphorus",
  "created": "2023-01-15T10:30:00Z",
  "modified": "2024-02-20T14:45:00Z"
}

Key Properties:

  • id: Unique persistent identifier (URI)

  • thing_type: Classification (location, historical_entity, collection, period, route, itinerary, network)

  • description: Human-readable summary

  • created, modified: Temporal metadata for curation

Notably absent: Names, coordinates, dates, classifications, and relations. These are all asserted through Attestations and are not intrinsic to the Thing itself.

3.2.3.4. The Separation Principle

WHG separates the Thing itself from descriptions of the Thing. This distinction is crucial:

The Thing: The abstract entity that existed/exists in reality

Descriptions: Claims about that Thing from various sources at various times

This enables WHG to:

  • Accommodate disagreement (two sources, two different coordinate claims)

  • Track change (names evolve, boundaries shift)

  • Preserve provenance (who said what, when)

  • Model uncertainty (tentative vs. confident claims)

Example:

  • Thing ID whg:12345 represents the conceptual city

  • One attestation claims it was called “Byzantion” (-650 to 330 CE)

  • Another attestation claims it was called “Constantinople” (330 to 1453 CE)

  • Another attestation claims it was called “Istanbul” (1453 to present)

  • All coexist; none overwrites the others
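
The example above can be sketched in Python as a list of independent name claims. This is only an illustration under stated assumptions, not WHG's implementation: the `NameClaim` class and the `names_in_year` helper are hypothetical, each claim is flattened into one record rather than spread across edges, years are plain integers (negative for BCE), and `None` stands for an open end.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NameClaim:
    """Hypothetical, flattened view of one name attestation."""
    thing_id: str
    name: str
    start_year: int            # negative values are BCE
    end_year: Optional[int]    # None means "to present"


# The three claims from the example; none replaces another.
CLAIMS = [
    NameClaim("whg:12345", "Byzantion", -650, 330),
    NameClaim("whg:12345", "Constantinople", 330, 1453),
    NameClaim("whg:12345", "Istanbul", 1453, None),
]


def names_in_year(claims: List[NameClaim], thing_id: str, year: int) -> List[str]:
    """Return every name claimed for the Thing in the given year."""
    return [
        c.name
        for c in claims
        if c.thing_id == thing_id
        and c.start_year <= year
        and (c.end_year is None or year <= c.end_year)
    ]


print(names_in_year(CLAIMS, "whg:12345", 1000))
print(names_in_year(CLAIMS, "whg:12345", 1453))
```

For a boundary year such as 1453 the helper returns both overlapping names, which is the intended behaviour when claims coexist rather than overwrite each other.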

3.2.4. Name Entity

A Name represents a linguistic form by which a Thing is known.

3.2.4.1. Structure

{
  "id": "name:67890",
  "name": "القسطنطينية",
  "language": "ara",
  "script": "Arab",
  "variant": "standard",
  "transliteration": "al-Qusṭanṭīnīyah",
  "ipa": "ʔalqustˤɑntˤiːnijːɐ",
  "name_type": [
    "toponym"
  ],
  "embedding": [
    0.123,
    -0.456,
    ...
  ]
}

Key Properties:

  • name: The actual text in original script

  • language: ISO 639-3 language code

  • script: ISO 15924 script code

  • variant: Relationship to other forms (official, colloquial, historical, etc.)

  • transliteration: Romanization for searchability

  • ipa: International Phonetic Alphabet representation

  • name_type: Array of classifications (toponym, chrononym, ethnonym, etc.)

  • embedding: Vector representation for phonetic similarity search

3.2.4.2. Name Types

WHG distinguishes several name types:

  • Toponym: Geographic place name

  • Chrononym: Period or era name

  • Ethnonym: Name for a people or ethnic group

  • Demonym: Name for inhabitants of a place

See Vocabularies for complete name type taxonomy.

3.2.5. Geometry Entity

A Geometry represents a spatial location or extent of a Thing at a particular time.

3.2.5.1. Structure

{
  "id": "geom:11223",
  "geom": {
    "type": "Point",
    "coordinates": [
      28.9784,
      41.0082
    ]
  },
  "representative_point": {
    "type": "Point",
    "coordinates": [
      28.9784,
      41.0082
    ]
  },
  "hull": {
    "type": "Polygon",
    "coordinates": [
      [
        ...
      ]
    ]
  },
  "bbox": [
    28.9,
    41.0,
    29.0,
    41.1
  ],
  "precision": "approximate",
  "precision_km": 5,
  "source_crs": "EPSG:4326"
}

Key Properties:

  • geom: GeoJSON geometry (Point, Polygon, LineString, Multi*)

  • representative_point: Single point for mapping/search

  • hull: Convex hull of the geometry

  • bbox: Bounding box [min_lon, min_lat, max_lon, max_lat]

  • precision: Spatial certainty indicator (exact, approximate, uncertain)

  • precision_km: Uncertainty radius in kilometers

  • source_crs: Original coordinate reference system (EPSG code or historical CRS)
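
As a rough illustration of how the derived bbox field relates to geom, the following Python sketch computes [min_lon, min_lat, max_lon, max_lat] from any GeoJSON geometry that has a coordinates member. The helper names are hypothetical, and the sketch assumes plain longitude/latitude values; it does not handle geometries that cross the antimeridian.

```python
from typing import Any, Iterator, List, Tuple


def iter_positions(coords: Any) -> Iterator[Tuple[float, float]]:
    """Yield (lon, lat) pairs from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [lon, lat] (any extra values are ignored).
        yield (coords[0], coords[1])
    else:
        for part in coords:
            yield from iter_positions(part)


def bbox(geometry: dict) -> List[float]:
    """Return [min_lon, min_lat, max_lon, max_lat] for a GeoJSON geometry."""
    positions = list(iter_positions(geometry["coordinates"]))
    if not positions:
        raise ValueError("geometry has no coordinates")
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return [min(lons), min(lats), max(lons), max(lats)]


square = {
    "type": "Polygon",
    "coordinates": [[[28.9, 41.0], [29.0, 41.0], [29.0, 41.1], [28.9, 41.1], [28.9, 41.0]]],
}
print(bbox(square))
```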

3.2.5.2. Geometry Formats

WHG supports both GeoJSON (for internal storage and LPF export) and WKT (Well-Known Text, for GeoSPARQL compliance):

GeoJSON format (internal):

{
  "type": "Point",
  "coordinates": [
    28.9784,
    41.0082
  ]
}

WKT format (RDF export):

"POINT(28.9784 41.0082)"^^geo:wktLiteral

This dual format support ensures:

  • Web application compatibility (GeoJSON)

  • Triplestore compatibility (WKT for GeoSPARQL queries)

  • Seamless conversion between formats on export
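
The conversion on export could look roughly like the Python sketch below. It is a minimal hand-written converter for three geometry types only, meant to show the mapping between the two formats; the function names are hypothetical, and a real exporter would more likely rely on an established geometry library.

```python
def _position(p) -> str:
    """Format one GeoJSON position as 'lon lat'."""
    return f"{p[0]} {p[1]}"


def geojson_to_wkt(geometry: dict) -> str:
    """Convert a GeoJSON Point, LineString, or Polygon to WKT."""
    gtype = geometry["type"]
    coords = geometry["coordinates"]
    if gtype == "Point":
        return f"POINT({_position(coords)})"
    if gtype == "LineString":
        return "LINESTRING(" + ", ".join(_position(p) for p in coords) + ")"
    if gtype == "Polygon":
        rings = ", ".join(
            "(" + ", ".join(_position(p) for p in ring) + ")" for ring in coords
        )
        return f"POLYGON({rings})"
    raise ValueError(f"unsupported geometry type in this sketch: {gtype}")


def as_wkt_literal(geometry: dict) -> str:
    """Wrap the WKT string as a typed literal for RDF export."""
    return f'"{geojson_to_wkt(geometry)}"^^geo:wktLiteral'


point = {"type": "Point", "coordinates": [28.9784, 41.0082]}
print(as_wkt_literal(point))
```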

3.2.5.3. Why Multiple Geometries?

A single Thing may have multiple Geometries because:

  • Temporal change: Borders expand, cities relocate

  • Uncertainty: Multiple proposed locations

  • Source disagreement: Conflicting geographic claims

  • Representation levels: Point for searching, polygon for extent

Each Geometry is connected via an Attestation with temporal bounds and source citation.

Important Note on GeometryCollection: ArangoDB does not support the GeoJSON GeometryCollection type. For places with heterogeneous geometry sets (e.g., both point and polygon), store multiple geometry attestations—one per geometry type. This aligns naturally with the attestation model where each geometry claim is a separate evidential statement.
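
One way to prepare such input is to flatten a GeometryCollection into its member geometries before storage, so that each member can become its own Geometry document and attestation. The helper below is a hypothetical Python sketch of that preprocessing step, not part of WHG's ingestion code.

```python
from typing import List


def split_geometry_collection(geometry: dict) -> List[dict]:
    """Flatten a GeoJSON GeometryCollection into its member geometries.

    Any other geometry type is returned unchanged as a one-item list.
    """
    if geometry.get("type") != "GeometryCollection":
        return [geometry]
    members: List[dict] = []
    for member in geometry.get("geometries", []):
        # Recurse so that nested collections are flattened as well.
        members.extend(split_geometry_collection(member))
    return members


mixed = {
    "type": "GeometryCollection",
    "geometries": [
        {"type": "Point", "coordinates": [28.9784, 41.0082]},
        {
            "type": "Polygon",
            "coordinates": [[[28.9, 41.0], [29.0, 41.0], [29.0, 41.1], [28.9, 41.0]]],
        },
    ],
}
print([g["type"] for g in split_geometry_collection(mixed)])
```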

3.2.6. Timespan Entity

A Timespan represents a temporal interval with explicit uncertainty modeling.

3.2.6.1. Structure

{
  "id": "time:33445",
  "start_earliest": "0802-01-01",
  "start_latest": "0802-12-31",
  "end_earliest": "1431-01-01",
  "end_latest": "1432-12-31",
  "label": "Angkor period",
  "precision": "year",
  "precision_value": 1
}

Key Properties:

  • start_earliest, start_latest: Range of possible start dates

  • end_earliest, end_latest: Range of possible end dates

  • label: Human-readable period name

  • precision: Temporal granularity (year, decade, century, era, geological_period)

  • precision_value: Numeric precision indicator

Field Naming Convention: Internally, WHG uses end_earliest and end_latest to align with the W3C Time Ontology and RDF representations. Some legacy documentation, and the entity–relationship diagram above, reference stop_earliest and stop_latest, which are equivalent fields. Going forward, all documentation and implementations should use the “end” terminology.
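
When reading records that still use the legacy names, a small normalization step can map them onto the current ones. The following Python sketch shows one possible approach; the function and mapping names are hypothetical.

```python
# Legacy field names and their current equivalents.
LEGACY_TO_CURRENT = {
    "stop_earliest": "end_earliest",
    "stop_latest": "end_latest",
}


def normalize_timespan_keys(doc: dict) -> dict:
    """Return a copy of a timespan record that uses the 'end_*' names."""
    out = {}
    for key, value in doc.items():
        new_key = LEGACY_TO_CURRENT.get(key, key)
        # Refuse to guess if both spellings are present with different values.
        if new_key in out and out[new_key] != value:
            raise ValueError(f"conflicting values for {new_key}")
        out[new_key] = value
    return out


legacy = {
    "start_earliest": "0802-01-01",
    "stop_earliest": "1431-01-01",
    "stop_latest": "1432-12-31",
}
print(normalize_timespan_keys(legacy))
```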

3.2.6.2. Modeling Temporal Uncertainty

The four-date model captures uncertainty:

  • Certain dates: All four values identical

  • Uncertain start: start_earliest ≠ start_latest

  • Uncertain end: end_earliest ≠ end_latest

  • Fuzzy boundaries: Wide ranges (e.g., “sometime in 7th century”)

Special values:

  • null: Unknown or inapplicable

  • -infinity: From geological prehistory

  • +infinity: Into indefinite future

  • present: Current day (dynamic)

See Implementation in Database for details on null handling.
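
To make the four-date model concrete, the Python sketch below measures how wide the start and end ranges of a timespan are. It is an illustration under assumptions: the function names are hypothetical, dates are ISO strings as in the structure example above, null maps to `None`, and the special values (-infinity, +infinity, present) as well as BCE dates are out of scope because Python's `datetime.date` cannot represent them.

```python
from datetime import date
from typing import Optional


def _parse(value: Optional[str]) -> Optional[date]:
    """Parse an ISO date string; None (null) stays None."""
    return None if value is None else date.fromisoformat(value)


def range_width_days(earliest: Optional[str], latest: Optional[str]) -> Optional[int]:
    """Width of one uncertainty range in days, or None if a bound is null."""
    lo, hi = _parse(earliest), _parse(latest)
    if lo is None or hi is None:
        return None
    if lo > hi:
        raise ValueError("earliest bound must not be after latest bound")
    return (hi - lo).days


def describe_uncertainty(timespan: dict) -> dict:
    """Summarize the start and end uncertainty of a four-date timespan."""
    return {
        "start_width_days": range_width_days(
            timespan["start_earliest"], timespan["start_latest"]
        ),
        "end_width_days": range_width_days(
            timespan["end_earliest"], timespan["end_latest"]
        ),
    }


angkor = {
    "start_earliest": "0802-01-01",
    "start_latest": "0802-12-31",
    "end_earliest": "1431-01-01",
    "end_latest": "1432-12-31",
}
print(describe_uncertainty(angkor))
```

A width of zero on both ranges corresponds to fully known start and end dates; larger widths correspond to the fuzzy boundaries described above.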

3.2.7. Attestation Entity

An Attestation is a source-backed claim connecting a Thing to an attribute (Name, Geometry, Timespan) or to another Thing (relationship).

3.2.7.1. Critical Clarification: Attestations as Document Collection

Attestations are NODES (documents in a document collection), NOT edges. This is a crucial architectural distinction:

  • Attestations collection: A standard document collection containing attestation metadata

  • Edges collection: A separate edge collection containing all graph relationships

The attestation model works through edges that connect attestation nodes to other entities. An attestation does not contain relationship fields—instead, it is connected to other entities through edges in the EDGE collection.

3.2.7.2. Structure

{
  "id": "att:55667",
  "sequence": null,
  "connection_metadata": null,
  "certainty": 0.95,
  "certainty_note": "Well-documented in primary chronicles",
  "notes": "Name used during Byzantine period",
  "created": "2023-01-15T10:30:00Z",
  "modified": "2024-02-20T14:45:00Z",
  "contributor": "researcher@example.edu"
}

Key Properties:

  • sequence: Ordering for routes and itineraries

  • connection_metadata: JSON object for network relationships (e.g., trade goods, flow direction)

  • certainty: Confidence value (0.0-1.0)

  • certainty_note: Explanation of uncertainty assessment

  • notes: Additional context

  • created, modified: Temporal metadata

  • contributor: User or system that created the attestation

What’s NOT in the Attestation document:

  • No thing_id field

  • No relation_type field

  • No object_type or object_id fields

  • No sources array

These relationships are all expressed through edges in the EDGE collection:

// Example edges connecting an attestation
{
  "_from": "things/constantinople",
  "_to": "attestations/att-001",
  "edge_type": "subject_of"
}

{
  "_from": "attestations/att-001",
  "_to": "names/konstantinoupolis",
  "edge_type": "attests_name"
}

{
  "_from": "attestations/att-001",
  "_to": "timespans/byzantine-period",
  "edge_type": "attests_timespan"
}

{
  "_from": "attestations/att-001",
  "_to": "authorities/source-chronicle",
  "edge_type": "sourced_by"
}
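
The four edge documents above follow a regular pattern, so they can be generated from a handful of keys. The Python helper below is a hypothetical sketch of that pattern for a name attestation; it only builds dictionaries and does not talk to a database.

```python
from typing import List, Optional


def name_attestation_edges(
    thing_key: str,
    attestation_key: str,
    name_key: str,
    timespan_key: Optional[str] = None,
    source_key: Optional[str] = None,
) -> List[dict]:
    """Build the edge documents that attach one name claim to a Thing."""
    att = f"attestations/{attestation_key}"
    edges = [
        {"_from": f"things/{thing_key}", "_to": att, "edge_type": "subject_of"},
        {"_from": att, "_to": f"names/{name_key}", "edge_type": "attests_name"},
    ]
    if timespan_key is not None:
        edges.append(
            {"_from": att, "_to": f"timespans/{timespan_key}", "edge_type": "attests_timespan"}
        )
    if source_key is not None:
        edges.append(
            {"_from": att, "_to": f"authorities/{source_key}", "edge_type": "sourced_by"}
        )
    return edges


edges = name_attestation_edges(
    "constantinople", "att-001", "konstantinoupolis",
    timespan_key="byzantine-period", source_key="source-chronicle",
)
for edge in edges:
    print(edge)
```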

3.2.7.3. Attestation Types via Edge Patterns

Attestations connect Things to different entity types through different edge patterns:

  1. Names: Thing → Attestation (subject_of), Attestation → Name (attests_name)

  2. Geometries: Thing → Attestation (subject_of), Attestation → Geometry (attests_geometry)

  3. Timespans: Thing → Attestation (subject_of), Attestation → Timespan (attests_timespan)

  4. Classifications: Thing → Attestation (subject_of), Attestation → Authority (typed_by with classification)

  5. Other Things: Thing → Attestation (subject_of), Attestation → Authority (typed_by with relation_type), Attestation → Thing (relates_to)
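
Pattern 1 can be followed in two hops: from the Thing to its attestations, then from each attestation to a Name. The Python sketch below walks plain in-memory dictionaries instead of querying a database, and all keys and helper names are hypothetical. The `primary_name` helper reflects the diagram's note that primary_name is denormalized from the highest-certainty name attestation.

```python
from typing import Dict, List, Optional


def names_for_thing(
    thing_id: str,
    edges: List[dict],
    attestations: Dict[str, dict],
    names: Dict[str, dict],
) -> List[dict]:
    """Follow Thing -> Attestation -> Name and collect name claims."""
    attestation_ids = [
        e["_to"] for e in edges
        if e["edge_type"] == "subject_of" and e["_from"] == thing_id
    ]
    claims = []
    for att_id in attestation_ids:
        for e in edges:
            if e["_from"] == att_id and e["edge_type"] == "attests_name":
                claims.append({
                    "name": names[e["_to"]]["name"],
                    "certainty": attestations[att_id].get("certainty"),
                })
    return claims


def primary_name(claims: List[dict]) -> Optional[str]:
    """Pick the name whose attestation has the highest certainty."""
    rated = [c for c in claims if c["certainty"] is not None]
    if not rated:
        return None
    return max(rated, key=lambda c: c["certainty"])["name"]


ATTESTATIONS = {
    "attestations/att-001": {"certainty": 0.95},
    "attestations/att-002": {"certainty": 0.6},
}
NAMES = {
    "names/n1": {"name": "Constantinople"},
    "names/n2": {"name": "Istanbul"},
}
EDGES = [
    {"_from": "things/t1", "_to": "attestations/att-001", "edge_type": "subject_of"},
    {"_from": "attestations/att-001", "_to": "names/n1", "edge_type": "attests_name"},
    {"_from": "things/t1", "_to": "attestations/att-002", "edge_type": "subject_of"},
    {"_from": "attestations/att-002", "_to": "names/n2", "edge_type": "attests_name"},
]

claims = names_for_thing("things/t1", EDGES, ATTESTATIONS, NAMES)
print(claims)
print(primary_name(claims))
```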

3.2.7.4. Special Attestation Features

For Routes and Itineraries:

  • sequence: Integer indicating order of waypoints along a route

For Networks:

  • connection_metadata: JSON storing relationship details (trade goods, volume, direction, etc.)

For All Attestations:

  • Can reference a Timespan via edges to indicate temporal scope

  • Support meta-attestations (attestations about other attestations) through edges
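
As a small illustration of the sequence field, the Python sketch below orders a route's waypoint attestations. The record keys are made up for the example, and the helper is hypothetical; attestations with a null sequence are treated as not belonging to the ordered route.

```python
from typing import List


def ordered_waypoints(attestations: List[dict]) -> List[str]:
    """Return attestation keys sorted by their sequence value.

    Attestations whose sequence is null are skipped.
    """
    sequenced = [a for a in attestations if a.get("sequence") is not None]
    sequenced.sort(key=lambda a: a["sequence"])
    return [a["_key"] for a in sequenced]


stops = [
    {"_key": "att-stop-c", "sequence": 3},
    {"_key": "att-stop-a", "sequence": 1},
    {"_key": "att-note", "sequence": None},
    {"_key": "att-stop-b", "sequence": 2},
]
print(ordered_waypoints(stops))
```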

3.2.8. Entity Relationships

Entities relate through Attestations and Edges (see Attestations & Relations):

Thing --[edge: subject_of]--> Attestation
Attestation --[edge: attests_name]--> Name
Attestation --[edge: attests_geometry]--> Geometry
Attestation --[edge: attests_timespan]--> Timespan
Attestation --[edge: relates_to]--> Thing (relationships)
Attestation --[edge: meta_attestation]--> Attestation (meta-attestations)
Attestation --[edge: sourced_by]--> Authority

Every connection includes:

  • Edge type classification

  • Optional edge properties

  • Timestamp metadata

This creates a rich, provenance-tracked knowledge graph.
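
Because every connection lives in one edge collection, it can help to check that an edge's endpoints match its edge_type. The Python sketch below encodes only the seven patterns listed above; typed_by and part_of are left out because that listing does not cover them. The table and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# edge_type -> (collection of _from, collection of _to), per the listing above.
ALLOWED_ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "subject_of": ("things", "attestations"),
    "attests_name": ("attestations", "names"),
    "attests_geometry": ("attestations", "geometries"),
    "attests_timespan": ("attestations", "timespans"),
    "relates_to": ("attestations", "things"),
    "meta_attestation": ("attestations", "attestations"),
    "sourced_by": ("attestations", "authorities"),
}


def collection_of(document_id: str) -> str:
    """Extract the collection name from an id such as 'things/xyz'."""
    return document_id.split("/", 1)[0]


def edge_problems(edge: dict) -> List[str]:
    """Return a list of problems found in one edge document (empty if none)."""
    edge_type = edge.get("edge_type")
    if edge_type not in ALLOWED_ENDPOINTS:
        return [f"edge_type {edge_type!r} is not covered by this sketch"]
    expected_from, expected_to = ALLOWED_ENDPOINTS[edge_type]
    problems = []
    if collection_of(edge["_from"]) != expected_from:
        problems.append(f"_from should be in {expected_from}")
    if collection_of(edge["_to"]) != expected_to:
        problems.append(f"_to should be in {expected_to}")
    return problems


good = {"_from": "things/constantinople", "_to": "attestations/att-001", "edge_type": "subject_of"}
reversed_edge = {"_from": "names/n1", "_to": "attestations/att-001", "edge_type": "attests_name"}
print(edge_problems(good))
print(edge_problems(reversed_edge))
```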

3.2.9. Entity Lifecycle

3.2.9.1. Creation

  • Thing created with minimal information (ID, type, description)

  • Attestations added to build out the entity

  • Multiple contributors can add Attestations

3.2.9.2. Evolution

  • New Attestations add information

  • Conflicting Attestations coexist

  • Temporal Attestations track change

  • Meta-Attestations comment on other Attestations

3.2.9.3. Persistence

  • Things are never deleted (only deprecated with explanation)

  • Attestations are versioned

  • Full provenance maintained

  • Changes auditable

3.2.10. Design Rationale

3.2.10.1. Why This Model?

Historical knowledge is complex:

  • Sources disagree

  • Information changes over time

  • Certainty varies

  • Provenance matters

Traditional models fail:

  • Single “truth” per field → loses scholarly debate

  • No temporal context → obscures change

  • No provenance → can’t evaluate claims

  • No uncertainty modeling → false precision

The Attestation model succeeds:

  • ✅ Multiple perspectives coexist

  • ✅ Everything is temporally situated

  • ✅ Sources always cited

  • ✅ Uncertainty explicitly captured

  • ✅ Enables scholarly rigor

3.2.10.2. Influences

This model draws from:

  • Linked Data / RDF: Subject-predicate-object triples

  • Property Graphs: Nodes and edges with properties

  • Temporal Databases: Bitemporal modeling

  • Provenance Standards: W3C PROV

  • Domain models: Nomisma, Pelagios, Pleiades

3.2.10.3. Trade-offs

Complexity: More complex than flat records

  • Mitigation: Hide complexity in interfaces, provide simple views

Query complexity: Joining across attestations and edges

  • Mitigation: Use graph database, provide query helpers

Data entry burden: More structures to create

  • Mitigation: Make many fields optional, provide good defaults

Benefits outweigh costs: Richer, more honest, more scholarly.

3.2.11. Next Steps