3.2. Overview
This overview introduces how the WHG v4 data model uses Attestations as the cornerstone of its design. The sections that follow detail the core entities (Things, Names, Geometries, Timespans) and how they interact through the attestation pattern to create a rich, provenance-tracked knowledge graph capable of representing the full complexity of historical geographic information.
3.2.1. Core Entities
erDiagram
%% Single edge collection connecting all entities
ATTESTATION }o--|| EDGE : "attests_name, attests_geometry, attests_timespan"
ATTESTATION }o--|| EDGE : "typed_by, sourced_by, relates_to, meta_attestation"
EDGE }o--|| ATTESTATION : "subject_of, meta_attestation"
THING }o--|| EDGE : "subject_of"
THING ||--o{ EDGE : "relates_to"
AUTHORITY }o--|| EDGE : "part_of"
AUTHORITY ||--o{ EDGE : "typed_by, sourced_by, part_of"
EDGE }o--|| NAME : "attests_name"
EDGE }o--|| GEOMETRY : "attests_geometry"
EDGE }o--|| TIMESPAN : "attests_timespan"
%% Single unified edge collection
EDGE {
string _key PK
string _from "any collection/xyz"
string _to "any collection/abc"
string edge_type "subject_of, attests_name, attests_geometry, attests_timespan, relates_to, meta_attestation, typed_by, sourced_by, part_of"
string meta_type "bundles, contradicts, supersedes, challenges, supports (for meta_attestations only)"
json properties "flexible storage for edge-specific attributes"
timestamp created
}
%% Core entity collections (document collections - vertices/nodes)
THING {
string _key PK "ArangoDB key"
string _id "things/xyz"
text description
string thing_type "location, historical_entity, collection, period, route, itinerary, network"
string primary_name "denormalized from highest-certainty name attestation"
point representative_point "denormalized from geometry for spatial indexing"
timestamp created
timestamp modified
}
NAME {
string _key PK
string _id "names/xyz"
string name
string language "ISO 639-3"
string script "ISO 15924"
array name_type "toponym, chrononym, ethnonym, odonym, hydronym"
string ipa "International Phonetic Alphabet"
string romanized "romanized/transliterated form"
string transliteration_system "e.g., Pinyin, BGN/PCGN, ISO 259"
vector embedding "REQUIRED: 256-dimensional vector for phonetic search"
}
TIMESPAN {
string _key PK
string _id "timespans/xyz"
bigint start_earliest "Unix timestamp (milliseconds) or geological time"
bigint start_latest "Unix timestamp (milliseconds)"
bigint end_earliest "Unix timestamp (milliseconds)"
bigint end_latest "Unix timestamp (milliseconds) or future sentinel"
string label "human-readable period name"
string precision "year, decade, century, era, geological_period"
integer precision_value "numeric precision in years"
string periodo_id "PeriodO URI for standard period definitions"
}
ATTESTATION {
string _key PK
string _id "attestations/xyz"
integer sequence "for ordered sequences in routes/itineraries (nullable)"
json connection_metadata "for networks: trade goods, flow direction, route_type"
float certainty "0.0 to 1.0 (nullable if unknown)"
text certainty_note "explanation of certainty assessment"
text notes "additional context or commentary"
timestamp created
timestamp modified
}
GEOMETRY {
string _key PK
string _id "geometries/xyz"
geometry geom "GeoJSON: Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon"
point representative_point "single point for spatial indexing and distance queries"
geometry hull "convex hull for quick spatial filters"
array bbox "[min_lon, min_lat, max_lon, max_lat]"
array precision "e.g., [exact, approximate, uncertain, historical_approximate]"
array precision_km "uncertainty radius in km (can be multiple if heterogeneous)"
string source_crs "EPSG:4326 or historical/custom CRS identifier"
}
%% Authority collection (single table inheritance)
AUTHORITY {
string _key PK
string _id "authorities/xyz"
string authority_type "dataset, source, relation_type, period, classification"
text description "general description applicable to all types"
timestamp created
timestamp modified
%% Dataset fields
string title "for datasets: dataset name"
string version "for datasets: version identifier"
string publisher "for datasets: publishing institution"
string license "for datasets: CC-BY, CC0, etc"
string doi "for datasets: persistent identifier doi:10.83427/whg-dataset-123"
%% Source fields
string citation "for sources: bibliographic citation"
array source_type "for sources: manuscript, inscription, archaeological, published, etc"
string record_id "for sources: identifier in original source/dataset"
%% Relation type fields
string label "for relation_types, periods, classifications: machine-readable identifier"
string inverse "for relation_types: inverse relation label"
array domain "for relation_types: valid subject entity types"
array range "for relation_types: valid object entity types"
%% Period fields (from PeriodO)
bigint start_earliest "for periods: temporal bounds"
bigint start_latest "for periods: temporal bounds"
bigint stop_earliest "for periods: temporal bounds"
bigint stop_latest "for periods: temporal bounds"
%% Classification fields
string classification_system "for classifications: geonames_fclasses, aat_getty, custom"
string classification_code "for classifications: A.ADM1, P.PPLA, H.STM, S.ARCH, etc"
string classification_label "for classifications: human-readable name"
%% Common fields
string uri "for all types: external URI (PeriodO, source URL, dataset landing page, authority gazetteer)"
}
Fig. 3.2 Entity–relationship diagram for the WHG v4 data model.
3.2.2. Core Design Philosophy
The WHG data model is built around a property graph structure where information is represented as:
Things: Primary entities (locations, historical entities, collections, periods, routes, itineraries, networks)
Attributes: Descriptions of Things (Names, Geometries, Timespans)
Attestations: Source-backed claims connecting Things to attributes or other Things
This approach enables WHG to:
Capture multiple scholarly perspectives
Model uncertainty explicitly
Preserve complete provenance
Track temporal change
Represent complex networks of relationships
3.2.3. The Thing Entity
3.2.3.1. What is a Thing?
A Thing is the primary entity in WHG: any object of scholarly interest that can be described and related to other entities. The term “Thing” is borrowed from schema.org’s root type, chosen for its generality and future extensibility.
Types of Things in WHG:
Locations: Geographic places (cities, regions, landmarks, etc.)
Historical entities: Political entities, empires, states
Collections: Curated sets of related Things
Periods: Temporal spans with cultural/historical significance
Routes: Ordered sequences of locations representing journeys
Itineraries: Specific instances of travel along routes
Networks: Systems of interconnected Things
3.2.3.2. Why “Thing”?
The term may seem informal, but it offers crucial advantages:
Generality: Accommodates any type of entity without forcing artificial classifications. A medieval monastery might be simultaneously a location, a religious institution, and a network node; “Thing” encompasses all these facets.
Extensibility: If WHG eventually expands to include new place-linked entity types (people, events, documents), “Thing” remains applicable without terminology shifts.
Interoperability: Aligns with schema.org’s vocabulary, facilitating linked data integration and semantic web compatibility.
Philosophical honesty: Acknowledges that historical entities resist rigid categorization. What we call “Byzantium” refers to a complex, evolving reality that was simultaneously a place, an empire, an idea, and a cultural sphere.
3.2.3.3. Thing Structure
A Thing in WHG consists of:
{
"id": "whg:12345",
"thing_type": "location",
"description": "Major Byzantine/Ottoman city on the Bosphorus",
"created": "2023-01-15T10:30:00Z",
"modified": "2024-02-20T14:45:00Z"
}
Key Properties:
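id: Unique persistent identifier (URI)
thing_type: Classification (location, historical_entity, collection, period, route, itinerary, network)
description: Human-readable summary
created, modified: Temporal metadata for curation
Notably absent: names, coordinates, dates, classifications, and relations. These are all asserted through Attestations rather than being intrinsic to the Thing itself. (The primary_name and representative_point fields shown in the diagram are denormalized copies derived from attestations, kept for display and spatial indexing.)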
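To illustrate how little a Thing carries, the sketch below models it as a Python dataclass. The class name, the `THING_TYPES` set and the validation in `__post_init__` are assumptions for this example rather than WHG code; the allowed values come from the thing_type list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed values taken from the thing_type list above.
THING_TYPES = {
    "location", "historical_entity", "collection", "period",
    "route", "itinerary", "network",
}


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Thing:
    """Hypothetical in-memory Thing: identity plus curation metadata only.

    Names, geometries, timespans and relations are deliberately absent;
    in WHG they are attached through Attestations.
    """
    id: str
    thing_type: str
    description: str = ""
    created: datetime = field(default_factory=_now)
    modified: datetime = field(default_factory=_now)

    def __post_init__(self) -> None:
        # Reject types outside the controlled list.
        if self.thing_type not in THING_TYPES:
            raise ValueError(f"unknown thing_type: {self.thing_type!r}")


istanbul = Thing("whg:12345", "location",
                 "Major Byzantine/Ottoman city on the Bosphorus")
```

Constructing `Thing("whg:1", "planet")` raises `ValueError`; anything that describes the place itself has to arrive later through an Attestation.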
3.2.3.4. The Separation Principle
WHG separates the Thing itself from descriptions of the Thing. This distinction is crucial:
The Thing: The abstract entity that existed/exists in reality
Descriptions: Claims about that Thing from various sources at various times
This enables WHG to:
Accommodate disagreement (two sources, two different coordinate claims)
Track change (names evolve, boundaries shift)
Preserve provenance (who said what, when)
Model uncertainty (tentative vs. confident claims)
Example:
Thing ID whg:12345 represents the conceptual city
One attestation claims it was called “Byzantion” (-650 to 330 CE)
Another attestation claims it was called “Constantinople” (330 to 1453 CE)
Another attestation claims it was called “Istanbul” (1453 to present)
All coexist; none overwrites the others
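The coexistence of these three claims can be sketched in Python. This is a hypothetical flattening: each `NameClaim` collapses a Thing, an Attestation, a Name and a Timespan into one record with whole-year bounds, whereas WHG links those entities through edges.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NameClaim:
    """Simplified view of one name attestation (hypothetical flattening)."""
    thing_id: str
    name: str
    start_year: int              # negative numbers = BCE, as in the example
    end_year: Optional[int]      # None = still current


CLAIMS = [
    NameClaim("whg:12345", "Byzantion", -650, 330),
    NameClaim("whg:12345", "Constantinople", 330, 1453),
    NameClaim("whg:12345", "Istanbul", 1453, None),
]


def names_in_year(claims: List[NameClaim], thing_id: str, year: int) -> List[str]:
    """Return every name attested for thing_id in the given year.

    Claims are never overwritten, so a boundary year can legitimately
    return more than one name.
    """
    return [
        c.name
        for c in claims
        if c.thing_id == thing_id
        and c.start_year <= year
        and (c.end_year is None or year <= c.end_year)
    ]
```

Because nothing is overwritten, a boundary year such as 330 returns both “Byzantion” and “Constantinople”.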
3.2.4. Name Entity
A Name represents a linguistic form by which a Thing is known.
3.2.4.1. Structure
{
"id": "name:67890",
"name": "القسطنطينية",
"language": "ara",
"script": "Arab",
"variant": "standard",
"transliteration": "al-Qusṭanṭīnīyah",
"ipa": "ʔalqustˤɑntˤiːnijːɐ",
"name_type": [
"toponym"
],
"embedding": [
0.123,
-0.456,
...
]
}
Key Properties:
name: The actual text in the original script
language: ISO 639-3 language code
script: ISO 15924 script code
variant: Relationship to other forms (official, colloquial, historical, etc.)
transliteration: Romanization for searchability
ipa: International Phonetic Alphabet representation
name_type: Array of classifications (toponym, chrononym, ethnonym, etc.)
embedding: Vector representation for phonetic similarity search
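A hedged sketch of a shape check for Name documents follows. The function `name_problems` is hypothetical; the regular expressions only test that codes look like ISO 639-3 and ISO 15924 codes (not that they are registered), and the 256-value embedding length is taken from the data-model diagram. How embeddings are generated is not covered here.

```python
import re
from typing import List

# Shape checks only: these patterns do not verify that a code is actually
# registered in ISO 639-3 or ISO 15924.
ISO_639_3 = re.compile(r"[a-z]{3}")        # e.g. "ara"
ISO_15924 = re.compile(r"[A-Z][a-z]{3}")   # e.g. "Arab"
EMBEDDING_DIM = 256  # dimension stated in the data-model diagram


def name_problems(doc: dict) -> List[str]:
    """Return human-readable problems with a Name document (empty if none)."""
    problems = []
    if not doc.get("name"):
        problems.append("missing name")
    lang = doc.get("language")
    if lang is not None and not (isinstance(lang, str) and ISO_639_3.fullmatch(lang)):
        problems.append(f"language {lang!r} is not a 3-letter lowercase code")
    script = doc.get("script")
    if script is not None and not (isinstance(script, str) and ISO_15924.fullmatch(script)):
        problems.append(f"script {script!r} is not a 4-letter title-case code")
    if "name_type" in doc and not isinstance(doc["name_type"], list):
        problems.append("name_type must be an array")
    emb = doc.get("embedding")
    if not isinstance(emb, list) or len(emb) != EMBEDDING_DIM:
        problems.append(f"embedding must have {EMBEDDING_DIM} values")
    return problems


arabic = {
    "name": "القسطنطينية", "language": "ara", "script": "Arab",
    "name_type": ["toponym"], "embedding": [0.0] * EMBEDDING_DIM,
}
```

With a two-letter language code and a truncated embedding, for example `dict(arabic, language="ar", embedding=[0.1, 0.2])`, the function reports two problems.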
3.2.4.2. Name Types
WHG distinguishes several name types:
Toponym: Geographic place name
Chrononym: Period or era name
Ethnonym: Name for a people or ethnic group
Demonym: Name for inhabitants of a place
See Vocabularies for complete name type taxonomy.
3.2.5. Geometry Entity
A Geometry represents a spatial location or extent of a Thing at a particular time.
3.2.5.1. Structure
{
"id": "geom:11223",
"geom": {
"type": "Point",
"coordinates": [
28.9784,
41.0082
]
},
"representative_point": {
"type": "Point",
"coordinates": [
28.9784,
41.0082
]
},
"hull": {
"type": "Polygon",
"coordinates": [
[
...
]
]
},
"bbox": [
28.9,
41.0,
29.0,
41.1
],
"precision": "approximate",
"precision_km": 5,
"source_crs": "EPSG:4326"
}
Key Properties:
geom: GeoJSON geometry (Point, Polygon, LineString, Multi*)
representative_point: Single point for mapping/search
hull: Convex hull of the geometry
bbox: Bounding box [min_lon, min_lat, max_lon, max_lat]
precision: Spatial certainty indicator (exact, approximate, uncertain)
precision_km: Uncertainty radius in kilometers
source_crs: Original coordinate reference system (EPSG code or historical CRS)
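The bbox and a naive representative point can be derived from geom. The sketch below assumes plain lon/lat coordinates and ignores antimeridian crossings. Using the bounding-box centre as representative_point is an assumption: this section does not say how WHG chooses that point, and a box centre can fall outside a concave or multi-part geometry. The convex hull is not computed here.

```python
from typing import Iterator, List, Tuple

Position = Tuple[float, float]


def iter_positions(geom: dict) -> Iterator[Position]:
    """Yield every (lon, lat) position in a GeoJSON geometry."""
    def walk(coords):
        # A position is a list of numbers; anything else is a nested array.
        if coords and isinstance(coords[0], (int, float)):
            yield (coords[0], coords[1])
        else:
            for c in coords:
                yield from walk(c)
    yield from walk(geom["coordinates"])


def bbox(geom: dict) -> List[float]:
    """[min_lon, min_lat, max_lon, max_lat]; raises ValueError if empty."""
    lons, lats = zip(*iter_positions(geom))
    return [min(lons), min(lats), max(lons), max(lats)]


def bbox_centre(geom: dict) -> dict:
    """Naive representative point: the centre of the bounding box."""
    w, s, e, n = bbox(geom)
    return {"type": "Point", "coordinates": [(w + e) / 2, (s + n) / 2]}


square = {"type": "Polygon", "coordinates": [[
    [28.9, 41.0], [29.0, 41.0], [29.0, 41.1], [28.9, 41.1], [28.9, 41.0],
]]}
```

For `square`, `bbox` returns the same box as the example document above, `[28.9, 41.0, 29.0, 41.1]`.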
3.2.5.2. Geometry Formats
WHG supports both GeoJSON (for internal storage and LPF export) and WKT (Well-Known Text, for GeoSPARQL compliance):
GeoJSON format (internal):
{
"type": "Point",
"coordinates": [
28.9784,
41.0082
]
}
WKT format (RDF export):
"POINT(28.9784 41.0082)"^^geo:wktLiteral
This dual format support ensures:
Web application compatibility (GeoJSON)
Triplestore compatibility (WKT for GeoSPARQL queries)
Seamless conversion between formats on export
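Conversion from GeoJSON to WKT is mechanical because both use longitude-then-latitude order, which matches GeoSPARQL's default CRS84 for WKT literals without an explicit CRS. A minimal sketch, assuming the `geo:` prefix is bound to the GeoSPARQL namespace; `geojson_to_wkt` and `wkt_literal` are illustrative names, not WHG functions.

```python
def _pos(p) -> str:
    # WKT coordinate order matches GeoJSON: "lon lat".
    return f"{p[0]} {p[1]}"


def _ring(coords) -> str:
    return "(" + ", ".join(_pos(p) for p in coords) + ")"


def geojson_to_wkt(geom: dict) -> str:
    """Convert the six GeoJSON types WHG stores into WKT text."""
    t, c = geom["type"], geom["coordinates"]
    if t == "Point":
        return f"POINT({_pos(c)})"
    if t == "LineString":
        return "LINESTRING" + _ring(c)
    if t == "Polygon":
        return "POLYGON(" + ", ".join(_ring(r) for r in c) + ")"
    if t == "MultiPoint":
        return "MULTIPOINT(" + ", ".join(f"({_pos(p)})" for p in c) + ")"
    if t == "MultiLineString":
        return "MULTILINESTRING(" + ", ".join(_ring(line) for line in c) + ")"
    if t == "MultiPolygon":
        polys = ("(" + ", ".join(_ring(r) for r in poly) + ")" for poly in c)
        return "MULTIPOLYGON(" + ", ".join(polys) + ")"
    raise ValueError(f"unsupported geometry type: {t}")


def wkt_literal(geom: dict) -> str:
    """Typed RDF literal, assuming the geo: prefix is declared elsewhere."""
    return f'"{geojson_to_wkt(geom)}"^^geo:wktLiteral'
```

Applied to the GeoJSON point above, `wkt_literal` produces the same string as the RDF export example.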
3.2.5.3. Why Multiple Geometries?
A single Thing may have multiple Geometries because:
Temporal change: Borders expand, cities relocate
Uncertainty: Multiple proposed locations
Source disagreement: Conflicting geographic claims
Representation levels: Point for searching, polygon for extent
Each Geometry is connected via an Attestation with temporal bounds and source citation.
Important Note on GeometryCollection: ArangoDB does not support the GeoJSON GeometryCollection type. For places with heterogeneous geometry sets (e.g., both point and polygon), store multiple geometry attestations, one per geometry type. This aligns naturally with the attestation model, where each geometry claim is a separate evidential statement.
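Splitting a GeometryCollection before import might look like the sketch below. The function name and the idea of running it at import time are assumptions; the six accepted types are those listed for geom in the data-model diagram. Each returned geometry would become its own Geometry document with its own attestation.

```python
from typing import List

# Geometry types listed for the geom field in the data-model diagram.
SUPPORTED_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon",
}


def split_for_attestation(geom: dict) -> List[dict]:
    """Return storable geometries, one per future geometry attestation."""
    if geom["type"] == "GeometryCollection":
        parts: List[dict] = []
        for member in geom["geometries"]:
            # Recurse because GeoJSON collections may (discouragingly) nest.
            parts.extend(split_for_attestation(member))
        return parts
    if geom["type"] not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported geometry type: {geom['type']}")
    return [geom]


mixed = {"type": "GeometryCollection", "geometries": [
    {"type": "Point", "coordinates": [28.97, 41.0]},
    {"type": "Polygon", "coordinates": [[
        [28.9, 41.0], [29.0, 41.0], [29.0, 41.1], [28.9, 41.0],
    ]]},
]}
```

Here `split_for_attestation(mixed)` yields a Point and a Polygon, each of which can then be attested, dated and sourced independently.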
3.2.6. Timespan Entity
A Timespan represents a temporal interval with explicit uncertainty modeling.
3.2.6.1. Structure
{
"id": "time:33445",
"start_earliest": "0802-01-01",
"start_latest": "0802-12-31",
"end_earliest": "1431-01-01",
"end_latest": "1432-12-31",
"label": "Angkor period",
"precision": "year",
"precision_value": 1
}
Key Properties:
start_earliest, start_latest: Range of possible start dates
end_earliest, end_latest: Range of possible end dates
label: Human-readable period name
precision: Temporal granularity (year, decade, century, era, geological_period)
precision_value: Numeric precision indicator
Field Naming Convention: Internally, WHG uses end_earliest and end_latest for consistency with W3C Time Ontology and RDF representations. Some legacy documentation may reference stop_earliest and stop_latest, which are equivalent fields. Going forward, all documentation and implementations should use the “end” terminology for consistency.
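Normalizing legacy records could be sketched as below. The renaming follows the convention just stated; the date conversion is an assumption prompted by the diagram, which types the bounds as bigint Unix milliseconds while the JSON example shows ISO date strings. Both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Legacy field names mapped to the current "end" terminology.
LEGACY_RENAMES = {"stop_earliest": "end_earliest", "stop_latest": "end_latest"}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def normalise_timespan(doc: dict) -> dict:
    """Copy doc, renaming stop_* to end_*; refuse conflicting duplicates."""
    out: dict = {}
    for key, value in doc.items():
        new_key = LEGACY_RENAMES.get(key, key)
        if new_key in out and out[new_key] != value:
            raise ValueError(f"conflicting values for {new_key}")
        out[new_key] = value
    return out


def iso_date_to_unix_ms(iso: str) -> int:
    """Convert 'YYYY-MM-DD' (years 1 to 9999) to milliseconds since 1970 UTC.

    datetime uses the proleptic Gregorian calendar and cannot represent
    BCE years, so dates such as -650 would need a different conversion.
    """
    d = datetime.strptime(iso, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return (d - EPOCH) // timedelta(milliseconds=1)
```

Dates before 1970, such as the Angkor bounds above, come out as negative millisecond values.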
3.2.6.2. Modeling Temporal Uncertainty
The four-date model captures uncertainty:
Certain dates: start_earliest = start_latest and end_earliest = end_latest (all four values are identical only for a single instant)
Uncertain start: start_earliest ≠ start_latest
Uncertain end: end_earliest ≠ end_latest
Fuzzy boundaries: Wide ranges (e.g., “sometime in the 7th century”)
Special values:
null: Unknown or inapplicable
-infinity: From geological prehistory
+infinity: Into indefinite future
present: Current day (dynamic)
See Implementation in Database for details on null handling.
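The patterns above can be checked mechanically. In this sketch bounds are plain numbers (whole years here), `None` stands for null, `math.inf` and `-math.inf` for the infinity sentinels, and present is assumed to be resolved to the current date before the call. The flag names and the ordering checks are illustrative assumptions, not WHG validation rules.

```python
import math
from typing import Optional, Set

# A bound is a comparable number (e.g. a year or Unix ms).
# None = unknown/inapplicable; -math.inf / math.inf = open-ended.
Bound = Optional[float]


def uncertainty_flags(se: Bound, sl: Bound, ee: Bound, el: Bound) -> Set[str]:
    """Classify (start_earliest, start_latest, end_earliest, end_latest)."""

    def ordered(a: Bound, b: Bound, message: str) -> None:
        # Only compare when both bounds are known.
        if a is not None and b is not None and a > b:
            raise ValueError(f"{message}: {a} > {b}")

    ordered(se, sl, "start_earliest after start_latest")
    ordered(ee, el, "end_earliest after end_latest")
    ordered(se, el, "span would end before it can start")

    bounds = (se, sl, ee, el)
    flags: Set[str] = set()
    if any(b is None for b in bounds):
        flags.add("unknown_bound")
    if any(b is not None and math.isinf(b) for b in bounds):
        flags.add("open_ended")
    if se is not None and sl is not None and se != sl:
        flags.add("uncertain_start")
    if ee is not None and el is not None and ee != el:
        flags.add("uncertain_end")
    return flags or {"certain"}
```

The Angkor example, reduced to whole years as `uncertainty_flags(802, 802, 1431, 1432)`, yields `{"uncertain_end"}`.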
3.2.7. Attestation Entity
An Attestation is a source-backed claim connecting a Thing to an attribute (Name, Geometry, Timespan) or to another Thing (relationship).
3.2.7.1. Critical Clarification: Attestations as Document Collection
Attestations are NODES (documents in a document collection), NOT edges. This is a crucial architectural distinction:
Attestations collection: A standard document collection containing attestation metadata
Edges collection: A separate edge collection containing all graph relationships
The attestation model works through edges that connect attestation nodes to other entities. An attestation does not contain relationship fields—instead, it is connected to other entities through edges in the EDGE collection.
3.2.7.2. Structure
{
"id": "att:55667",
"sequence": null,
"connection_metadata": null,
"certainty": 0.95,
"certainty_note": "Well-documented in primary chronicles",
"notes": "Name used during Byzantine period",
"created": "2023-01-15T10:30:00Z",
"modified": "2024-02-20T14:45:00Z",
"contributor": "researcher@example.edu"
}
Key Properties:
sequence: Ordering for routes and itineraries
connection_metadata: JSON object for network relationships (e.g., trade goods, flow direction)
certainty: Confidence value (0.0 to 1.0)
certainty_note: Explanation of uncertainty assessment
notes: Additional context
created, modified: Temporal metadata
contributor: User or system that created the attestation
What’s NOT in the Attestation document:
No thing_id field
No relation_type field
No object_type or object_id fields
No sources array
These relationships are all expressed through edges in the EDGE collection:
Example edges connecting an attestation (a JSON array of edge documents):
[
{
"_from": "things/constantinople",
"_to": "attestations/att-001",
"edge_type": "subject_of"
},
{
"_from": "attestations/att-001",
"_to": "names/konstantinoupolis",
"edge_type": "attests_name"
},
{
"_from": "attestations/att-001",
"_to": "timespans/byzantine-period",
"edge_type": "attests_timespan"
},
{
"_from": "attestations/att-001",
"_to": "authorities/source-chronicle",
"edge_type": "sourced_by"
}
]
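Assembling these edges programmatically is straightforward. The sketch below builds the four edges of the example and checks edge_type and meta_type against the vocabularies in the data-model diagram, accepting meta_type only on meta_attestation edges. `make_edge` and `name_attestation_edges` are hypothetical helpers, and `_key` is left out on the assumption that ArangoDB generates it on insert.

```python
from datetime import datetime, timezone
from typing import List, Optional

# Vocabularies as listed in the data-model diagram.
EDGE_TYPES = {
    "subject_of", "attests_name", "attests_geometry", "attests_timespan",
    "relates_to", "meta_attestation", "typed_by", "sourced_by", "part_of",
}
META_TYPES = {"bundles", "contradicts", "supersedes", "challenges", "supports"}


def make_edge(_from: str, _to: str, edge_type: str,
              meta_type: Optional[str] = None, **properties) -> dict:
    """Build one edge document for the single unified edge collection."""
    if edge_type not in EDGE_TYPES:
        raise ValueError(f"unknown edge_type: {edge_type}")
    if meta_type is not None:
        if edge_type != "meta_attestation":
            raise ValueError("meta_type is only valid on meta_attestation edges")
        if meta_type not in META_TYPES:
            raise ValueError(f"unknown meta_type: {meta_type}")
    edge = {"_from": _from, "_to": _to, "edge_type": edge_type,
            "properties": properties,
            "created": datetime.now(timezone.utc).isoformat()}
    if meta_type is not None:
        edge["meta_type"] = meta_type
    return edge


def name_attestation_edges(thing: str, attestation: str, name: str,
                           timespan: str, source: str) -> List[dict]:
    """The four edges that make one attestation a dated, sourced name claim."""
    return [
        make_edge(thing, attestation, "subject_of"),
        make_edge(attestation, name, "attests_name"),
        make_edge(attestation, timespan, "attests_timespan"),
        make_edge(attestation, source, "sourced_by"),
    ]


edges = name_attestation_edges(
    "things/constantinople", "attestations/att-001", "names/konstantinoupolis",
    "timespans/byzantine-period", "authorities/source-chronicle")
```

A meta-attestation stating that one attestation contradicts another would be `make_edge("attestations/att-002", "attestations/att-001", "meta_attestation", meta_type="contradicts")`.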
3.2.7.3. Attestation Types via Edge Patterns
Attestations connect Things to different entity types through different edge patterns:
Names: Thing → Attestation (subject_of), Attestation → Name (attests_name)
Geometries: Thing → Attestation (subject_of), Attestation → Geometry (attests_geometry)
Timespans: Thing → Attestation (subject_of), Attestation → Timespan (attests_timespan)
Classifications: Thing → Attestation (subject_of), Attestation → Authority (typed_by with classification)
Other Things: Thing → Attestation (subject_of), Attestation → Authority (typed_by with relation_type), Attestation → Thing (relates_to)
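Read in reverse, these patterns let a program infer what an attestation claims from its outgoing edges. In this hypothetical helper each edge is given as an (edge_type, authority_type of the target) pair. Timespan is tested last because, as noted below, any attestation may carry a Timespan for its temporal scope; the precedence order is an assumption.

```python
from typing import Iterable, Optional, Tuple

# (edge_type, authority_type of the target, or None for non-authorities)
OutEdge = Tuple[str, Optional[str]]


def attestation_kind(out_edges: Iterable[OutEdge]) -> str:
    """Infer the kind of claim from the edges leaving one attestation."""
    edges = list(out_edges)
    types = {edge_type for edge_type, _ in edges}
    typed_as = {target for edge_type, target in edges if edge_type == "typed_by"}
    if "attests_name" in types:
        return "name"
    if "attests_geometry" in types:
        return "geometry"
    if "relates_to" in types and "relation_type" in typed_as:
        return "relation"
    if "classification" in typed_as:
        return "classification"
    if "attests_timespan" in types:
        # Only a timespan claim if nothing more specific was found.
        return "timespan"
    return "unknown"
```

Edges such as sourced_by are ignored by this rule, since every kind of attestation can cite a source.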
3.2.7.4. Special Attestation Features
For Routes and Itineraries:
sequence: Integer indicating order of waypoints along a route
For Networks:
connection_metadata: JSON storing relationship details (trade goods, volume, direction, etc.)
For All Attestations:
Can reference a Timespan via edges to indicate temporal scope
Support meta-attestations (attestations about other attestations) through edges
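Ordering a route from its attestations amounts to a sort on sequence. A minimal sketch, assuming each waypoint has already been read as a (Thing id, sequence) pair from a relates_to edge and its attestation; `ordered_waypoints` is hypothetical, and treating missing or duplicate sequence numbers as errors is a design assumption.

```python
from typing import List, Optional, Tuple

# (waypoint Thing id, sequence value from its attestation)
Waypoint = Tuple[str, Optional[int]]


def ordered_waypoints(waypoints: List[Waypoint]) -> List[str]:
    """Return waypoint ids in route order."""
    unsequenced = [w for w, seq in waypoints if seq is None]
    if unsequenced:
        raise ValueError(f"missing sequence for: {unsequenced}")
    seqs = [seq for _, seq in waypoints]
    if len(set(seqs)) != len(seqs):
        raise ValueError("duplicate sequence numbers")
    return [w for w, _ in sorted(waypoints, key=lambda pair: pair[1])]


route = [
    ("things/antioch", 2),
    ("things/constantinople", 1),
    ("things/jerusalem", 3),
]
```

Network attestations would carry their details in connection_metadata instead, so they need no such ordering.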
3.2.8. Entity Relationships
Entities relate through Attestations and Edges (see Attestations & Relations):
Thing --[edge: subject_of]--> Attestation
Attestation --[edge: attests_name]--> Name
Attestation --[edge: attests_geometry]--> Geometry
Attestation --[edge: attests_timespan]--> Timespan
Attestation --[edge: relates_to]--> Thing (relationships)
Attestation --[edge: meta_attestation]--> Attestation (meta-attestations)
Attestation --[edge: sourced_by]--> Authority
Every connection includes:
Edge type classification
Optional edge properties
Timestamp metadata
This creates a rich, provenance-tracked knowledge graph.
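In ArangoDB a question like “which names are attested for this Thing?” would normally be answered with an AQL graph traversal. To show the two-hop pattern without a database, the sketch below walks an in-memory copy of the example edges from the Attestation section; `build_index` and `attested` are hypothetical. It returns (attestation, target) pairs so that certainty and sources remain reachable from each result.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

EDGES = [
    {"_from": "things/constantinople", "_to": "attestations/att-001", "edge_type": "subject_of"},
    {"_from": "attestations/att-001", "_to": "names/konstantinoupolis", "edge_type": "attests_name"},
    {"_from": "attestations/att-001", "_to": "timespans/byzantine-period", "edge_type": "attests_timespan"},
    {"_from": "attestations/att-001", "_to": "authorities/source-chronicle", "edge_type": "sourced_by"},
]


def build_index(edges: List[dict]) -> Dict[str, List[dict]]:
    """Group edges by their _from vertex for outbound lookups."""
    out: Dict[str, List[dict]] = defaultdict(list)
    for e in edges:
        out[e["_from"]].append(e)
    return out


def attested(index: Dict[str, List[dict]], thing_id: str,
             edge_type: str) -> List[Tuple[str, str]]:
    """Follow thing --subject_of--> attestation --edge_type--> target."""
    results = []
    for e1 in index.get(thing_id, []):
        if e1["edge_type"] != "subject_of":
            continue
        for e2 in index.get(e1["_to"], []):
            if e2["edge_type"] == edge_type:
                results.append((e1["_to"], e2["_to"]))
    return results


idx = build_index(EDGES)
```

Here `attested(idx, "things/constantinople", "attests_name")` returns the single pair linking att-001 to names/konstantinoupolis.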
3.2.9. Entity Lifecycle
3.2.9.1. Creation
Thing created with minimal information (ID, type, description)
Attestations added to build out the entity
Multiple contributors can add Attestations
3.2.9.2. Evolution
New Attestations add information
Conflicting Attestations coexist
Temporal Attestations track change
Meta-Attestations comment on other Attestations
3.2.9.3. Persistence
Things are never deleted (only deprecated with explanation)
Attestations are versioned
Full provenance maintained
Changes auditable
3.2.10. Design Rationale
3.2.10.1. Why This Model?
Historical knowledge is complex:
Sources disagree
Information changes over time
Certainty varies
Provenance matters
Traditional models fail:
Single “truth” per field → loses scholarly debate
No temporal context → obscures change
No provenance → can’t evaluate claims
No uncertainty modeling → false precision
The Attestation model succeeds:
✅ Multiple perspectives coexist
✅ Everything is temporally situated
✅ Sources always cited
✅ Uncertainty explicitly captured
✅ Enables scholarly rigor
3.2.10.2. Influences
This model draws from:
Linked Data / RDF: Subject-predicate-object triples
Property Graphs: Nodes and edges with properties
Temporal Databases: Bitemporal modeling
Provenance Standards: W3C PROV
Domain models: Nomisma, Pelagios, Pleiades
3.2.10.3. Trade-offs
Complexity: More complex than flat records
Mitigation: Hide complexity in interfaces, provide simple views
Query complexity: Joining across attestations and edges
Mitigation: Use graph database, provide query helpers
Data entry burden: More structures to create
Mitigation: Make many fields optional, provide good defaults
Benefits outweigh costs: Richer, more honest, more scholarly.
3.2.11. Next Steps
Attestations: See Attestations & Relations
Vocabularies: See Controlled Vocabularies
Use Cases: See Platform Use Cases
Implementation: See Implementation in Database
RDF Representation: See RDF Representation